This is an expanded and revised version of v. 1. It relies on
terminology and concepts developed in the recently published Conceptual Modeling and
Database Design: A Foundation Framework for Data Management (referred
to as paper #2 for short), which is strongly recommended as a preamble.
ABSTRACT
There
are tons of material on normalization, mostly against it and outright wrong,
but despite its having been thoroughly debunked, confusion and poor
understanding still reign in the industry and, sadly, in academia too. Some of
the material is either purely formal and inaccessible to the average
practitioner, or in what we call “cookbook” form—“recipes” to be followed
mechanically, without understanding why. Neither approach relates to the other,
and both contribute to the poor understanding of normalization, its practical
purpose and advantages, and particularly the severe costs of less than full
normalization, which are almost completely ignored.
Instead
of focusing on the real solution to performance maximization—better
implementations of truly relational DBMSs (TRDBMS)—the industry distracts
practitioners who are, like vendors and the trade media, mired in the
logical-physical
confusion, by encouraging or compelling them to “denormalize for
performance”, to the costly consequences of which all are oblivious.
Furthermore,
as we show in
Conceptual Modeling and Database Design: A
Foundation Framework for Data Management, given proper
conceptual modeling, accurate logical design yields databases that adhere to
the Principle of Full Normalization, and the Principle of Orthogonal
Design, benefiting from the practical advantages deriving from both. In
fact, a normalization process is required if and only if database design s
flawed—it is, in other words, a design repair endeavor.
The
focus of this paper is, by contrast, on understanding normalization, its
practical benefits, and the heavy costs of violating it. It
Ø
explains
normal forms
Ø
documents
all costs of less than full normalization
Ø
exposes
“denormalization for performance” as an illusion, and explains why it is
sometimes imposed by DBMS products
Ø
debunks
some prevalent misconceptions, to demonstrate how proper knowledge of data
fundamentals is necessary to see through industry poor practices and erroneous
pronouncements, and to avoid costly mistakes.
TABLE OF CONTENTS
·
INTRODUCTION
·
UNDERSTANDING NORMALIZATION
·
THE FULL PRICE OF LESS THAN FULL NORMALIZATION
·
FURTHER NORMALIZATION
·
“THE WHOLE
KEY”: SECOND NORMAL FORM
·
“NOTHING BUT THE KEY”: THIRD NORMAL FORM
·
GENERALIZED DEPENDENCIES
·
MULTIVALUED DEPENDENCIES: FOURTH NORMAL FORM
·
JOIN DEPENDENCIES: FIFTH NORMAL FORM
·
TWO DATABASE DESIGN PRINCIPLES
·
“DENORMALIZATION FOR PERFORMANCE”
·
THE LOGICAL-PHYSICAL CONFUSION
·
DATABASE BIAS
·
REDUNDANCY CONTROL
·
THE REAL PROBLEM AND SOLUTION
·
SOME FALLACIES DEBUNKED
·
DENORMALIZATION “GUIDELINES”
·
WHERE’S THE (CONCEPTUAL) BEEF?
·
ABNORMAL VIEW OF NORMAL FORMS
·
“USER-CENTERED” DESIGN
·
“OVERNORMALIZATION”
·
SCHEMA HARDLY A STAR
·
CONCLUSION AND RECOMMENDATIONS
·
APPENDIX A: BOYCE-CODD NORMAL FORM
·
REFERENCES
Use of Materials Policy
PRICING AND ORDERING