This is an expanded and revised version of this paper and
supersedes all previous versions. It relies on
terminology and concepts developed in paper #2 Business Modeling for
Database Design: A Foundation Framework for Data Management which is
strongly recommended as a preamble.
ABSTRACT
There is a plethora of material on normalization, mostly against it and outright wrong, but despite its having been thoroughly debunked, confusion and poor understanding still reign in the IT industry and, sadly, in academia too. Most of the material accessible to the practitioner is in what we call “cookbook” form—“recipes” to be followed mechanically without understanding why. This contributes to the poor understanding of normalization, its practical purposes and particularly the severe costs of “denormalization” that are almost completely ignored.
Instead of focusing on the real solution to performance maximization—better implementations of truly relational DBMSs (TRDBMS) and physical design—the industry distracts practitioners who, like vendors and the trade media, are mired in the logical-physical confusion, by encouraging or compelling them to “denormalize for performance”, to the consequences of which all are oblivious.
Correct business modeling and logical design yield databases that adhere to the Principle of Full Normalization (POFN) and the Principle of Orthogonal Design (POOD) and benefit from their practical advantages. A normalization procedure is required if and only if database design is flawed—it is, in other words, a design repair endeavor.
The
focus of this paper is on understanding normalization and its
practical benefits, and denormalization and its costs. It
Ø
Explains
normal forms
Ø
Documents
all costs of undernormalization
Ø
Exposes
“denormalization for performance” as an illusion and explains why it is
sometimes induced by DBMS products
Ø
Debunks
some prevalent misconceptions
TABLE OF CONTENTS
·
Introduction
·
Understanding Normalization
·
The Full Price of Undernormalization
·
Further Normalization
·
"The Whole Key" & Second Normal Form
·
"Nothing But the Key" & Third Normal Form
·
General Dependencies
·
Multivalued Dependencies & Fourth Normal Form
·
Join Dependencies and Fifth Normal Form
·
Two Database Design Principles
·
"Denormalization for Performance"
·
The Logical-Physical Confusion
·
Database Bias
·
Redundancy Control
·
The Real Problem and Solution
·
Some Fallacies Debunked
·
Conclusions and Recommendations
·
Appendix A: Boyce-Codd Normal Form
·
References
USE OF MATERIAL POLICY
PRICING AND ORDERING
Counter by Digits