THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE
Fabian Pascal PDBF Paper #3v2 (November 2005)

 

 

 


This is an expanded and revised version of v. 1. It relies on terminology and concepts developed in the recently published Conceptual Modeling and Database Design: A Foundation Framework for Data Management (referred to as paper #2 for short), which is strongly recommended as a preamble.


 

ABSTRACT

 

There are tons of material on normalization, mostly against it and outright wrong, but despite its having been thoroughly debunked, confusion and poor understanding still reign in the industry and, sadly, in academia too. Some of the material is either purely formal and inaccessible to the average practitioner, or in what we call “cookbook” form—“recipes” to be followed mechanically, without understanding why. Neither approach relates to the other, and both contribute to the poor understanding of normalization, its practical purpose and advantages, and particularly the severe costs of less than full normalization, which are almost completely ignored.

 

Instead of focusing on the real solution to performance maximization—better implementations of truly relational DBMSs (TRDBMS)—the industry distracts practitioners who are, like vendors and the trade media, mired in the logical-physical confusion, by encouraging or compelling them to “denormalize for performance”, to the costly consequences of which all are oblivious.

 

Furthermore, as we show in Conceptual Modeling and Database Design: A Foundation Framework for Data Management, given proper conceptual modeling, accurate logical design yields databases that adhere to the Principle of Full Normalization, and the Principle of Orthogonal Design, benefiting from the practical advantages deriving from both. In fact, a normalization process is required if and only if database design s flawed—it is, in other words, a design repair endeavor.

 

The focus of this paper is, by contrast, on understanding normalization, its practical benefits, and the heavy costs of violating it. It

 

Ø       explains normal forms

Ø       documents all costs of less than full normalization

Ø       exposes “denormalization for performance” as an illusion, and explains why it is sometimes imposed by DBMS products

Ø       debunks some prevalent misconceptions, to demonstrate how proper knowledge of data fundamentals is necessary to see through industry poor practices and erroneous pronouncements, and to avoid costly mistakes.

 

 

TABLE OF CONTENTS

 

·    INTRODUCTION

·    UNDERSTANDING NORMALIZATION

·    THE FULL PRICE OF LESS THAN FULL NORMALIZATION

·    FURTHER NORMALIZATION

·     “THE WHOLE KEY”: SECOND NORMAL FORM

·    “NOTHING BUT THE KEY”: THIRD NORMAL FORM

·    GENERALIZED DEPENDENCIES

·    MULTIVALUED DEPENDENCIES: FOURTH NORMAL FORM

·    JOIN DEPENDENCIES: FIFTH NORMAL FORM

·    TWO DATABASE DESIGN PRINCIPLES

·     “DENORMALIZATION FOR PERFORMANCE”

·    THE LOGICAL-PHYSICAL CONFUSION

·    DATABASE BIAS

·    REDUNDANCY CONTROL

·    THE REAL PROBLEM AND SOLUTION

·    SOME FALLACIES DEBUNKED

·    DENORMALIZATION “GUIDELINES”

·    WHERE’S THE (CONCEPTUAL) BEEF?

·    ABNORMAL VIEW OF NORMAL FORMS

·    “USER-CENTERED” DESIGN

·    “OVERNORMALIZATION”

·    SCHEMA HARDLY A STAR

·    CONCLUSION AND RECOMMENDATIONS

·    APPENDIX A: BOYCE-CODD NORMAL FORM

·    REFERENCES

 

 

Use of Materials Policy

 

 

PRICING AND ORDERING