THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE
Fabian Pascal Paper #3 v.2 (January 2011)


This is an expanded and revised version of this paper and supersedes all previous versions. It relies on terminology and concepts developed in paper #2 Business Modeling for Database Design: A Foundation Framework for Data Management which is strongly recommended as a preamble.


ABSTRACT

There is a plethora of material on normalization, mostly against it and outright wrong, but despite its having been thoroughly debunked, confusion and poor understanding still reign in the IT industry and, sadly, in academia too. Most of the material accessible to the practitioner is in what we call “cookbook” form—“recipes” to be followed mechanically without understanding why. This contributes to the poor understanding of normalization, its practical purposes and particularly the severe costs of “denormalization” that are almost completely ignored.

Instead of focusing on the real solution to performance maximization—better implementations of truly relational DBMSs (TRDBMS) and physical design—the industry distracts practitioners who, like vendors and the trade media, are mired in the logical-physical confusion, by encouraging or compelling them to “denormalize for performance”, to the consequences of which all are oblivious.

Correct business modeling and logical design yield databases that adhere to the Principle of Full Normalization (POFN) and the Principle of Orthogonal Design (POOD) and benefit from their practical advantages. A normalization procedure is required if and only if database design is flawed—it is, in other words, a design repair endeavor.

The focus of this paper is on understanding normalization and its practical benefits, and denormalization and its costs. It

Ø Explains normal forms

Ø Documents all costs of undernormalization

Ø Exposes “denormalization for performance” as an illusion and explains why it is sometimes induced by DBMS products

Ø Debunks some prevalent misconceptions

TABLE OF CONTENTS

· Introduction

· Understanding Normalization

· The Full Price of Undernormalization

· Further Normalization

· "The Whole Key" & Second Normal Form

· "Nothing But the Key" & Third Normal Form

· General Dependencies

· Multivalued Dependencies & Fourth Normal Form

· Join Dependencies and Fifth Normal Form

· Two Database Design Principles

· "Denormalization for Performance"

· The Logical-Physical Confusion

· Database Bias

· Redundancy Control

· The Real Problem and Solution

· Some Fallacies Debunked

· Conclusions and Recommendations

· Appendix A: Boyce-Codd Normal Form

· References

USE OF MATERIAL POLICY

PRICING AND ORDERING

Hit Counter by Digits
Counter by Digits