PRACTICAL DATABASE FOUNDATIONS PAPERS
by Fabian Pascal
The PRACTICAL DATABASE FOUNDATIONS series are a set of originally written papers, the objective of which is to make the scientific foundation of database management and its practical implications accessible to the data practitioner and user.
They cover subjects that are not addressed, or are distorted by the IT industry, are not sufficiently or properly clarified and/or are misunderstood.
Their objective is to stay as close to and be as accurate about the formal foundations of the field as possible, while at the same time use simple, readily understood language. The examples from real practice provided at the end of each paper can be used by the reader to test himself on his comprehension of the subject matter.
NEW! THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE v.5 (February, 2013; revised April 2013) NEW!
A core database design principle is the Principle of Full Normalization (PFN). Database designs that do not adhere to the PFN present certain drawbacks for relational manipulation, integrity enforcement and usability and, consequently, for application development and data accessibility. Despite the plethora of information on the subject (not all of it correct, or well explicated), the subject is still poorly understood.
The first paper in this series, Business Modeling for Database Design, outlines a methodology that implicitly produces fully normalized databases. But because business modeling is informal, ad-hoc and based on subjective perceptions of reality, violations of the PFN, either due to inadvertent errors, or intentional “denormalization for performance”, due to the mistaken notion that logical database design, rather than physical implementation factors, is responsible for poor performance.
PFN violations impose considerable costs that are mostly ignored. Data redundancy is only one of them, albeit a major one. It increases the integrity burden for both developers and the DBMS. What is more, if they undertake the burden to control redundancy, not only will performance gains from denormalization fail materialize, but may actually degrade performance.
This paper presents normalization as a method of database design repair. It is required only if and when databases are poorly designed as a way to eliminate the drawbacks caused by PFN violations.
It explains in easy to understand language:
- The kinds of PFN violation possible;
- The undesirable consequences of PFN violations;
- How to repair designs and eliminate the drawbacks;
Table of Contents
- R-tables, Keys and Dependencies
- Normalization and Normal Forms
- Further Normalization As Design Repair
- Join Dependencies
- “The Whole Key” and 2NF
- “Nothing But the Key” and 3NF
- “The Whole Key” and BCNF
- Multivalued Dependencies and 4NF
- Interval Data and 6NF
- “Denormalization For Performance”
- The Logical-Physical Confusion
- Redundancy Control
- A Note on JDC’s and SQL
- The Real Problem and Solution
- Some Fallacies Debunked
- Normal Forms
- Conclusions and Recommendations
NEW! BUSINESS MODELING FOR DATABASE DESIGN v.5 (Aug. 2012) NEW!
The purpose of this paper is to introduce practitioners to data fundamentals via the process of business modeling for database design. It focuses particularly on and distinguishes between the three types of model that are often confused: business model, logical model and data model.
The objective is understanding, as distinct from the "cookbook approach"
· Business Modeling
· Basic Modeling Concepts
· Business Rules
· Property Business Rules
· Class Business Rules
· Attribute Rules
· Distinctness Rules
· Functional Dependency Rules
· Referential Rules
· Arbitrary Rules
· Business Models
· Database Design
· Predicates and Propositions
· The Relational Data Model
· Logical Models
· Understanding Database Management
· Truth vs. Correctness
· Database and DBMS Defined
· A Foundation Framework
· Applying the Framework: Exercises
TRULY RELATIONAL-WHAT IT REALLY MEANS v.2 (Jan. 2011)
In the first paragraph of his first ever published exposition of the relational idea 42 years ago, E. F. Codd claimed three critical advantages of a relational model of data:
- A sound theoretical basis—logic and mathematics—for database management
- Physical data independence
- DBMS-guaranteed integrity of data and query results
Yet from 1969 to date the industry has failed to concretize Codd’s ideas by implementing SQL-based products that have limited relational fidelity and violate many relational principles. Moreover, instead of correcting the mistakes, the industry has been regressing to the same costly and unproductive technologies made obsolete by Codd’s innovation more than
thirty years ago.
Much of this is due to an educational failure: vendors, experts, users and the trade press do not know, understand and appreciate the practical value of Codd’s contribution and the huge cost of ignoring it. Young generations of practitioners are not even introduced to the relational model and are instilled either with the notion that SQL products are relational, or that relational technology is obsolete. Driven by industry rather than principles, academia has renounced its true function of education and is serving as product trainer for vendors.
It is therefore imperative—and proper to start this series, intended to make data fundamentals accessible to practitioners—to revisit Codd’s original work, reassert those aspects that have been ignored, recall those that were missed, clarify those that are opaque, correct misinterpretations as well as original mistakes and settle current disagreements and confusion over what the relational model really is.
This paper covers Codd’s seminal first two papers, Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks (1969), and A Relational Model of Data for Large Shared Data Banks (1970), the latter being an important public revision of the former (an internal IBM document), that contained changes and introduced new material.
· Relations on Domains
· Relation Representation
· Time-varying Relations
· Relation Interpretation
· Data Sublanguage
· Atomicity, Nested Relations and Normalization
· Foreign Keys and (First) Normal Form
· Operations on Relations
· Kinds of Relations
· Derivability, Redundancy, Consistency
· Debunking Misconceptions
THE FINAL NULL IN THE COFFIN v.2 (Jan.2011)
As attested to by the volume of writings and the heat of the debate on the subject, the treatment of missing data has possibly been one of the thorniest aspects of database management. Users are left between a rock and a hard place: they can either rely on SQL' s problematic version of three-valued logic based on NULLs and risk hard to interpret database answers and/or hard to detect errors in integrity enforcement and query results, or undertake the prohibitive burden in applications of what is a complex database function that belongs in the DBMS.
This paper summarizes the drawbacks of the many-valued logic approach to missing data, and SQL’s poorly implemented flavor of three-valued logic via NULLs.
It proposes a possible solution within the two-valued logic/relational framework.
- Separates unknown and therefore missing data from “inapplicable” and therefore non-missing data), and provides proper design guidelines to avoid the latter
- Treats missing data correctly as metadata
- Gurantees logically correct answers with respect to the real world, without the complications and problematics of many-valued logic and SQL’s NULLs.
· The Logic of the Real World
· "Inapplicable Values": A Red Herring
· Into the Unknown: Three-Valued Logic
· Not of This World: SQL's NULL
· Assert Only What You Know
· Known Unknowns: Data About Data
· SQL vs. 2VL: A Real World Comparison
· "Too Many" R-Tables?
· Some Misconceptions Debunked
· Concluding Remarks
USE OF MATERIAL POLICY
Publisher hereby grants to purchaser non-exclusive, non-transferable, revocable rights to use the materials for non-commercial purposes as follows:
Authorized Users: Individuals who purchase the materials for their own exclusive use;
Authorized Uses: Authorized users may use the materials as is consistent with the Fair Use Provisions of United States and international law, for non-commercial purposes as follows:
- Individual buyers have exclusive access to the materials;
- Academic buyers may use the material, with appropriate credit, in the preparation of course packs or other non-commercial, educational materials;
- Authorized may use, with appropriate credit, figures, tables and brief excerpts in the user's own scientific, scholarly and educational works;
- Institutional volume buyers should check with Publisher on authorized use;
Unauthorized Use: Authorized users shall not
- Knowingly permit others to use the materials;
- Modify or create a derivative work without prior written permission of Publisher;
- Remove, obscure or modify any copyright or other notice;
- Use the materials for commercial purposes, including but not limited to sale, reproduction, or distribution in any form;
Single paper rates
Order all 4 papers and get a two-paper combo free:
WHAT FIRST NORMAL FORM REALLY MEANS by C. J. Date
WHAT FIRST NORMAL FORM MEANS NOT by F. Pascal
UPDATES AND NEW VERSIONS
Updates are free to buyers of the original version, and can be downloaded using the same password. New versions require new purchase and password.
HOW TO ORDER
We prefer payments in US$ via PayPal. For check/money order payments email us at the email address on the About page.
- Select the paper(s) to order;
- Calculate the total order price;
- Specify the papers you order in the text field below;
- Click on the Buy Now button;
- At the payment form, either login to your Paypal account, or pay by credit card;
- Upon notification of payment by PayPal, we will email download instructions.
Address any questions or payment/delivery issues to the above address.
We appreciate your support, which keeps this site free.
Founder, Editor and Publisher