PRACTICAL DATABASE FOUNDATIONS SERIES
This series are a set of papers and seminars, the objective of which is to make the scientific foundation of database management and its practical implications accessible to data professionals and users.
They cover aspects that are not addressed, or are distorted in the IT industry, are not sufficiently or properly clarified and/or are misunderstood. They stay as close to and and as accurate about the formal foundations of the database field as possible, while using simple, readily understood language.
REVISED! Paper #1, V. 3 (September 2014) REVISED!
BUSINESS MODELING FOR DATABASE DESIGN
- Business model
- Logical model
- Physical model
- Data model
- explains basic modeling and design concepts
- how to formalize a informal business models via the relational data model into formal logical database models which can be computerized and managed by a DBMS.
Table of Contents
1. Business Modeling
1.1. Basic Modeling Concepts
1.2. Business Rules
1.2.1. Property Rules
1.2.2. Class Rules
1.2.3. Attribute Rules
1.2.4. Identity Rules
1.2.5. Functional Dependency Rules
1.2.6. Referential Rules
1.2.7. Associative Entities
1.2.8. Arbitrary Rules
1.3. Business Models
2. Database Design
2.1. Formalizing the Informal
2.2. Predicates and Propositions
2.3. The Relational Data Model
2.3.1. Relational Structure
2.3.2. Relational Manipulation
2.3.3. Relational Integrity
126.96.36.199. External Predicates
188.8.131.52. Internal Predicates
2.4. Logical Models
3. Understanding Database Management
3.1. Note on missing values
3.2. A Foundation Framework
Appendix A: Constraint Formulation and Verification
Appendix B: Integrity Constraints in Dataphor’s D4
Appendix C: Applying the Framework
REVISED! Paper #2, v. 6 (September 2014) REVISED!
THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE
Note: This paper assumes familiarity with the concepts and terminology introduced in papers #1, Business Modeling For Database Design.
A core database design principle is the Principle of Full Normalization (PFN). Database designs that do not adhere to it present certain practical drawbacks for data manipulation, integrity enforcement and, consequently, for application development and usability. Despite the plethora of information on the subject (not all of it correct, or well explicated), the subject is still poorly understood.
Paper #1 in this series, Business Modeling for Database Design, outlines a methodology that implicitly produces fully normalized databases. But due either to inadvertent errors, or to intentional “denormalization for performance”, PFN violations occur frequently. They impose considerable costs to which many data professionals are oblivious. Data redundancy and the risk of inconsistent databases is only one of them, albeit a major one.
Explicit further normalization should be necessary only for database design repair, when databases were poorly designed, to eliminate the drawbacks.
This paper explains in easy to understand language:
- The kinds of PFN violation possible;
- The costly implications of PFN violations;
- How to repair poor designs and eliminate the drawbacks;
- Why denormalization for performance is an illusion.
Table of Contents
1. R-tables, Keys and Dependencies
2. Normalization and Normal Forms
3. Further Normalization As Design Repair
3.1. Join Dependencies
3.2. “The Whole Key” and 2NF
3.3. “Nothing But the Key” and 3NF
3.4. “The Whole Key” and BCNF
3.5. Multivalued Dependencies and 4NF
3.6. Interval Data and 6NF
4. “Denormalization For Performance”
4.1. The Logical-Physical Confusion
4.2. Redundancy Control
4.3. JDC’s and SQL
4.4. The Real Problem and Solution
5. Conclusion and Recommendations
Paper #3 V.3 (August 2013)
THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATANote: This paper assumes familiarity with the concepts and terminology introduced in papers #1, Business Modeling For Database Design.
The relational data model is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect—some data is missing—which means that we don't always know whether certain propositions are true or not. Then 2VL no longer applies and database query results are no longer guaranteed to be provably logically correct with respect to the real world.
Missing data has possibly been the thorniest aspect of database management: without a logically sound yet practical solution, data professionals and users are left between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementations of three-valued logic (3VL) based on NULLs and risk results that are easy to misinterpret, or erroneous in ways hard to discern, or (b) undertake in applications a prohibitively complex, error prone and unreliable burden that belongs in the DBMS.
This paper illustrates some of the drawbacks of the many-valued logic (nVL, n > 2) approach to missing data and SQL’s NULL scheme and proposes a solution within the 2VL/relational framework that:
- Guarantees data integrity and logically correct query results;
- Avoids the complications and problematics of nVL/NULL's;
- Requires no changes to the relational model;
- Is largely transparent to users;
- Keeps users better apprised of the existence and effects of missing data.
Table of Contents
1. “Inapplicable Data”: Nothing's Missing
2. Missing Data: Into the Unknown
3. SQL’s 3VL: NULL
4. Known Unknowns: Metadata
5. A 2VL Relational Solution
5.1. The Practicality of Theory
5.2. 2VL vs. NULL: A Real World Example
5.3. Relation Proliferation
5.4. The TransRelational™ Implementation Model
Appendix A: Misconceptions Debunked
Paper #4 V.1 (January 2014)
THE KEY TO KEYS: A MATTER OF IDENTITY
- Defines and explains the key concept;
- Explains the function and properties of the various types of key;
- Describes the criteria for key selection;
- Specifies what is proper DBMS key support;
- Assesses SQL's key support;
- Debunks some common misconceptions about keys.
Table of Contents
2. Keys and Key Constraints
3. Kinds of Keys
3.1. Natural Keys
3.1.1. Candidate and Primary Keys
3.1.2. Simple and Composite Keys
3.2. Surrogate Keys
4. Key Functions
4.1. Duplicate prevention
4.2. Integrity Burden Reduction
4.3. View Updatability
5. Foreign Keys and Referential Constraints
6. DBMS Key Support
7. Keys in SQL
7.1. SQL and Duplicates
Appendix A: Duplicate removal in SQL
Appendix B: Duplicates and Language Redundancy
REVISED! Paper #5 V.2 (September 2014) REVISED!
TRULY RELATIONAL - WHAT IT REALLY MEANS:
CODD'S ORIGINAL 1969-70 PAPERS
In the first paragraph of his first ever published exposition of the relational idea 42 years ago, E. F. Codd claimed three critical advantages of a relational model of data:
- A sound theoretical basis—logic and mathematics—for database management
- Physical data independence
- DBMS-guaranteed integrity of data and query results
Yet from 1969 to date the industry has failed to concretize Codd’s ideas by implementing SQL-based products that have limited relational fidelity and violate many relational principles. Moreover, instead of correcting the mistakes, the industry has been regressing to the same costly and unproductive technologies made obsolete by Codd’s innovation more than thirty years ago.
Much of this is due to an educational failure: vendors, experts, users and the trade press do not know, understand and appreciate the practical value of Codd’s contribution and the huge cost of ignoring it. Young generations of practitioners are not even introduced to the relational model and are instilled either with the notion that SQL products are relational, or that relational technology is obsolete. Driven by industry rather than principles, academia has renounced its true function of education and is serving as product trainer for vendors.
It is therefore imperative—and proper to start this series, intended to make data fundamentals accessible to practitioners—to revisit Codd’s original work, reassert those aspects that have been ignored, recall those that were missed, clarify those that are opaque, correct misinterpretations as well as original mistakes and settle current disagreements and confusion over what the relational model really is.
This paper covers Codd’s seminal first two papers, Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks (1969), and A Relational Model of Data for Large Shared Data Banks (1970), the latter being an important public revision of the former (an internal IBM document), that contained changes and introduced new material.
Table of Contents
1. Relations on Domains
2. Relation Representation
3. Time-varying Relations
4. Relation Interpretation
5. Data Sub-language
6. Atomicity, Nested Relations and Normalization
7. Foreign Keys and Normal Form
8. Operations on Relations
9. Kinds of Relations
10. Derivability, Redundancy, Consistency
Appendix A: Debunking Misconceptions
USE OF MATERIAL POLICY
Publisher hereby grants to purchaser non-exclusive, non-transferable, revocable rights to use the materials for non-commercial purposes as follows:
Authorized Users: Individuals who purchase the materials for their own exclusive use;
Authorized Uses: Authorized users may use the materials as is consistent with the Fair Use Provisions of United States and international law, for non-commercial purposes as follows:
- Individual buyers have exclusive access to the materials;
- Academic buyers may use the material, with appropriate credit, in the preparation of course packs or other non-commercial, educational materials;
- Authorized may use, with appropriate credit, figures, tables and brief excerpts in the user's own scientific, scholarly and educational works;
- Institutional volume buyers should check with Publisher on authorized use;
Unauthorized Use: Authorized users shall not
- Knowingly permit others to use the materials;
- Modify or create a derivative work without prior written permission of Publisher;
- Remove, obscure or modify any copyright or other notice;
- Use the materials for commercial purposes, including but not limited to sale, reproduction, or distribution in any form;
Single paper rates
Order all 4 papers and get a two-paper combo free:
WHAT FIRST NORMAL FORM REALLY MEANS by C. J. Date
WHAT FIRST NORMAL FORM MEANS NOT by F. Pascal
UPDATES AND NEW VERSIONS
Updates are free to buyers of the original version. New versions require new purchase.
HOW TO ORDER
We prefer payments in US$ via PayPal. For check/money order payments email us at the email address on the About page.
- Select the paper(s) to order;
- Calculate the total order price;
- Specify the papers you order in the text field below;
- Click on the Buy Now button;
- At the payment form, either login to your Paypal account, or pay by credit card;
- Upon notification of payment by PayPal, we will email download instructions.
Address any questions or payment/delivery issues to the above address.
We appreciate your support, which keeps this site free.
Founder, Editor and Publisher