Series 2: UNDERSTANDING THE REAL RDM
The objectives of these papers are:
- To offer the data practitioner an accessible informal interpretation of David McGoveran's formal exposition of the real RDM envisioned by E. F. Codd.
- To contrast it with the current common interpretation that emerged after EFC's passing and to demonstrate the practical implications of the differences.
#1. THE INTERPRETATION AND REPRESENTATION OF DATABASE RELATIONS V.1 (April 2017)
The first paper in the new series interprets, clarifies and discusses the structural component of EFC's RDM as introduced in his initial 1969-70 papers and covers:
- The interpretation of database relations
- The representation of database relations
- Database normalization
Table of Contents
1. The Interpretation of Database Relations
1.1. Attributes as Constrained Domains
1.2. Time-Varying Relations
2. Representation of Database Relations
2.1. Physical Independence
2.1.1. Uniquely Named Attributes
2.1.2. Primary Keys
2.1.3. Relations and R-tables
3. Database Normalization
3.1. Simple Domains and Normal Form
3.2. Non-simple Domains and Normalization
3.2.1. Foreign Keys
Series 1: PRACTICAL DATABASE FOUNDATIONS
(Papers #1 and #2 are prerequisites for all others.)
The objectives of these papers are:
- To make true data science--as distinct from what is hyped as such--accessible, without losing theoretical rigor;
- To dispel common, entrenched fallacies about data and relational fundamentals;
- To clarify important aspects of data management that are systematically ignored, misunderstood, misused and abused;
- To expose the practical implications of theory.
These papers are written for the data professional and user who thinks critically and independently, rather than operating in the IT industry's "cookbook mode".
#1. BUSINESS MODELING FOR DATABASE DESIGN: FORMALIZING THE INFORMAL V.4 (May 2015)
Few data management aspects are as misunderstood and abused as business modeling and database design. There is broad confusion about various types of model and levels of information representation; and poor knowledge about the relational data model vs. the alternatives, the practical implications thereof and data fundamentals in general.
This is an introduction to the foundation knowledge critical for business modeling for database design and the formalization of informal business models into logical database models that can be computerized and managed by a DBMS.
It explains in accessible language:
- Conceptual, logical, physical and data models;
- Levels of representation;
- Data independence;
- The relational data model.
Table of Contents
1. Business Modeling
1.1. Basic Modeling Concepts
1.2. Business Rules
1.2.1. Property Rules
1.2.2. Class Rules
1.2.3. Associative Entities
1.3. Business Models
2. Database Design
2.1. Formalizing the Informal
2.2. Predicates and Propositions
2.3. The Relational Data Model
2.3.1. Relational Structure
2.3.2. Relational Integrity
2.3.3. Relational Manipulation
2.4. Logical Models
3. Understanding Database Management
3.1. Note on missing values
3.2. A Foundation Framework
Appendix A: Constraint Formulation and Verification
Appendix B: Integrity Constraints in Dataphor’s D4
Appendix C: Some Misconceptions Debunked
#2. THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE V.4 (May 2015)
A core database design principle is the Principle of Full Normalization (PFN). Database designs that do not adhere to it have practical drawbacks for data manipulation and integrity enforcement and, consequently, for the correct interpretation of query results. Despite the plethora of information on normalization (not all of it correct or well explicated), it is still poorly understood.
Paper #1 in this series, Business Modeling for Database Design, outlines a methodology that implicitly produces fully normalized databases. But due either to inadvertent errors, or to intentional “denormalization for performance”, PFN violations occur frequently. They impose considerable and insidious costs to which many data professionals are oblivious. Data redundancy and the risk of inconsistent databases is only one of them, albeit a major one.
Explicit further normalization should be necessary only for database design repair, when databases were poorly designed, to eliminate the drawbacks.
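The redundancy cost mentioned above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module, and the shipments and suppliers tables are hypothetical names invented for this illustration, not taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical denormalized table: the supplier's city is repeated in every
# shipment row, so one fact is recorded in several places.
conn.execute("CREATE TABLE shipments (supplier TEXT, city TEXT, part TEXT)")
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [("S1", "London", "P1"), ("S1", "London", "P2"), ("S2", "Paris", "P1")],
)

# An update that reaches only one of the redundant rows leaves the database
# asserting two different cities for supplier S1.
conn.execute(
    "UPDATE shipments SET city = 'Oslo' WHERE supplier = 'S1' AND part = 'P1'"
)
cities_for_s1 = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT city FROM shipments WHERE supplier = 'S1'"
    )
)
print(cities_for_s1)  # ['London', 'Oslo']

# Repaired design: the city is recorded once, in a table keyed by supplier,
# so a single update cannot produce a contradiction.
conn.execute("CREATE TABLE suppliers (supplier TEXT PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)", [("S1", "London"), ("S2", "Paris")]
)
conn.execute("UPDATE suppliers SET city = 'Oslo' WHERE supplier = 'S1'")
repaired_cities = [
    row[0]
    for row in conn.execute("SELECT city FROM suppliers WHERE supplier = 'S1'")
]
print(repaired_cities)  # ['Oslo']
```

In the first design, nothing in the DBMS prevents the inconsistency unless an additional constraint is declared and enforced; in the repaired design the inconsistency cannot be expressed at all.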
This paper explains in easy-to-understand language:
- The kinds of PFN violation possible;
- The undesirable properties of PFN violations and their costs;
- How to repair the design and eliminate the drawbacks;
- Why denormalization for performance is a dangerous illusion.
Table of Contents
1. R-tables, Keys and Dependencies
2. Normalization and Normal Forms
3. Further Normalization As Design Repair
3.1. Join Dependencies
3.2. “The Whole Key” and 2NF
3.3. “Nothing But the Key” and 3NF
3.4. “The Whole Key” and BCNF
3.5. Multivalued Dependencies and 4NF
3.6. Interval Data and 6NF
4. “Denormalization For Performance”
4.1. The Logical-Physical Confusion
4.2. Redundancy Control
4.3. JDC’s and SQL
4.4. The Real Problem and Solution
5. Conclusion and Recommendations
#3. THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA V.4 (May 2015)
The relational data model (RDM) is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect—some data are missing—which means that we don't always know whether certain propositions are true or not. This violates 2VL and database query results are no longer guaranteed to be provably logically correct with respect to the real world.
Missing data has possibly been the thorniest aspect of database management. Without a logically sound yet practical solution, data professionals and users are between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementations of three-valued logic (3VL) based on NULL’s and risk results that are erroneous in ways hard to discern or easy to misinterpret, or (b) undertake in applications a prohibitively complex, error-prone and unreliable burden that belongs in the DBMS.
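A small sketch of the kind of result that is easy to misinterpret, using Python's built-in sqlite3 module. The employees table and its rows are hypothetical, and SQLite stands in here for SQL DBMS's generally; other products may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 50000), ("Bob", 70000), ("Cy", None)],  # Cy's salary is missing
)


def count_where(condition: str) -> int:
    """Count the rows for which the condition evaluates to true."""
    query = f"SELECT COUNT(*) FROM employees WHERE {condition}"
    return conn.execute(query).fetchone()[0]


total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
high = count_where("salary > 60000")
not_high = count_where("NOT (salary > 60000)")

# Under 2VL a condition and its negation partition the table. With a NULL
# the comparison is neither true nor false, so Cy is in neither result.
print(total, high, not_high)  # 3 1 1
```

A user who adds the two counts and expects the total will be off by one, with nothing in either result to signal why.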
This paper illustrates some of the drawbacks of the many-valued logic (nVL, n>2) approach to missing data and SQL’s NULL scheme and proposes a solution within the 2VL/relational framework that:
- Guarantees data integrity and logically correct query results;
- Avoids the complications and problems of nVL/NULL's;
- Requires no changes to the relational model;
- Is largely transparent to users;
- Keeps users better apprised of the existence and effects of missing data.
The proposed solution requires research into its implications for data manipulation and integrity enforcement, but we believe it is theoretically sound and implementable in a TRDBMS using technologies that, unlike SQL, support full physical data independence, e.g., the TransRelational™ implementation model (TRIM).
Table of Contents
1. “Inapplicable Data”: Nothing's Missing
2. Missing Data: Into the Unknown
3. SQL’s NULL: What-Valued Logic?
4. Known Unknowns: Metadata
5. A 2VL Relational Solution
5.1. The Practicality of Theory
5.2. 2VL vs. NULL’s in the Real World
5.3. Relation Proliferation
Appendix A: What’s Wrong with this Picture?
1. "Not Complicated"
2. "Part of the Real World"
3. "Integral Part of Relational Databases"
4. "Throw a Damn Exception"
Appendix B: Comments on the Proposed Solution
#4. THE KEY TO KEYS: A MATTER OF IDENTITY V.2 (May 2015)
Note: This paper assumes familiarity with the concepts and terminology introduced in papers #1, Business Modeling For Database Design and #2, The Costly Illusion: Normalization, Integrity and Performance, in this series, which are both recommended as preamble.
If entities in the real world did not have identifiers (attributes that capture their identity and uniquely identify them), we would not be able to tell them apart. It follows that an accurate database representation of a business reality must include keys, which formally represent the informal real-world identifiers in the database.
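As a minimal sketch of what a declared key buys, the following uses Python's built-in sqlite3 module with two hypothetical tables, one with a key and one without. It illustrates duplicate prevention only and is not drawn from the paper itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: identical except that only one declares a key.
conn.execute("CREATE TABLE keyless (emp_no INTEGER, name TEXT)")
conn.execute("CREATE TABLE keyed (emp_no INTEGER PRIMARY KEY, name TEXT)")

row = (1, "Ann")

# Without a key the DBMS accepts the same row twice, and the two copies
# cannot be told apart afterwards.
conn.execute("INSERT INTO keyless VALUES (?, ?)", row)
conn.execute("INSERT INTO keyless VALUES (?, ?)", row)
keyless_count = conn.execute("SELECT COUNT(*) FROM keyless").fetchone()[0]

# With a key the second insert violates the key constraint and is rejected.
conn.execute("INSERT INTO keyed VALUES (?, ?)", row)
try:
    conn.execute("INSERT INTO keyed VALUES (?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
keyed_count = conn.execute("SELECT COUNT(*) FROM keyed").fetchone()[0]

print(keyless_count, duplicate_rejected, keyed_count)  # 2 True 1
```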
Keys, their types, necessity, selection, functions and properties are often poorly understood.
This paper:
- Defines and explains the key concept;
- Explains the function and properties of the various types of key;
- Describes the criteria for key selection;
- Specifies what is proper DBMS key support;
- Assesses SQL's key support;
- Debunks some common misconceptions about keys.
Table of Contents
1. R-tables and Integrity Constraints
2. Keys and Key Constraints
3. Kinds of Keys
3.1. Candidate and Primary Keys
3.2. Natural Keys
3.3. Simple and Composite Keys
3.4. Foreign and Surrogate Keys
4. Key Functions
4.1. Duplicate Prevention
4.2. Guaranteed Logical Access
4.3. Low Integrity Burden
4.4. View Updatability and Logical Data Independence
5. DBMS Key Support
6. Keys in SQL
6.1. SQL and Duplicates
Appendix A: Duplicate Removal in SQL
Appendix B: Duplicates and Language Redundancy
#5. TRULY RELATIONAL: WHAT IT REALLY MEANS V.3 (May 2015)
In the first paragraph of his first published exposition of the relational data model for database management in 1969, E.F. Codd claimed three core advantages:
- A dual sound theoretical foundation: predicate logic and set mathematics;
- Physical data independence;
- DBMS-guaranteed data integrity and provably logically correct query results with respect to the real world.
Yet from 1969 to date the industry has failed to implement Codd’s ideas truly and fully. The closest it has come to the relational model is with SQL-based DBMS's, which have only limited relational fidelity and violate the model in multiple ways. Moreover, instead of correcting mistakes, vendors (including IBM, where the relational model was invented, and Oracle, the first implementer of a SQL DBMS) have regressed to the very costly and unproductive approaches that Codd’s innovation made obsolete forty-five years ago.
This paper revisits Codd’s ideas in his seminal first two papers, one being an important public revision of the other (an internal IBM document), containing changes and new material. It:
- Reasserts those aspects that have been ignored;
- Recalls those that were missed;
- Clarifies those that are opaque;
- Corrects misinterpretations as well as original mistakes;
- Settles some current disagreements on and confusion over what the relational model really is.
Table of Contents
1. Relations on Domains
2. Relation Representation
3. Time-Varying Relations
4. Relation Interpretation
5. Data Sub-language
6. Atomicity, Nested Relations, and Normalization
7. Foreign Keys and (First) Normal Form
8. Operations on Relations
9. Kinds of Relations
10. Derivability, Redundancy, Consistency
Appendix A: Codd’s 1969 Relational Operators
Appendix B: Debunking Misconceptions
#6. DOMAINS: THE DATABASE GLUE V.2 (January 2015)
Domains are a fundamental database feature. Mathematical relations are defined on domains—they are subsets of Cartesian products of domains. In other words, no domains, no relations. Codd, the inventor of the relational data model (RDM), referred to them as the "glue that holds the database together"—only the values of columns defined on the same domains are meaningfully comparable e.g. for joins, because they represent the same attributes—they mean the same thing.
Yet they are one of the least understood database features. This is both a cause and a consequence of the lack of domain support in SQL (both the standard and commercial implementations). The consequences, such as the lack of, or poor support for, user-defined domains of arbitrary complexity, are erroneously blamed on the RDM when, in fact, domains are orthogonal to the model, which places no restrictions on them whatsoever.
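The idea that only values drawn from the same domain are meaningfully comparable can be sketched in a few lines of Python. The DomainValue class and the PART_NO and QUANTITY domain names are hypothetical, introduced only for illustration; this is not how any particular DBMS implements domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainValue:
    """A value tagged with the (hypothetical) domain it is drawn from."""

    domain: str
    value: object

    def same_as(self, other: "DomainValue") -> bool:
        # Values sharing a representation (here, integers) may still mean
        # different things; comparison is allowed only within one domain.
        if self.domain != other.domain:
            raise TypeError(f"cannot compare {self.domain} with {other.domain}")
        return self.value == other.value


part_a = DomainValue("PART_NO", 7)
part_b = DomainValue("PART_NO", 7)
quantity = DomainValue("QUANTITY", 7)

same_domain_result = part_a.same_as(part_b)
try:
    part_a.same_as(quantity)
    cross_domain_rejected = False
except TypeError:
    cross_domain_rejected = True

print(same_domain_result, cross_domain_rejected)  # True True
```

A system that knows only the data type (integer) would happily compare, or join on, a part number and a quantity; a system that knows the domain can refuse.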
This paper covers:
- The domain concept;
- Distinctions from data type;
- Kinds of domains;
- SQL domain support;
- Implications for database practice and DBMS design.
Table of Contents
1. Domains and Types
1.1. Meaning and Representation
2. Kinds of Domains
2.1. “Simple” Domains
2.2. “Complex” Domains
2.3. User-Defined Domains and System-Defined Types
3. Domains and SQL
4. Some Practical Implications
4.1. “Universal” DBMS
4.2. Database Design
4.5. Tackling Complexity
1 paper $ 25.00
2 papers $ 45.00
3 papers $ 65.00
4 papers $ 85.00
5 papers $105.00
6 papers $125.00
Contact us for volume discounts.
Updates and revisions
- Same year revisions and new versions are free.
- Next year revisions are free.
- Next year new versions are half price.
How to order
Check or money order: Preferred. Contact us for mailing address.
- Select the number of papers;
- Click on the Buy Now button;
- Log in to your PayPal account and pay the amount;
- Enter the #s of the papers you ordered.
We appreciate your support, which keeps this site free. Thank you.
Founder, Editor, Publisher and Debunker-in-Chief