PRACTICAL DATABASE FOUNDATIONS SERIES
This series are a set of papers and seminars, the objective of which is to make the scientific foundation of database management and its practical implications accessible to data professionals and users.
They cover aspects that are not addressed, or are distorted in the IT industry, are not sufficiently or properly clarified and/or are misunderstood. They stay as close to and and as accurate about the formal foundations of the database field as possible, while using simple, readily understood language.
Paper #1 is a preamble to all subsequent papers.
NEW VERSION! Paper #1, V. 4 (May 2015) NEW VERSION!
BUSINESS MODELING FOR DATABASE DESIGN
This is an introduction to the foundation knowledge critical for business modeling for database design and the formalization of informal business models into logical database models that can be computerized and managed by a DBMS.
It explains in accessible language:
· Conceptual, logical, physical and data models;
· Levels of representation;
· Data independence;
· The relational data model.
Table of Contents
1. Business Modeling
1.1. Basic Modeling Concepts
1.2. Business Rules
1.2.1. Property Rules
1.2.2. Class Rules
1.2.3. Associative Entities
1.3. Business Models
2. Database Design
2.1. Formalizing the Informal
2.2. Predicates and Propositions
2.3. The Relational Data Model
2.3.1. Relational Structure
2.3.2 Relational Integrity
2.3.3. Relational Manipulation
2.4. Logical Models
3. Understanding Database Management
3.1. Note on missing values
3.2. A Foundation Framework
Appendix A: Constraint Formulation and Verification
Appendix B: Integrity Constraints in Dataphor’s D4
Appendix C: What’s Wrong with This Picture?
NEW VERSION! Paper #2, v. 4 (May 2015) NEW VERSION!
THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE
A core database design principle is the Principle of Full Normalization (PFN). Database designs that do not adhere to it present certain practical drawbacks for data manipulation, integrity enforcement and, consequently, for the correct manipulation of data and interpretation of results. Despite the plethora of information on the subject (not all of it correct, or well explicated), the subject is still poorly understood.
Paper #1 in this series, Business Modeling for Database Design, outlines a methodology that implicitly produces fully normalized databases. But due either to inadvertent errors, or to intentional “denormalization for performance”, PFN violations occur frequently. They impose considerable and insidious costs to which many data professionals are oblivious. Data redundancy and the risk of inconsistent databases is only one of them, albeit a major one.
Explicit further normalization should be necessary only for database design repair, when databases were poorly designed, to eliminate the drawbacks.
This paper explains in easy to understand language:
· The kinds of PFN violation possible;
· The undesirable properties of PFN violations and their costs;
· How to repair the design and eliminate the drawbacks;
· Why denormalization for performance is a dangerous illusion.
Table of Contents
1. R-tables, Keys and Dependencies
2. Normalization and Normal Forms
3. Further Normalization As Design Repair
3.1. Join Dependencies
3.2. “The Whole Key” and 2NF
3.3. “Nothing But the Key” and 3NF
3.4. “The Whole Key” and BCNF
3.5. Multivalued Dependencies and 4NF
3.6. Interval Data and 6NF
4. “Denormalization For Performance”
4.1. The Logical-Physical Confusion
4.2. Redundancy Control
4.3. JDC’s and SQL
4.4. The Real Problem and Solution
5. Conclusion and Recommendations
NEW VERSION! Paper #3 V.4 (May 2015) NEW VERSION!
THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA
The relational data model (RDM) is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect—some data are missing—which means that we don't always know whether certain propositions are true or not. This violates 2VL and database query results are no longer guaranteed to be provably logically correct with respect to the real world.
Missing data has possibly been the thorniest aspect of database management. Without a logically sound yet practical solution, data professionals and users are between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementations of three-valued logic (3VL) based on NULL’s and risk results that are erroneous in ways hard to discern or easy to misinterpret, or (b) undertake in applications a prohibitively complex, error prone and unreliable burden that belongs in the DBMS.
This paper illustrates some of the drawbacks of the many-valued logic (nVL, n > 2) approach to missing data and SQL’s NULL scheme and proposes a solution within the 2VL/relational framework that:
· Guarantees data integrity and logically correct query results;
· Avoids the complications and problematics of nVL/NULL's;
· Requires no changes to the relational model;
· Is largely transparent to users;
· Keeps users better apprised of the existence and effects of missing data.
The proposed solution requires research into its implications for data manipulation and integrity enforcement, but we believe it is theoretically sound and implementable in a TRDBMS using technologies that, unlike SQL, support full physical data independence e.g. the TransRelational™ implementation model (TRM).
Table of Contents
1. “Inapplicable Data”: Nothing's Missing
2. Missing Data: Into the Unknown
3. SQL’s NULL: What-Valued Logic?
4. Known Unknowns: Metadata
5. A 2VL Relational Solution
5.1. The Practicality of Theory
5.2. 2VL vs. NULL’s in the Real World
5.3. Relation Proliferation
5.4. TRIM (TM)
Appendix A: What’s Wrong with this Picture?
Appendix B: Comments on the Proposed Solution
NEW VERSION! Paper #4 V.2 (May 2015) NEW VERSION!
THE KEY TO KEYS: A MATTER OF IDENTITY
If entities in the real world did not have identifiers—attributes that capture their identity and uniquely identify them—we would not be able to tell them apart. It follows that an accurate database representation of a business reality must include keys, which formally represent informal the real world identifiers in the database.
Keys and the types thereof, their necessity, key selection, function and properties are often not well known and understood.
· Defines and explains the key concept;
· Explains the function and properties of the various types of key;
· Describes the criteria for key selection;
· Specifies what is proper DBMS key support;
· Assesses SQL's key support;
· Debunks some common misconceptions about keys.
Table of Contents
1. R-tables and Integrity Constraints
2. Keys and Key Constraints
3. Kinds of Keys
3.1. Candidate and Primary Keys
3.2. Natural Keys
3.3. Simple and Composite Keys
3.4. Foreign and Surrogate Keys
4. Key Functions
4.1. Duplicate Prevention
4.2. Guaranteed Logical Access
4.3. Low Integrity Burden
4.4. View Updatability and Logical Data Independence
5. DBMS Key Support
6. Keys in SQL
6.1. SQL and Duplicates
Appendix A: Duplicate Removal in SQL
Appendix B: Duplicates and Language Redundancy
NEW VERSION! Paper #5 V.3 (May 2015) NEW VERSION!
TRULY RELATIONAL: WHAT IT REALLY MEANS
In the first paragraph of his first published exposition of the relational data model for database management in 1969, E.F. Codd claimed three core advantages:
· A dual sound theoretical foundation: predicate logic and set mathematics;
· Physical data independence;
· DBMS-guaranteed data integrity and provably logically correct query results with respect to the real world.
Yet from 1969 to date the industry has failed to implement Codd’s ideas truly and fully. The closest it came to the relational model are SQL-based DBMS's that have only limited relational fidelity and violate the model in multiple ways. Moreover, instead of correcting mistakes, vendors—including IBM, where the relational model was invented and Oracle, the first implementer of a SQL DBMS—have regressed to the very costly and unproductive approaches that Codd’s innovation made obsolete forty-five years ago.
This paper revisits Codd’s ideas in his seminal first two papers, one being an important public revision of the other (an internal IBM document), containing changes and new material. It
· reasserts those aspects that have been ignored
· recalls those that were missed
· clarifies those that are opaque
· corrects misinterpretations as well as original mistakes
· settles some current disagreements on and confusion over what the relational model really is.
Table of Contents
1. Relations on Domains
2. Relation Representation
3. Time-Varying Relations
4. Relation Interpretation
5. Data Sublanguage
6. Atomicity, Nested Relations, and Normalization
6. Foreign Keys and (First) Normal Form
7. Operations on Relations
8. Kinds of Relations
9. Derivability, Redundancy, Consistency
Appendix A: Codd’s 1969 Relational Operators
Appendix B:What's Wrong with This Picture?
NEW VERSION! Paper #6, V.2 (January 2015) NEW VERSION!
DOMAINS: THE DATABASE GLUE
Yet they are one of the least understood database features. This is both a cause and a consequence of lack of domain support in SQL—both the standard and commercial implementations. The consequences, such as, for example, lack or poor support of user-defined domains of arbitrarily complexity, are erroneously blamed on the RDM when, in fact, domains are orthogonal to the model, which places no restrictions on them whatsoever.
This paper covers
· The domain concept;
· Distinctions from data type;
· Kinds of domains;
· SQL domain support;
· Implications for database practice and DBMS design.
Table of Contents
1. Domains and Types
1.1. Meaning and Representation
2. Kinds of Domains
2.1. “Simple” Domains
2.2. “Complex” Domains
2.3. User-Defined Domains and System-Defined Types
3. Domains and SQL
4. Some Practical Implications
4.1. “Universal” DBMS
4.2. Database Design
4.5 Tackling Complexity
USE OF MATERIAL POLICY
Publisher hereby grants to purchaser non-exclusive, non-transferable, revocable rights to use the materials for personal, non-commercial purposes, as follows:
Authorized Users: Individuals who purchase the materials for their own exclusive use;
Authorized Uses: Authorized users may use the materials as is consistent with the Fair Use Provisions of United States and international law, for non-commercial purposes as follows:
- Academic buyers may use the material, with appropriate credit, in the preparation of course packs or other non-commercial, educational materials;
- Authorized users may use, with appropriate credit, figures, tables and brief excerpts in the user's own scientific, scholarly and educational works;
- Individual buyers have exclusive access to the materials;
- Institutional volume buyers should check with Publisher on authorized use;
Specific Restrictions on Use of Materials
Unauthorized Use: Authorized users shall not
- Knowingly permit others to use the materials;
- Modify or create a derivative work without prior written permission of Publisher;
- Remove, obscure or modify any copyright or other notice;
- Use the materials for commercial purposes, including but not limited to sale, reproduction, or distribution in any form.
Single paper rates
Order all papers and get a two-paper combo free bonus:
- WHAT FIRST NORMAL FORM REALLY MEANS by C. J. Date
- WHAT FIRST NORMAL FORM MEANS NOT by F. Pascal
UPDATES AND NEW VERSIONSRevisions are free.
- Free to buyers in the same year.
- Preceding year buyers get 50% discount.
HOW TO ORDER
We prefer payments in US$ via PayPal. For check/money order payments email us at the address on the ABOUT page.
- Select the paper(s) to order;
- Calculate the total order price;
- Specify the papers you order in the text field below;
- Click on the Buy Now button;
- At the payment form, login to your Paypal account;
- Upon notification of payment by PayPal, we will email the paper(s).
Address any questions or payment/delivery issues to the above address.
We appreciate your support, which keeps this site free.
Founder, Editor, Publisher and Debunker-in-Chief