PRACTICAL DATABASE FOUNDATIONS Series
(Papers #1 and #2 are pre-requisites for all others.)
The objectives of these papers are:
- To make true data science--as distinct from what is hyped as such--accessible, without losing theoretical rigor;
- To dispel common, entrenched fallacies about data and relational fundamentals;
- To clarify important aspects of data management that are systematically ignored, misunderstood, misused and abused;
- To expose the practical implications of theory.
- for the data professional and user who thinks critically and independently, rather than operates in the IT industry's "cookbook mode".
#1. BUSINESS MODELING FOR DATABASE DESIGN: FORMALIZING THE INFORMAL V.4 (May 2015)
Few data management aspects are as misunderstood as business modeling and database design. There is entrenched, persistent and broad confusion about levels of information representation and the type of model that corresponds to each:
- Conceptual (business) model;
- Logical (database) model;
- Physical (implementation) model;
- Data model.
- Basic modeling concepts
- Business rules
- Business models
- Formalizing the informal
- Predicates and propositions
- Data structure
- Data integrity
- Data manipulation
- Logical models
- Constraint Formulation and Verification
- Missing data
- A foundation framework
#2. THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE v.4 (May 2015)
The methodology described in paper #1 implicitly produces "fully" normalized (5NF) databases. But poor practices, including “denormalization for performance”, frequently result in poorly designed and costly databases. Designs that do not adhere the Principle of Full Normalization (POFN) present drawbacks for integrity enforcement and data manipulation and raise the risk of inconsistent databases and incorrect or misinterpreted query results, but few data professionals know is that only fully normalized databases guarantee semantic correctness i.e., no database update anomalies. Explicit normalization (to 1NF) and further normalization (to 5NF) are necessary only to repair poorly designed databases and eliminate the drawbacks.
Relations, Keys and Dependencies
Normalization and first normal form (1NF)
- Further Normalization and Higher Normal Forms
- Single-class relations and functional dependencies
- “The whole key” and second normal form (2NF)
- “Nothing but the key” and third normal form (3NF)
- “The whole key” and Boyce-Codd normal form (BCNF)
- Multivalued Dependencies and fourth normal form (4NF)
- Sixth normal form (6NF)
- Logical-Physical Confusion
- Redundancy Control
- Join dependency constraints and SQL
- The costly illusion
#3. THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA V.4 (May 2015)
Missing data have arguably proven to be the thorniest aspect of database management.
The RDM is based on the two-valued logic (2VL) of the real world: every proposition is unequivocally true or false. This enables relational RDBMSs to guarantee logical correctness of query results. What we record in databases, though, is not the real world, but our knowledge of it, which is usually imperfect--some data are missing--in which case and we don't know whether propositions are true or not. This violation of 2VL and RDM defeats the guarantee.
The IT industry (a) relies on SQL's arbitrary and flawed implementations of "three and a half-valued logic" based on NULL, that complicate the formulation of constraints and queries, the interpretation of query results and can produce incorrect results with respect to the real world, of which data professionals and users are commonly unaware of, or oblivious to.
This paper proposes a solution to missing data based on 2VL and, therefore, consistent with the RDM that
- Avoids the problems of many-valued logics (nVL where n > 2) and SQL's "3VL";Requires no changes to the RDM;
- Is largely transparent to users, while keeping them apprised of the effects of missing data.
Inapplicable Data: Nothing's Missing
Missing Data: Into the Unknown
SQL’s NULL: What-Valued Logic?
Known Unknowns: Metadata
A 2VL Relational Solution
The Practicality of Theory
2VL vs. NULL’s in the Real World
Comments on the Proposed Solution
#4. THE KEY TO KEYS: A MATTER OF IDENTITY V.2 (May 2015)
Entities are distinguishable in the real world by some identifier, or we would not be able to tell them apart, or count them. For accurate database representation of reality, relations must have at least one candidate key (CK)--an attribute that represents the identifier. If there are multiple identifiers, one key is selected as the primary key (PK) for referential purposes.
But the types of key, their function, necessity, properties and PK selection criteria are not well known and understood--one of the most frequent hits on this site is driven by the search "Are keys necessary?"
This paper explains
- The key concept;
- Database functions, necessity and properties of keys;
- Criteria for PK selection;
- DBMS key support;
- Keys in SQL.
Relations and Integrity Constraints
Keys and Key Constraints
Kinds of Keys
- Candidate and Primary Keys
- Natural Keys
- Simple and Composite Keys
- Foreign and Surrogate Keys
- Duplicate Prevention
- Guaranteed Logical Access
- Reduction of Integrity Burden
- View Updatability and Logical Independence (LI)
Keys in SQL
#5. TRULY RELATIONAL: WHAT IT REALLY MEANS V.3 (May 2015)
In the first paragraph of his 1969 initial exposition of the relational data model (RDM) for database management, E.F. Codd claimed two core advantages (among others) conferred on database practice by its dual theoretical foundation--predicate logic and set theory:
- Physical data independence (PDI);
- Guaranteed logically correct query results.
This paper revisits Codd’s original ideas in his seminal first two papers:
- Reasserts those aspects that have been ignored;
- Recalls those that were missed;
- Makes accessible those that are opaque;
- Corrects misinterpretations as well as mistakes.
Relations on Domains
Atomicity, Nested Relations, and Normalization
Foreign Keys and (First) Normal Form
Operations on Relations
Kinds of Relations
Derivability, Redundancy, Consistency
#6. DOMAINS: THE DATABASE GLUE V.2 (January 2015)
A relation is defined on domains (in mathematics it is a subset of a Cartesian product of domains). In other words, no domains, no database relations and no relational databases. E. F. Codd, the inventor of the relational data model (RDM), referred to them as the "glue that holds the database together": only the values of attributes defined on the same domain are meaningfully comparable--those attributes represent the same real world property (mean the same thing).
Yet domains are the least understood database feature. This is both a cause and a consequence of lack of domain support by both ANSI/ISO SQL and SQL DBMS's. The subversion of domain value atomicity that is possible in SQL violates the core relational Information Principle. The inability to define custom domains of arbitrary complexity is erroneously blamed on the RDM when the data model places no restrictions on that whatsoever--indeed, domains of arbitrary complexity are one of the RDM's major advantages--the culprit is the poor relational fidelity of SQL and its implementations.
This paper explains
- The domain concept;
- Distinctions from programming data type;
- Kinds of domain;
- SQL domain support;
- Implications for database practice.
Domains and Types
Meaning and Representation
Kinds of Domain
- “Universal” DBMS;
- Database Design;
- Object DBMS;
- Tackling Complexity.
1 paper $ 25.00
2 papers $ 45.00
3 papers $ 65.00
4 papers $ 85.00
5 papers $105.00
6 papers $125.00
Contact us for volume discounts.
Updates and revisions
- Same year revisions and new versions are free.
- Next year versions are free.
- Next year new versions are half price.
How to order
Check or money order: Preferred. Contact us for mailing address.
- Select the number of papers;
- Click on the Buy Now button;
- Login to your Paypal account and pay the amount;
- Enter the #s of the papers you ordered;
We appreciate your support, which keeps this site free. Thank you.
Founder, Editor, Publisher and Debunker-in-Chief