PRACTICAL DATABASE FOUNDATIONS SERIES
(Order papers below; For on-site seminars in this series, contact us via the ABOUT page .)
The objectives of these papers are:
- To make the true data science--as distinct from what is hyped as such--accessible, without losing theoretical rigor;
- To dispel common, entrenched and persistent misconceptions about data, database and relational fundamentals;
- To discuss important aspects of data management that are systematically ignored, misunderstood, misused and abused;
- To expose the practical implications of theory
(Papers #1 and #5 are pre-requisites for all others.)
#1. BUSINESS MODELING FOR DATABASE DESIGN: FORMALIZING THE INFORMALV. 4 (May 2015)
Few data management aspects are as misunderstood as business modeling and database design. There is entrenched, persistent and broad confusion about levels of information representation and the type of model that corresponds to each:
- Conceptual (business) model;
- Logical (database) model;
- Physical (implementation) model;
- Data model.
This paper is an introduction to the science underlying database design, the advantages of adhering it and the cost of disregarding or misusing it. It offers
- A rigorous and sound methodology for database practice;
- A foundation framework within which to evaluate claims, technologies, products and practices and avoid detect and correct problems.
- Basic modeling concepts
- Business rules
- Business models
- Formalizing the informal
- Predicates and propositions
The relational data model
- Data structure
- Data integrity
- Data manipulation
- Logical models
Understanding Database Management
- Constraint Formulation and Verification
- Missing data
- A foundation framework
- Misconceptions debunked
#2. THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE
v. 4 (May 2015)
The methodology described in paper #1 implicitly produces "fully" normalized (5NF) databases. But poor practices, including “denormalization for performance”, frequently result in poorly designed and costly databases.
Designs that do not adhere the Principle of Full Normalization (POFN) present drawbacks for integrity enforcement and data manipulation and raise the risk of inconsistent databases and incorrect or misinterpreted query results, but few data professionals know is that only fully normalized databases guarantee semantic correctness i.e., no database update anomalies.
Explicit normalization (to 1NF) and further normalization (to 5NF) are necessary only to repair poorly designed databases and eliminate the drawbacks.
This paper explains
- The kinds of possible deviations from POFN;
- The undesirable implications and cost thereof;
- How to repair poor designs and eliminate the drawbacks;
- Why denormalization for performance is a dangerous illusion.
Relations, Keys and Dependencies
Normalization and first normal form (1NF)
Further Normalization and Higher Normal Forms
- Single-class relations and functional dependencies
- Join Dependencies
- “The whole key” and second normal form (2NF)
- “Nothing but the key” and third normal form (3NF)
- “The whole key” and Boyce-Codd normal form (BCNF)
- Multivalued Dependencies and fourth normal form (4NF)
- Sixth normal form (6NF)
“Denormalization For Performance”
- Logical-Physical Confusion
- Redundancy Control
- Join dependency constraints and SQL
- The costly illusion
#3. THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA
V.4 (May 2015)
The relational data model (RDM) is based on the two-valued logic (2VL) of the real world: every proposition is unequivocally true or false. This foundation in logic enables relational DBMS's (RDBMS) to guarantee query results that are logically correct. What we record in databases, though, is not the real world, but our knowledge of it, which is usually imperfect—some data are missing, in which case and we don't know whether propositions are true or not. This violation of 2VL and RDM disables the guarantee. Missing data have arguably proven to be the thorniest aspect of database management.
The IT industry (a) relies on SQL's arbitrary and flawed implementations of "three and a half-valued logic" (3.5VL) based on NULL, that complicate the formulation of constraints and queries, the interpretation of query results and can produce incorrect results with respect to the real world, of which data professionals and users are commonly unaware of, or oblivious to.
This paper proposes a solution to missing data based on 2VL and, therefore, consistent with the RDM that
- Avoids the problems of many-valued logics (nVL where n > 2) and SQL's "3VL";
- Requires no changes to the RDM;
- Is largely transparent to users, while keeping them apprised of the effects of missing data.
Inapplicable Data: Nothing's Missing
Missing Data: Into the Unknown
SQL’s NULL: What-Valued Logic?
Known Unknowns: Metadata
A 2VL Relational Solution
The Practicality of Theory
- 2VL vs. NULL’s in the Real World
- Relation Proliferation
- Implementation Technology
- Comments on the Proposed Solution
#4. THE KEY TO KEYS: A MATTER OF IDENTITY
V.2 (May 2015)
Entities are distinguishable in the real world by some identifier, or we would not be able to tell them apart, or count them. For accurate database representation of reality, relations must have at least one candidate key (CK)--an attribute that represents the identifier. If there are multiple identifiers, one key is selected as the primary key (PK) for referential purposes.
But the types of key, their function, necessity, properties and PK selection criteria are not well known and understood--one of the most frequent hits on this site is driven by the search "Are keys necessary?"
This paper explains
- The key concept;
- Database functions, necessity and properties of keys;
- Criteria for PK selection;
- DBMS key support;
- Keys in SQL.
Relations and Integrity Constraints
Keys and Key Constraints
- Kinds of Keys
- Candidate and Primary Keys
- Natural Keys
- Simple and Composite Keys
- Foreign and Surrogate Keys
- Duplicate Prevention
- Guaranteed Logical Access
- Reduction of Integrity Burden
- View Updatability and Logical Data Independence
DBMS Key Support
- Keys in SQL
- Duplicate removal
#5. TRULY RELATIONAL: WHAT IT REALLY MEANS
V.3 (May 2015)
In the first paragraph of his 1969 initial exposition of the relational data model (RDM) for database management, E.F. Codd claimed two core advantages (among others) conferred on database practice by its dual theoretical foundation--predicate logic and set theory:
- Physical data independence (PDI);
- Guaranteed logically correct query results.
This paper revisits Codd’s original ideas in his seminal first two papers:
- Reasserts those aspects that have been ignored;
- Recalls those that were missed;
- Makes accessible those that are opaque;
- Corrects misinterpretations as well as mistakes.
Relations on Domains
Atomicity, Nested Relations, and Normalization
- Foreign Keys and (First) Normal Form
Operations on Relations
Kinds of Relations
Derivability, Redundancy, Consistency
#6. DOMAINS: THE DATABASE GLUE
V.2 (January 2015)
A relation is defined on domains (in mathematics it is a subset of a Cartesian product of domains). In other words, no domains, no database relations and no relational databases. E. F. Codd, the inventor of the relational data model (RDM), referred to them as the "glue that holds the database together": only the values of attributes defined on the same domain are meaningfully comparable--those attributes represent the same real world property (mean the same thing).
Yet domains are the least understood database feature. This is both a cause and a consequence of lack of domain support by both ANSI/ISO SQL and SQL DBMS's. The subversion of domain value atomicity that is possible in SQL violates the core relational Information Principle. The inability to define custom domains of arbitrary complexity is erroneously blamed on the RDM when the data model places no restrictions on that whatsoever--indeed, domains of arbitrary complexity are one of the RDM's major advantages--the culprit is the poor relational fidelity of SQL and its implementations.
This paper explains
- The domain concept;
- Distinctions from programming data type;
- Kinds of domain;
- SQL domain support;
- Implications for database practice.
Domains and Types
Meaning and Representation
Kinds of Domain
- Domains and SQL System-Defined Types
- “Universal” DBMS;
- Database Design;
- Object DBMS;
- Tackling Complexity.
Use of Materials Policy: Publisher hereby grants to authorized users non-exclusive, non-transferable, revocable rights to use the materials for personal, non-commercial purposes.
Authorized Users: Individuals for their own exclusive use; Faculty for the preparation of courses or other non-commercial materials for scientific, scholarly and educational works; Commercial institutions for internal educational purposes.
Authorized Uses: Exclusive and onsistent with the Fair Use Provisions of United States and international law, for non-commercial purposes.
Unauthorized Use: Authorized users shall not knowingly permit others to use the materials; Modify, or create a derivative work without prior written permission of publisher; Remove, obscure or modify any copyright or other notice; Use the materials for commercial purposes, including but not limited to sale, reproduction, or distribution in any form.
- Individuals $15.00
- Corporate $25.00
- Nonprofit $20.00
- Students $10.00
Contact us for volume licenses via the ABOUT page.
Updates and revisions: Revisions and new versions within the same year are free. New versions within a year get 50% discount.
- Select the paper(s) to order;
- Calculate the total order price;
- Specify the papers you order in the "Which paper(s)" text field below;
- Click on the Buy Now button;
- At the payment form, login to your Paypal account;
Address any questions or payment/delivery issues to the above address.
We appreciate your support, which keeps this site free.
Founder, Editor, Publisher and Debunker-in-Chief