Many practitioners are inducted into data management without sufficient foundational knowledge. Of the many aspects of data management that suffer as a result, few are as misunderstood as business modeling and database design, particularly the critical distinctions and relationships among three commonly confused types of model:
  • Conceptual (business) model
  • Logical database model
  • Data model (via which the former is mapped to the latter)
This seminar introduces practitioners to data fundamentals via the process of business modeling and database design from a dual practical perspective ignored in the industry:
  • Generality-to-simplicity ratio: maximal applicability with minimal complexity
  • Soundness: formal foundation
What you will learn:
  • A systematic methodology for business modeling and database design
  • A foundation framework for database management
  • How to use it to assess technologies, products and practices, evaluate claims and detect and correct problems

 1. Introduction

 2. Business Modeling
 2.1 Basic Modeling Concepts
 2.2 Business Rules
 2.2.1 Property Rules
 2.2.2 Class Rules
       Attribute Rules
       Identity Rules
       Functional Dependency Rules
       Referential Rules
       Enterprise-specific Rules
 2.3 Business Models

 3. Database Design
 3.1 Predicates and Propositions
 3.2 The Relational Data Model
 3.2.1 Data Structure
 3.2.2 Manipulation
 3.2.3 Integrity
 3.3 Logical Models

 4. Understanding Database Management
 4.1 Truth vs. Consistency
 4.2 Database and DBMS Defined
 4.3 A Foundation Framework
 4.4 Applying the Framework - Examples


Correct business modeling for database design implicitly produces fully normalized (5NF) databases. Further normalization beyond 1NF is necessary only as a repair procedure, when design has violated that principle, whether through inadvertent error or the intentional “denormalization for performance” illusion.

Undernormalization imposes significant costs that are commonly ignored. Data redundancy is only one of them, albeit a major one: it increases the risk of inconsistency and the integrity burden on both application developers and the DBMS. What is more, shifting that burden to applications not only defeats any performance gains, but may actually degrade performance.
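The inconsistency risk described above can be made concrete. The following is a minimal sketch using SQLite; the `supplier_parts` table and its data are hypothetical, chosen only to show how redundancy lets a partial update leave the database contradicting itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Undernormalized table: city depends on supplier alone, so it is
# repeated once per part the supplier ships (redundancy).
conn.execute("CREATE TABLE supplier_parts (supplier TEXT, city TEXT, part TEXT)")
conn.executemany(
    "INSERT INTO supplier_parts VALUES (?, ?, ?)",
    [("S1", "London", "P1"), ("S1", "London", "P2"), ("S2", "Paris", "P1")],
)

# An application updates S1's city but, by mistake, only for one row.
conn.execute(
    "UPDATE supplier_parts SET city = 'Oslo' "
    "WHERE supplier = 'S1' AND part = 'P1'"
)

# The database is now inconsistent: S1 appears to be in two cities.
cities = {row[0] for row in conn.execute(
    "SELECT DISTINCT city FROM supplier_parts WHERE supplier = 'S1'"
)}
print(cities)  # {'Oslo', 'London'}
```

Unless the DBMS or every application enforces the functional dependency supplier → city on every update, such contradictions go undetected.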

This seminar introduces database design repair via further normalization. It explains in easy-to-understand language:
  • Levels of normalization (1NF-6NF)
  • Their practical consequences
  • How to repair designs to eliminate the drawbacks
  • Why "denormalization for performance" is a costly illusion.
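Design repair via further normalization means decomposing a table by projection so that each fact is stored exactly once, with the original recoverable by a lossless join. A minimal sketch in SQLite, again with hypothetical supplier/part data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier_parts (supplier TEXT, city TEXT, part TEXT)")
conn.executemany(
    "INSERT INTO supplier_parts VALUES (?, ?, ?)",
    [("S1", "London", "P1"), ("S1", "London", "P2"), ("S2", "Paris", "P1")],
)

# Repair: project onto two tables keyed so each fact appears once.
conn.execute("CREATE TABLE suppliers (supplier TEXT PRIMARY KEY, city TEXT)")
conn.execute("""CREATE TABLE shipments (supplier TEXT, part TEXT,
                PRIMARY KEY (supplier, part))""")
conn.execute("INSERT INTO suppliers SELECT DISTINCT supplier, city FROM supplier_parts")
conn.execute("INSERT INTO shipments SELECT supplier, part FROM supplier_parts")

# The join reconstructs the original table losslessly: no information
# is lost, and a supplier's city can now be updated in exactly one place.
joined = conn.execute("""
    SELECT s.supplier, s.city, sh.part
    FROM suppliers s JOIN shipments sh USING (supplier)
    ORDER BY s.supplier, sh.part
""").fetchall()
print(joined)
```

The decomposed design makes the update anomaly of the undernormalized table impossible by construction rather than by application discipline.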


1. Introduction

 2. R-tables, Keys and Dependencies

 3. Normalization and Normal Forms

 4. Further Normalization As Design Repair
 4.1 Join Dependencies
 4.2 “The Whole Key” and 2NF
 4.3  “Nothing But the Key” and 3NF
 4.4  More “The Whole Key” and BCNF
 4.5  Multivalued Dependencies and 4NF
 4.6  Interval Data and 6NF

 5. "Denormalization for performance"
 5.1  The Logical-Physical Confusion
 5.2  Redundancy Control
 5.3  A Note on JDs and SQL
 5.4 The Real Problem and Solution

 6. Some Fallacies Dispelled

The relational data model is based on the two-valued logic (2VL): every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect—some data is missing—which means that we don't always know whether propositions are true or not. If 2VL no longer applies, database consistency and query results are no longer guaranteed to be provably logically correct with respect to the real world.

Missing data has arguably been the thorniest aspect of database management: data professionals and users are caught between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementation of three-valued logic (3VL) based on NULLs, and risk results that are easy to misinterpret or erroneous in ways difficult to discern, or (b) undertake in applications a prohibitively complex, error-prone and unreliable burden that belongs in the DBMS.
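The practical consequences of NULL's ~3VL are easy to demonstrate. A minimal sketch using SQLite (the table and values are hypothetical): a comparison involving NULL evaluates to unknown rather than true or false, and because WHERE keeps only rows for which the predicate is true, complementary predicates no longer partition a table as 2VL would guarantee:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# NULL = NULL is unknown (NULL), not true.
row = conn.execute("SELECT NULL = NULL").fetchone()
print(row[0])  # None, i.e. unknown

# x <= 1 and x > 1 are complementary in 2VL, yet together they
# return only 2 of the 3 rows: the NULL row satisfies neither.
lo = conn.execute("SELECT COUNT(*) FROM t WHERE x <= 1").fetchone()[0]
hi = conn.execute("SELECT COUNT(*) FROM t WHERE x > 1").fetchone()[0]
print(lo + hi)  # 2
```

A user who assumes two-valued logic will misread such results as a guarantee that no other rows exist, which is exactly the kind of silent error the seminar addresses.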

This seminar:
  • Reviews drawbacks of the many-valued logic (nVL, n > 2) approach to missing data
  • Examines SQL's NULL ~3VL scheme and its practical implications
  • Proposes a 2VL relational solution that:
    • Guarantees consistency and provably logically correct query results
    • Avoids the complications and problematics of nVL/NULLs
    • Requires no changes to the relational model
    • Is largely transparent to users
    • Keeps users better apprised of the existence and effects of missing data

 1. Introduction

 2. "Inapplicable Data”: Nothing's Missing

 3. Missing Data: Into the Unknown

 4. NULL: SQL's ~3VL
 5. Known Unknowns: Metadata

 6. A 2VL Relational Solution

 7. The Practicality of Theory
 8. 2VL vs. NULL in the Real World

 9. Relation Proliferation
 9.1 The TransRelational™ Implementation Model

10. Some Misconceptions Dispelled


If entities in the real world did not have identifiers—attributes that capture their identity and uniquely identify them—we would not be able to tell them apart. It follows that an accurate database representation of business reality must include keys, R-table columns that formally represent the real world identifiers in the database.
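When a key is declared, the DBMS itself enforces identity, rejecting any row that would duplicate an existing entity's identifier. A minimal sketch in SQLite, with a hypothetical `employees` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The key constraint delegates duplicate prevention to the DBMS,
# relieving every application of that integrity burden.
conn.execute("""
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,  -- key: unique and required
        name   TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO employees VALUES (1, 'Ada')")

rejected = False
try:
    # A second row with the same identifier violates the key constraint.
    conn.execute("INSERT INTO employees VALUES (1, 'Grace')")
except sqlite3.IntegrityError:
    rejected = True
print("duplicate rejected:", rejected)
```

Without the declared key, both rows would be accepted and the database would represent two indistinguishable "entities", exactly the situation the paragraph above warns against.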

This seminar explains in easy-to-understand language what keys are, the several types of keys, and their necessity, selection criteria, functions and properties.

You will learn:
  • The key concept, its meaning and necessity
  • Functions and properties of the various types of key
  • The criteria for key selection
  • What constitutes proper DBMS key support
  • Key support in SQL


 1. Introduction

 2. R-tables and Integrity Constraints

 3. Keys and Key Constraints

 4. Kinds of Keys
 4.1 Natural Keys
 4.1.1 Candidate and Primary Keys
 4.1.2 Simple and Composite Keys
 4.2 Surrogate Keys

 5. Key Functions
 5.1 Duplicate prevention
 5.2 Integrity Burden Reduction
 5.3 View Updatability

 6. Foreign Keys and Referential Constraints

 7. DBMS Key Support

 8. Keys in SQL
 8.1 SQL and Duplicates
 8.1.1 Duplicate removal in SQL
 8.1.2. Duplicates and Language Redundancy

 9. Some Misconceptions Dispelled

In the first paragraph of his first published exposition of the relational idea, 42 years ago, E. F. Codd claimed three practical advantages for a relational model of data:
  • A sound theoretical basis—logic and mathematics—for database management
  • Physical data independence
  • DBMS-guaranteed integrity of data and logically provably correct query results
with the objective of relieving database practice of the complexities and rigidity of the hierarchic and network DBMS products of the time.

Yet from 1969 to date the industry has failed to realize Codd's ideas in full. SQL implementations have limited relational fidelity: they lack or violate many relational features. Moreover, instead of improvement, there has been regression to the same costly and unproductive technologies that Codd's innovation made obsolete more than three decades ago.

Much of this is due to educational failure: vendors, experts, users and the trade press neither know, understand nor appreciate the practical value of Codd's contribution and the huge cost of ignoring it, while academia has become a training ground for vendors.

The series that this seminar initiates makes data fundamentals accessible to practitioners in a language they understand. This first seminar:
  • Revisits Codd’s original work
  • Reasserts those aspects that have been ignored
  • Recalls those that were missed
  • Clarifies those that are opaque
  • Corrects misinterpretations as well as original mistakes
  • Settles current disagreements and confusion over what the relational model really is


 1. Introduction

 2. Relations on Domains

 3. Relation Representation

 4. Time-varying Relations

 5. Relation Interpretation

 6. Data Sublanguage

 7. Atomicity, Nested Relations and Normalization

 8. Foreign Keys and (First) Normal Form

 9. Operations on Relations

10. Kinds of Relations

11. Derivability, Redundancy, Consistency

12. Misconceptions Dispelled
