SEMINARS

PRACTICAL DATABASE FOUNDATIONS SERIES


(For on-site seminars contact us via the ABOUT page; Order papers in this series from the PUBLICATIONS page.)

The objectives of these seminars are:

  • To make the true data science--as distinct from what is hyped as such--accessible, without losing theoretical rigor;
  • To dispel common, entrenched and persistent misconceptions about data, database and relational fundamentals;
  • To discuss important aspects of data management that are systematically ignored, misunderstood, misused and abused;
  • To expose the practical implications of theory
for data professionals and users who think critically and independently, rather than operate in the IT industry's "cookbook mode".

(Seminars #1 and #5 are pre-requisites for all others.)
 

#1. BUSINESS MODELING FOR DATABASE DESIGN: FORMALIZING THE INFORMAL

V. 4 (May 2015)


Few data management aspects are as misunderstood as business modeling and database design. There is entrenched, persistent and broad confusion about levels of information representation and the type of model that corresponds to each:
  • Conceptual (business) model;
  • Logical (database) model;
  • Physical (implementation) model;
  • Data model.
and the differences and relationship among them.

This seminar is an introduction to the science underlying database design, the advantages of adhering it and the cost of disregarding or misusing it. It offers 

  • A rigorous and sound methodology for database practice; 
  • A foundation framework within which to evaluate claims, technologies, products and practices and avoid detect and correct problems.

Topics: 

Business Modeling

  • Basic modeling concepts
  • Business rules
  • Business models

Database Design

  • Formalizing the informal
  • Predicates and propositions

The relational data model

  • Data structure
  • Data integrity
  • Data manipulation
  • Logical models

Understanding Database Management

  • Constraint Formulation and Verification
  • Missing data
  • A foundation framework
  • Misconceptions debunked

 

#2. THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE

v. 4 (May 2015)

The methodology promoted in seminar #1 implicitly produces "fully" normalized (5NF) databases. But poor practices, including “denormalization for performance”, frequently result in poorly designed and costly databases.

Designs that do not adhere the Principle of Full Normalization (POFN) present drawbacks for integrity enforcement and data manipulation and raise the risk of inconsistent databases and incorrect or misinterpreted query results, but few data professionals know is that only fully normalized databases guarantee semantic correctness i.e., no database update anomalies. 

Explicit normalization (to 1NF) and further normalization (to 5NF) are necessary only to repair poorly designed databases and eliminate the drawbacks.

This seminar explains

  • The kinds of possible deviations from POFN;
  • The undesirable implications and cost thereof;
  • How to repair poor designs and eliminate the drawbacks;
  • Why denormalization for performance is a dangerous illusion.

Topics:
 

Relations, Keys and Dependencies

Normalization and first normal form (1NF)

Further Normalization and Higher Normal Forms

  • Single-class relations and functional dependencies
  • Join Dependencies
  • “The whole key” and second normal form (2NF)
  • “Nothing but the key” and third normal form (3NF)
  • “The whole key” and Boyce-Codd normal form (BCNF)
  • Multivalued Dependencies and fourth normal form (4NF)
  • Sixth normal form (6NF)

“Denormalization For Performance”

  • Logical-Physical Confusion
  • Redundancy Control
  • Join dependency constraints and SQL
  • The costly illusion

 #3. THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA

V.4 (May 2015)

The relational data model (RDM) is based on the two-valued logic (2VL) of the real world: every proposition is unequivocally true or false. This foundation in logic enables relational DBMS's (RDBMS) to guarantee query results that are logically correct. What we record in databases, though, is not the real world, but our knowledge of it, which is usually imperfect—some data are missing, in which case and we don't know whether propositions are true or not. This violation of 2VL and RDM disables the guarantee. Missing data have arguably proven to be the thorniest aspect of database management.

The IT industry (a) relies on SQL's arbitrary and flawed implementations of "three and a half-valued logic" (3.5VL) based on NULL, that complicate the formulation of constraints and queries, the interpretation of query results and can produce incorrect results with respect to the real world, of which data professionals and users are commonly unaware of, or oblivious to.

This seminar proposes a solution to missing data based on 2VL and, therefore, consistent with the RDM that

  • Avoids the problems of many-valued logics (nVL where n > 2) and SQL's "3VL";
  • Requires no changes to the RDM;
  • Is largely transparent to users, while keeping them apprised of the effects of missing data.
Note: The proposed solution requires research into all its implications for data manipulation and integrity enforcement before it is implemented, but because it was arrived independently at by both myself and David McGoveran, we believe it is theoretically sound and implementable in a true RDBMS using technologies that, unlike SQL, support full physical data independence, such as the TransRelational™ implementation model (TRM) http://bookboon.com/en/go-faster-ebook.
 

Topics:


Inapplicable Data: Nothing's Missing

Missing Data: Into the Unknown

SQL’s NULL: What-Valued Logic?

Known Unknowns: Metadata

A 2VL Relational Solution

The Practicality of Theory

  • 2VL vs. NULL’s in the Real World
  • Relation Proliferation
  • Implementation Technology
  • Comments on the Proposed Solution

#4. THE KEY TO KEYS: A MATTER OF IDENTITY

V.2 (May 2015)

Entities are distinguishable in the real world by some identifier, or we would not be able to tell them apart, or count them. For accurate database representation of reality, relations must have at least one candidate key (CK)--an attribute that represents the identifier. If there are multiple identifiers, one key is selected as the primary key (PK) for referential purposes.

But the types of key, their function, necessity, properties and PK selection criteria are not well known and understood--one of the most frequent hits on this site is driven by the search "Are keys necessary?"

This seminar explains

  • The key concept;
  • Database functions, necessity and properties of keys;
  • Criteria for PK selection;
  • DBMS key support;
  • Keys in SQL.
It also debunks some common misconceptions about keys.

Topics:

Relations and Integrity Constraints

Keys and Key Constraints

  • Kinds of Keys
  • Candidate and Primary Keys
  • Natural Keys
  • Simple and Composite Keys
  • Foreign and Surrogate Keys

Key Functions

  • Duplicate Prevention
  • Guaranteed Logical Access
  • Reduction of Integrity Burden
  • View Updatability and Logical Data Independence

DBMS Key Support

  • Keys in SQL
  • Duplicate removal

#5. TRULY RELATIONAL: WHAT IT REALLY MEANS

V.3 (May 2015)

In the first paragraph of his 1969 initial exposition of the relational data model (RDM) for database management, E.F. Codd claimed
two core advantages (among others) conferred on database practice by its dual theoretical foundation--predicate logic and set theory:
  • Physical data independence (PDI);
  • Guaranteed logically correct query results.
Yet almost five decades later the IT industry is still mostly ignorant of his ideas and fails to understand and appreciate their impact. The closest vendors have come to implement the RDM is their first--and only--attempt: SQL. SQL DBMS's ignore relational principles and/or violate them in both letter and spirit in so many ways, that they cannot be considered relational. The practices of data professionals are not much better. There is continuous regression to technologies and practices that the RDM made obsolete decades ago.

This seminar revisits Codd’s original ideas in his seminal first two papers:

  • Reasserts those aspects that have been ignored;
  • Recalls those that were missed;
  • Makes accessible those that are opaque;
  • Corrects misinterpretations as well as mistakes.

Topics:


Relations on Domains

Relation Representation

Time-Varying Relations

Relation Interpretation

Data Sublanguage

Atomicity, Nested Relations, and Normalization

  • Foreign Keys and (First) Normal Form

Operations on Relations

Kinds of Relations

Derivability, Redundancy, Consistency



#6. DOMAINS: THE DATABASE GLUE

V.2 (January 2015)

A relation is defined on domains (in mathematics it is a subset of a Cartesian product of domains). In other words, no domains, no database relations and no relational databases. E. F. Codd, the inventor of the relational data model (RDM), referred to them as the "glue that holds the database together": only the values of attributes defined on the same domain are meaningfully comparable--those attributes represent the same real world property (mean the same thing).

Yet domains are the least understood database feature. This is both a cause and a consequence of lack of domain support by both ANSI/ISO SQL and SQL DBMS's. The subversion
of domain value atomicity that is possible in SQL violates the core relational Information Principle. The inability  to define custom domains of arbitrary complexity is erroneously blamed on the RDM when the data model places no restrictions on that  whatsoever--indeed, domains of arbitrary complexity are one of the RDM's major advantages--the culprit is the poor relational fidelity of SQL and its implementations.

This seminar explains

  • The domain concept;
  • Distinctions from programming data type;
  • Kinds of domain;
  • SQL domain support;
  • Implications for database practice.

Topics:

Domains and Types

Meaning and Representation

Kinds of Domain

  • Simple
  • Non-simple
  • Custom
  • Domains and SQL System-Defined Types

Practical Implications

  • “Universal” DBMS;
  • Database Design;
  • Object DBMS;
  • NoSQL;
  • Tackling Complexity.
(For on-site seminars contact us via the ABOUT page; Order papers in this series from the PUBLICATIONS page.)



1 comment:

  1. Great writing! Thanks for shining light into the dark corners. Particularly, I like your discussion of NULL and "Why 'denormalization for performance' is a costly illusion". Thanks.

    ReplyDelete