PAPERS

PRACTICAL DATABASE FOUNDATIONS Series

(Papers #1 and #2 are pre-requisites for all others.)

The objectives of these papers are:

  • To make true data science--as distinct from what is hyped as such--accessible, without losing theoretical rigor;
  • To dispel common, entrenched fallacies about data and relational fundamentals;
  • To clarify important aspects of data management that are systematically ignored, misunderstood, misused and abused;
  • To expose the practical implications of theory.
  • for the data professional and user who thinks critically and independently, rather than operates in the IT industry's "cookbook mode".

#1. BUSINESS MODELING FOR DATABASE DESIGN: FORMALIZING THE INFORMAL V.4 (May 2015)

Few data management aspects are as misunderstood as business modeling and database design. There is entrenched, persistent and broad confusion about levels of information representation and the type of model that corresponds to each:
  • Conceptual (business) model;
  • Logical (database) model;
  • Physical (implementation) model;
  • Data model.
This paper introduces a rigorousconceptual business modeling methodology that lends itself to formal logical database representation and the practical advantages of the Relational Data Model (RDM).

Topics:

Business Modeling

  • Basic modeling concepts
  • Business rules
  • Business models
Database Design
  • Formalizing the informal
  • Predicates and propositions
The Relational Data Model
  • Data structure
  • Data integrity
  • Data manipulation
  • Logical models
Understanding Database Management
  • Constraint Formulation and Verification
  • Missing data
  • A foundation framework
Misconceptions debunked

#2. THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE v.4 (May 2015)

The methodology described in paper #1 implicitly produces "fully" normalized (5NF) databases. But poor practices, including “denormalization for performance”, frequently result in poorly designed and costly databases. Designs that do not adhere the Principle of Full Normalization (POFN) present drawbacks for integrity enforcement and data manipulation and raise the risk of inconsistent databases and incorrect or misinterpreted query results, but few data professionals know is that only fully normalized databases guarantee semantic correctness i.e., no database update anomalies. Explicit normalization (to 1NF) and further normalization (to 5NF) are necessary only to repair poorly designed databases and eliminate the drawbacks.

Topics:

Relations, Keys and Dependencies
Normalization and first normal form (1NF)

  • Further Normalization and Higher Normal Forms
  • Single-class relations and functional dependencies
Join Dependencies
  • “The whole key” and second normal form (2NF)
  • “Nothing but the key” and third normal form (3NF)
  • “The whole key” and Boyce-Codd normal form (BCNF)
  • Multivalued Dependencies and fourth normal form (4NF)
  • Sixth normal form (6NF)
“Denormalization For Performance”
  • Logical-Physical Confusion
  • Redundancy Control
  • Join dependency constraints and SQL
  • The costly illusion

#3. THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA V.4 (May 2015)

Missing data have arguably proven to be the thorniest aspect of database management.

The RDM is based on the two-valued logic (2VL) of the real world: every proposition is unequivocally true or false. This enables relational RDBMSs to guarantee logical correctness of query results. What we record in databases, though, is not the real world, but our knowledge of it, which is usually imperfect--some data are missing--in which case and we don't know whether propositions are true or not. This violation of 2VL and RDM defeats the guarantee.

The IT industry (a) relies on SQL's arbitrary and flawed implementations of "three and a half-valued logic" based on NULL, that complicate the formulation of constraints and queries, the interpretation of query results and can produce incorrect results with respect to the real world, of which data professionals and users are commonly unaware of, or oblivious to.

This paper proposes a solution to missing data based on 2VL and, therefore, consistent with the RDM that

  • Avoids the problems of many-valued logics (nVL where n > 2) and SQL's "3VL";Requires no changes to the RDM;
  • Is largely transparent to users, while keeping them apprised of the effects of missing data.
Note: The proposed solution requires research into all its implications for data manipulation and integrity enforcement before it is implemented, but because it was arrived independently at by both myself and David McGoveran, we believe it is theoretically sound and implementable in a true RDBMS using technologies that, unlike SQL, support full physical data independence (PI).

Topics:

Inapplicable Data: Nothing's Missing
Missing Data: Into the Unknown
SQL’s NULL: What-Valued Logic?
Known Unknowns: Metadata
A 2VL Relational Solution
The Practicality of Theory
2VL vs. NULL’s in the Real World
Relation Proliferation
Implementation Technology
Comments on the Proposed Solution

#4. THE KEY TO KEYS: A MATTER OF IDENTITY V.2 (May 2015)

Entities are distinguishable in the real world by some identifier, or we would not be able to tell them apart, or count them. For accurate database representation of reality, relations must have at least one candidate key (CK)--an attribute that represents the identifier. If there are multiple identifiers, one key is selected as the primary key (PK) for referential purposes.

But the types of key, their function, necessity, properties and PK selection criteria are not well known and understood--one of the most frequent hits on this site is driven by the search "Are keys necessary?"

This paper explains

  • The key concept;
  • Database functions, necessity and properties of keys;
  • Criteria for PK selection;
  • DBMS key support;
  • Keys in SQL.

Topics:

Relations and Integrity Constraints
Keys and Key Constraints
Kinds of Keys

  • Candidate and Primary Keys
  • Natural Keys
  • Simple and Composite Keys
  • Foreign and Surrogate Keys
Key Functions
  • Duplicate Prevention
  • Guaranteed Logical Access
  • Reduction of Integrity Burden
  • View Updatability and Logical Independence (LI)
DBMS Key Support
Keys in SQL
Duplicate removal

#5. TRULY RELATIONAL: WHAT IT REALLY MEANS V.3 (May 2015)

In the first paragraph of his 1969 initial exposition of the relational data model (RDM) for database management, E.F. Codd claimed two core advantages (among others) conferred on database practice by its dual theoretical foundation--predicate logic and set theory:
  • Physical data independence (PDI);
  • Guaranteed logically correct query results.
Yet almost five decades later the IT industry is still mostly ignorant of his ideas and fails to understand and appreciate their impact. The closest vendors have come to implement the RDM is their first--and only--attempt: SQL. SQL DBMS's ignore relational principles and/or violate them in both letter and spirit in so many ways, that they cannot be considered relational. The practices of data professionals are not much better. There is continuous regression to technologies and practices that the RDM made obsolete decades ago.

This paper revisits Codd’s original ideas in his seminal first two papers:

  • Reasserts those aspects that have been ignored;
  • Recalls those that were missed;
  • Makes accessible those that are opaque;
  • Corrects misinterpretations as well as mistakes.
Topics:

Relations on Domains
Relation Representation
Time-Varying Relations
Relation Interpretation
Data Sublanguage
Atomicity, Nested Relations, and Normalization
Foreign Keys and (First) Normal Form
Operations on Relations
Kinds of Relations
Derivability, Redundancy, Consistency

#6. DOMAINS: THE DATABASE GLUE V.2 (January 2015)

A relation is defined on domains (in mathematics it is a subset of a Cartesian product of domains). In other words, no domains, no database relations and no relational databases. E. F. Codd, the inventor of the relational data model (RDM), referred to them as the "glue that holds the database together": only the values of attributes defined on the same domain are meaningfully comparable--those attributes represent the same real world property (mean the same thing).

Yet domains are the least understood database feature. This is both a cause and a consequence of lack of domain support by both ANSI/ISO SQL and SQL DBMS's. The subversion of domain value atomicity that is possible in SQL violates the core relational Information Principle. The inability  to define custom domains of arbitrary complexity is erroneously blamed on the RDM when the data model places no restrictions on that  whatsoever--indeed, domains of arbitrary complexity are one of the RDM's major advantages--the culprit is the poor relational fidelity of SQL and its implementations.

This paper explains

  • The domain concept;
  • Distinctions from programming data type;
  • Kinds of domain;
  • SQL domain support;
  • Implications for database practice.

Topics:

Domains and Types
Meaning and Representation
Kinds of Domain

  • Simple
  • Non-simple
  • Custom
Domains and SQL System-Defined Types
Practical Implications

  • “Universal” DBMS;
  • Database Design;
  • Object DBMS;
  • NoSQL;
  • Tackling Complexity.

Pricing

1 paper         $ 25.00
2 papers        $ 45.00 
3 papers        $ 65.00
4 papers        $ 85.00
5 papers        $105.00
6 papers        $125.00

Contact us for volume discounts.


Updates and revisions

  • Same year revisions and new versions are free.
  • Next year versions are free.
  • Next year new versions are half price.

How to order

Check or money order: Preferred. Contact us for mailing address.

PayPal

  • Select the number of papers;
  • Click on the Buy Now button;
  • Login to your Paypal account and pay the amount; 
  • Enter the #s of the papers you ordered;
Book in PDF format will be emailed upon receipt notification by PayPal.


How Many

We appreciate your support, which keeps this site free. Thank you.

Fabian Pascal
Founder, Editor, Publisher and Debunker-in-Chief



No comments:

Post a Comment