Tuesday, July 30, 2013

The Final NULL in the Coffin: A Relational Solution to Missing Data

Order via the PAPERS page


v.3 (August 2013)

The relational data model is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect (some data is missing), which means that we don't always know whether a given proposition is true or false. When that happens, 2VL no longer applies: data integrity is no longer guaranteed to be enforceable, and query results are no longer guaranteed to be provably logically correct with respect to the real world.

Missing data has possibly been the thorniest aspect of database management: without a logically sound yet practical solution, data professionals and users are caught between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementations of three-valued logic (3VL) based on NULLs and risk results that are easy to misinterpret or erroneous in ways that are hard to discern, or (b) take on in applications a prohibitively complex, error-prone and unreliable burden that belongs in the DBMS.
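A minimal sketch of the kind of hard-to-discern result meant in (a), using Python's built-in sqlite3 module. The emp table, its columns, and the null_pitfalls function are hypothetical and serve only as illustration; they are not taken from the paper, and other SQL products may differ in detail.

```python
import sqlite3

def null_pitfalls():
    """Run a few queries against a hypothetical table with one missing salary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("Ann", 50000), ("Bob", None), ("Cy", 70000)],  # None is stored as NULL
    )

    def count(where=""):
        # Count the rows of emp that satisfy the given WHERE clause (if any).
        return conn.execute("SELECT COUNT(*) FROM emp " + where).fetchone()[0]

    result = {
        "total": count(),
        "high": count("WHERE salary > 60000"),
        "not_high": count("WHERE NOT (salary > 60000)"),
        "not_in": count("WHERE salary NOT IN (50000, NULL)"),
    }
    conn.close()
    return result

if __name__ == "__main__":
    print(null_pitfalls())
```

Under 2VL a condition and its negation would together account for every row. Here the row with the NULL salary satisfies neither, so the "high" and "not_high" counts add up to 2 rather than 3, and the NOT IN query returns no rows at all because a single NULL in the list keeps the condition from ever evaluating to true.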

This paper illustrates some of the drawbacks of the many-valued logic (nVL, n > 2) approach to missing data and of SQL’s NULL scheme, and proposes a solution within the 2VL/relational framework that:
  • Guarantees data integrity and logically correct query results;
  • Avoids the complications and problems of nVL and NULLs;
  • Requires no changes to the relational model;
  • Is largely transparent to users;
  • Keeps users better apprised of the existence and effects of missing data.
The proposed solution requires research into its implications for data manipulation and integrity enforcement before it can be implemented, but we believe it is theoretically sound and implementable in a truly relational DBMS (TRDBMS) using technologies that, unlike SQL, support full physical data independence, e.g., the TransRelational™ Model (TRM).

Table of Contents
  • Introduction
  • "Inapplicable Data": Nothing's Missing
  • Missing Data: Into the Unknown
  • SQL’s 3VL: NULL
  • Known Unknowns: Metadata
  • A 2VL Relational Solution
  • The Practicality of Theory
  • 2VL vs. NULL in the Real World
  • Relation Proliferation
  • The TransRelational™ Model
  • Conclusion
  • Some Misconceptions Debunked
  • References
