THE FINAL NULL IN THE COFFIN*? A PROPOSED RELATIONAL SOLUTION TO MISSING DATA
Fabian Pascal Paper #4 v.2 (January 2011)

<

This paper should be considered investigative in character. Further research is required at both the logical and implementation levels, but we believe that the idea is sound and implementable.

This is a major revision of this paper that supersedes all previous versions. It relies on terminology and concepts developed in Paper #2 v.4 (January 2011) Business Modeling for Database Design: A Foundation Framework for Data Management that is strongly recommended as a preamble.


ABSTRACT

As attested to by the volume of writings and the heat of the debate on the subject (see references), the treatment of missing data has possibly been one of the thorniest aspects of database management. Users are left between a rock and a hard place: they can either rely on SQL' s problematic version of three-valued logic based on NULLs, and risk hard to interpret database answers and/or hard to detect errors in integrity enforcement and query results, or undertake the prohibitive burden of what is a complex database function that belongs in the DBMS.

This paper summarizes the drawbacks of the many-valued logic approach to missing data, and SQL’s problematic and poorly implemented flavor of three-valued logic via NULLs, and proposes a possible solution within the two-valued logic/relational framework. It (a) separates unknown and therefore missing data from “inapplicable” and therefore non-missing data), and provides proper design guidelines to avoid the latter (b) treats missing data correctly as metadata and (c) yields logically correct answers with respect to the real world, without the complications and problematics of many-valued logic and SQL’s NULLs.

It is also argued that the TransRelational™ Model of implementation, that facilitates the design of high-performance, fully data independent true RDBMSs, lends itself particularly well to the proposed missing data solution.

· Introduction

· The Logic of the Real World

· "Inapplicable Values": A Red Herring

· Into the Unknown: Three-Valued Logic

· Not of This World: SQL's NULL

· Assert Only What You Know

· Known Unknowns: Data About Data

· SQL vs. 2VL: A Real World Comparison

· "Too Many" R-Tables?

· The TransRelational(TM) Implementation Model

· Some Misconceptions Debunked

· Concluding Remarks

· References

Use of Materials Policy

PRICING AND ORDERING

Hit Counter by Digits
Counter by Digits