Tuesday, July 30, 2013

The Final NULL in the Coffin: A Relational Solution to Missing Data

Order via the PAPERS page


v.3 (August 2013)

The relational data model is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect (some data is missing), which means that we don't always know whether propositions are true or not. 2VL then no longer applies, and neither the enforcement of data integrity nor the logical correctness of query results with respect to the real world can be guaranteed.

Missing data is possibly the thorniest aspect of database management: without a logically sound yet practical solution, data professionals and users are left between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementations of three-valued logic (3VL) based on NULLs and risk results that are easy to misinterpret or erroneous in ways that are hard to discern, or (b) undertake in applications a prohibitively complex, error-prone and unreliable burden that belongs in the DBMS.
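
To make the risk in (a) concrete, here is a minimal sketch using Python's standard sqlite3 module. The employee table, its rows and the two queries are hypothetical and are not taken from the paper; they only show how a single NULL can silently change a result that looks obviously correct under 2VL.

```python
import sqlite3

# Hypothetical data: Bob's salary is missing and is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT PRIMARY KEY, salary INTEGER);
    INSERT INTO employee VALUES ('Ann', 50000), ('Bob', NULL), ('Cy', 70000);
""")

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Under 2VL this condition is a tautology and should return every employee.
# Under SQL's 3VL it evaluates to "unknown" for Bob, so he is dropped.
high_or_not = names(
    "SELECT name FROM employee "
    "WHERE salary > 60000 OR salary <= 60000 ORDER BY name"
)

# The subquery returns {NULL, 70000}. Ann's 50000 is plainly not 70000,
# yet the comparison with NULL makes NOT IN "unknown" for her as well.
not_in = names(
    "SELECT name FROM employee "
    "WHERE salary NOT IN (SELECT salary FROM employee WHERE name <> 'Ann') "
    "ORDER BY name"
)

print(high_or_not)  # ['Ann', 'Cy']
print(not_in)       # []
```

Neither query raises an error or warns that missing data affected the answer, which is what makes such results easy to misinterpret.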

This paper illustrates some of the drawbacks of the many-valued logic (nVL, n > 2) approach to missing data and SQL’s NULL scheme and proposes a solution within the 2VL/relational framework that:
  • Guarantees data integrity and logically correct query results;
  • Avoids the complications and problems of nVL and NULLs;
  • Requires no changes to the relational model;
  • Is largely transparent to users;
  • Keeps users better apprised of the existence and effects of missing data.
The proposed solution requires research into its implications for data manipulation and integrity enforcement before it is implemented, but we believe it is theoretically sound and implementable in a truly relational DBMS (TRDBMS) using technologies that, unlike SQL, support full physical data independence, e.g., the TransRelational™ Model (TRM).

Table of Contents
  • Introduction
  • "Inapplicable Data": Nothing's Missing
  • Missing Data: Into the Unknown
  • SQL’s 3VL: NULL
  • Known Unknowns: Metadata
  • A 2VL Relational Solution
  • The Practicality of Theory
  • 2VL vs. NULL in the Real World
  • Relation Proliferation
  • The TransRelational™ Model
  • Conclusion
  • Some Misconceptions Debunked
  • References

Sunday, July 28, 2013

Site Update

Some housekeeping. Posting to both the blog and multiple static pages is a bit of a hassle. I am also facing some work on my seminars and papers. Until further notice:
  • There will be one post per week, alternating articles and Site Updates (I may skip the latter on certain weeks, if absolutely necessary);
  • Quotes and links to LAUGH/CRY? and FP ONLINE will be posted directly into Site Update posts (like below); the respective static Pages will be updated at the end of each month.
A tool that automates posts and updates in one shot would have helped. I looked into it, but for various reasons (including Google's Blogger updates), nothing is available (if you know of one, preferably from experience, please recommend it).

Quote of the Week:
...the relational model has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes" ... Consider if we had a bunch of tables, each containing the thing A. Now what is the population of A? It cannot be found in any one of the tables. It is actually the union of all the populations of A plus more if we allow A to exist (i.e., be of interest to us) but does not appear in any of the tables. That would be the case of a master reference list of "codes" for which we would then build a separate table. But even that is insufficient. We would also have to define and enforce referential integrity everywhere an A appeared. All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes. As I said before, an attribute is an object playing a role in a relationship with another object.
To Laugh or Cry?
What’s the Best Way for Structured Data Computing in Java?
FP Online:
Let's innovate....database
Good advice:
Designing a Database: 7 Things You don't Want To Do
But why it bothers me?

And now for something completely different.
NSA claims inability to search agency's own emails
Clueless doctor sleeps through math class, reinvents calculus…and names it after herself. At least the doctor re-invented something in a different field. Data professionals do it all the time in their own field.
You can't make these things up.

Monday, July 15, 2013

Site Update (UPDATED)

07/19/13: I have also added my latest post at All Analytics to the FP ONLINE page.

07/18/13: This update referred to items that were erroneously dated 7/3/13 instead of 7/15/13. This has now been corrected. 

The 'Quote of the Week' was posted to the QUOTES page.

A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Everything should be as simple as possible, but not simpler.
--Albert Einstein

A link to an exchange I participated in was posted on the FP ONLINE page.

And now for something completely different.

If You Search, Advertise on, Invest in, or Have Kids Who Use Google, You Must See

Too much power is always dangerous, no matter who holds it.

Monday, July 8, 2013

Relational Theory and Database Practice

I shared the links to my recent three-part series on foreign keys (and integrity constraints in general) on LinkedIn. Comments on the second installment raised an important issue about keys (discussed in more depth in Business Modeling for Database Design), which deserves attention.
NK: Let me first affirm my position that I believe foreign keys are the fundamental bases on which relational database managements system operate. Foreign keys provide the relationship in database normalization. Foreign keys are like the framework of a building structure. While some developers may have the notion that constraints and integrity checks can be handled better at the application layer, I would want to refer them to tools like ER Studio, ERWIN, and Visual Studio ... A good database design starts at the logical design level. Abstracting constraints and integrity checks from this layer to the application layer can lead to corrupt database designs. A simple case in point; How would you enforce a unique constraint on a table with 10 million rows? Will it make better sense to have a unique index on the table\field or have the application layer enforce the constraint?
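
NK's closing question can be made concrete with a small sketch, again using Python's standard sqlite3 module. The customer table, its columns and the try_insert helper are hypothetical. The point is only that uniqueness is declared once as a constraint and enforced by the DBMS on every insert, whichever application issues it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Uniqueness is declared as a constraint in the table definition. How the
# DBMS enforces it (SQLite uses an index internally) is an implementation
# detail that applications never need to see.
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")

def try_insert(email):
    """Attempt an insert; return False if the DBMS rejects a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customer (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert("b@example.com")      # new value, accepted
duplicate = try_insert("a@example.com")  # already present, rejected
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(first, duplicate, row_count)  # True False 2
```

An application-level check would have to be repeated, and kept correct, in every program that writes to the table, and it would still be exposed to concurrent inserts.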