Sunday, November 22, 2020

Oldies But Goodies: Missing Data - "Horizontal Decomposition" Part 1



Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.
 

“I'm excited to share a data.world research partnership with Prof Leonid Libkin and Paolo Guagliardo from The University of Edinburgh. Our goal is to understand how NULL values are used in the real world to bridge theory and practice. Please help us by participating in a survey.”


Thus a recent announcement on LinkedIn, which triggered reactions in praise of this "much needed effort".

Sigh! SQL's NULL is a blunder unworthy of research. The commonly used term "NULL value" is a contradiction in terms, which indicates that industry surveys are not a path to enlightenment. The real issue is, of course, missing data, which is governed by long-studied and well-understood logic[1,2,3,4], though apparently not understood in the industry or in today's academia.

In 2004 we published The Final NULL in the Coffin: A Relational Solution to Missing Data (a paper since revised), which we believe is theoretically sound and, importantly, consistent with McGoveran's work re-interpreting, extending and formalizing Codd's RDM[5]. At the time it generated a series of exchanges with readers, which were posted at the old DBDebunk (2000-2006). In light of the above, they warrant republication.

I start with the first, split into three parts: this Part 1 presents a reader's reaction to both our solution and Hugh Darwen's "horizontal decomposition" alternative, How to Handle Missing Information without Using NULLs; Hugh's reply is in Part 2, and mine -- rewritten for clarity and to bring it up to date with the current state of knowledge -- is in Part 3.

Note: As far as we know, Darwen no longer abides by that approach -- in a later paper he referred to a "multi-relation" which seems an allusion to our solution -- but the exchange is useful for pedagogical reasons.


 

 On Darwen's “Handling of Missing Information Without Using NULLs”

(Originally posted on February 4, 2005)
 

From a DBDebunk reader:

Early this year, a friend and I were a bit confused by your [Darwen's] paper How to Handle Missing Information without Using NULLs (HTHMIWUN). We spent a lot of time analyzing it. I intended to ask you some questions, but as I was formulating [them], I kept finding more and more, until I had too many to write a simple e-mail. So, I began writing a website to cover the different questions that came to mind. In the end, I found that I could not produce anything that was both comprehensive and cohesive, so I abandoned the idea. I realized that the root of the problem was that I was working from a paper too small to fully explain your idea, and I decided that perhaps I should wait until you or your colleagues wrote more about your solution to precluding NULLs. 

Since you have asked, I shall do my best to briefly mention my most serious points. My main issue is with horizontal decomposition, which seems to be the heart of HTHMIWUN. I believe that horizontal decomposition is not a logical process, but rather a semantic process, which leads to two problems: 

1) A DBMS does not understand semantics. Handling multiple base relations with predicates of the same semantic field, predicates that may even be combined at the higher level of virtual relations, seems very difficult at best. I hope that someday, computers will be able to handle semantics as well as humans. In the meantime, it seems to me that your approach would require the addition of some meta-logical data to work well as a general solution. (Perhaps "semantically tagging" relations or using higher-order logic to group relations of the same semantic field could help.) 

2) Logical processes have rules to help arrive at proper solutions (normalization, for example); semantic processes, as far as I know, cannot have such rules, due to the rather arbitrary (or unknown) nature of semantics. So, how can a database designer know when and how to use horizontal decomposition effectively?  Even more strictly, can there be any formal rules for such a process? 

Other problems come to mind, such as the difficulty of virtual relations (especially on updates), complexity of distributed key constraints, and potential difficulties for database users, but they all stem from the problems of horizontal decomposition. All things considered, I'd rather not use your method to preclude NULLs. 

I am rather pleased with Pascal's approach to precluding NULLs, even if it does have some obstacles in the area of implementation. If you continue to believe in your method, I hope that you will write more about it.  Perhaps my misgivings are due mainly to the brevity of your paper. 

As silly as this may sound, I see horizontal decomposition as a kind of frontier for relational database modeling. The relational topic that interests me most is updating virtual relations [views], and I feel that horizontal decomposition may play a big role in that. So, what I'm getting at is that even if you abandon your approach to NULLs, I hope that you will continue your exploration of horizontal decomposition.
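
Note: For readers unfamiliar with the term, here is a minimal sketch of what a horizontal decomposition might look like, using Python sets as stand-ins for relations. It is an illustration only, not Darwen's design or ours: the relation names (earns, salary_unk, unsalaried), the status column, and the sample rows are all hypothetical. Each employee is assigned to exactly one relation according to why a salary is or is not recorded, and the check at the end is a rough analogue of the "distributed key constraint" the reader mentions.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical input rows: (emp_id, status, salary).
# The status column is an illustrative device, not part of any cited design.
Row = Tuple[int, str, Optional[int]]

ROWS = [
    (1, "known", 50000),
    (2, "unknown", None),
    (3, "unsalaried", None),
]


def decompose(rows: Iterable[Row]):
    """Split rows into three relations, one per kind of fact."""
    earns: Set[Tuple[int, int]] = set()   # employee earns a known salary
    salary_unk: Set[Tuple[int]] = set()   # employee has a salary, amount unknown
    unsalaried: Set[Tuple[int]] = set()   # employee has no salary at all
    for emp_id, status, salary in rows:
        if status == "known":
            if salary is None:
                raise ValueError(f"employee {emp_id}: known salary is missing")
            earns.add((emp_id, salary))
        elif status == "unknown":
            salary_unk.add((emp_id,))
        elif status == "unsalaried":
            unsalaried.add((emp_id,))
        else:
            raise ValueError(f"employee {emp_id}: unrecognized status {status!r}")
    return earns, salary_unk, unsalaried


def distributed_key_holds(earns, salary_unk, unsalaried) -> bool:
    """True if no emp_id occurs more than once across the three relations."""
    ids = [t[0] for rel in (earns, salary_unk, unsalaried) for t in rel]
    return len(ids) == len(set(ids))


earns, salary_unk, unsalaried = decompose(ROWS)
print(earns, salary_unk, unsalaried)
print(distributed_key_holds(earns, salary_unk, unsalaried))
```

Note that the choice of which relation a row belongs to rests entirely on the meaning assigned to the status column, which is the reader's point: the split is driven by semantics that the designer, not the DBMS, must supply.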

(Continued in Part 2)

 

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

 

References

[1] McGoveran, D., Nothing from Nothing: What's Logic Got To Do With It

[2] McGoveran, D., Nothing from Nothing: Classical Logic - Nothing Compares 2 U

[3] McGoveran, D., Nothing from Nothing: Can't Lose What You Never Had

[4] McGoveran, D., Nothing from Nothing: It's In The Way That You Use It

[5] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), in progress.

 

 

 

 
