Wednesday, July 13, 2022

MISSING DATA AND MULTI-RELATION QUERY RESULTS (T&N)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to date with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

On NULLs and Multi-Table Relvars

(first published 04/05/02)

"I had a question about the missing-values suggestion in PRACTICAL ISSUES IN DATABASE MANAGEMENT, page 234. You write:
"Table operations would have to be modified to yield results with as many tables as there are types of propositions with only known values."
How would this be represented in a language like Tutorial D, where relvars are required to be strongly typed? One possible idea is to make use of type inheritance. Suppose I had a domain of tuple values {x,a,b,c} (all integers, say) where x is not allowed to be missing but a, b, and c are allowed to be missing. Suppose we extended the domains of a, b, and c with an "imaginary" special value that we will never represent, which I will show for diagram purposes only as '?'. Then the domain can be split into parts:
XABC {x,a,b,c} possrep: {X: int, A: int, B: int, C: int}
XAB {x,a,b,'?'} possrep: {X: int, A: int, B: int}
XAC {x,a,'?',c} possrep: {X: int, A: int, C: int}
XBC {x,'?',b,c} possrep: {X: int, B: int, C: int}
XA {x,a,'?','?'} possrep: {X: int, A: int}
XB {x,'?',b,'?'} possrep: {X: int, B: int}
XC {x,'?','?',c} possrep: {X: int, C: int}
X {x,'?','?','?'} possrep: {X: int}
Using Mr. Date's specialization by constraint idea, we can inherit all the subtuple types from the main tuple type. Updates could make a tuple change type. A relation of relations of XABC type could be used to return results of a query. Each relation within the relation would contain one subtype.

However, the exponential explosion of possible subtypes would be very difficult to handle, practically speaking. As you admit in your book, a real DBMS might have to handle thousands of small subtables. This cannot be passed off as an "implementation detail" since table operations "yield results" at the user presentation level. No matter how efficient the underlying system might be, this seems unacceptable. Perhaps we have to fall back on default values after all."
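To make the size of the reader's concern concrete, here is a minimal Python sketch that enumerates the headings in his list: the required attribute plus every subset of the attributes that may be missing. The function name `subtype_headings` and the use of plain tuples for headings are assumptions made for illustration only; this is not Tutorial D and not a proposal from the book.

```python
from itertools import combinations

def subtype_headings(required, optional):
    """Return every heading made of the required attributes plus a subset
    of the optional ones, largest subsets first."""
    headings = []
    for size in range(len(optional), -1, -1):
        for subset in combinations(optional, size):
            headings.append(tuple(required) + subset)
    return headings

# X may not be missing; A, B and C may be (as in the reader's example).
headings = subtype_headings(["X"], ["A", "B", "C"])
for heading in headings:
    print("".join(heading))

# The count doubles with each additional attribute that may be missing.
print(len(headings))
```

With three such attributes there are 2^3 = 8 headings, in the same order as the reader's list; with ten there would be 1,024, which is the growth he objects to.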


Then

Tutorial D does not explicitly incorporate the concept of missing data as metadata expounded in my PRACTICAL ISSUES IN DATABASE MANAGEMENT, which originates with David McGoveran. THE THIRD MANIFESTO refers in an appendix to Chris Date's default value scheme included in his RELATIONAL DATABASE WRITINGS 1991-1994. He proposed it only as a better solution than NULLs, but it does not address the fundamental meta-data nature of missing data that I raise in my book. He subscribes to the meta-data approach as the theoretically correct solution and we both agree that the lack of an elegant, simple solution is inherent in the nature of missing data, because it is outside the scope of the two-valued logic of the real world. Incidentally, the term "NULL value" is a contradiction in terms: NULLs are not values [and thus violate Codd’s Information Principle].

Now

Note the new title: relations -- neither tables nor relvars!

"Relvars introduce a concept of assignment, which has no counterpart in either FOPL or set theory. If you add it to those formalisms you introduce computational completeness, which destroys both decidability (the existence of a general algorithm by which you can determine if an expression is or is not logically valid) and the guarantee that there exists a (query) evaluation procedure that will halt (the existence of a general algorithm by which you can evaluate the truth or falsity of every instantiated predicate expression given those instantiations from any given database). Therefore we must forbid relvars."

--David McGoveran

The fact that data is unknown and therefore missing is not data, but data about data -- meta-data -- that belongs in the database catalog and should be managed by the DBMS. In "The Final NULL in the Coffin" paper we propose a relational solution (i.e., within 2VL and without NULLs) that requires, indeed, a relational algebra (RA) with multi-relation results. We know it is the correct solution because it is consistent with the McGoveran interpretation -- the current understanding -- of the RDM, which mandates both base and derived relations in 5NF (i.e., RA with 5NF closure) and, thus, multi-relation results (why?).

The paper explains why proliferation of relations is not a concern with a true RDBMS implementing this RDM interpretation.
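The idea of a multi-relation result can be pictured with a toy Python sketch. It is an illustration under stated assumptions, not the mechanism proposed in the paper: each fact is assumed to arrive as a dict holding only its known values, and the hypothetical helper `partition_by_heading` groups facts into one relation (a set of tuples) per heading, so that no relation contains a NULL or any other marker.

```python
from collections import defaultdict

def partition_by_heading(facts):
    """Group facts (dicts of known values only) into one relation per heading.

    Each relation is a set of tuples whose positions follow the sorted
    attribute names of its heading.
    """
    relations = defaultdict(set)
    for fact in facts:
        heading = tuple(sorted(fact))
        relations[heading].add(tuple(fact[name] for name in heading))
    return dict(relations)

# Hypothetical sample facts: attributes with unknown values are simply absent.
facts = [
    {"X": 1, "A": 10, "B": 20, "C": 30},
    {"X": 2, "A": 11, "B": 21},
    {"X": 3, "A": 12, "B": 22},
    {"X": 4, "C": 33},
]

result = partition_by_heading(facts)
for heading, body in result.items():
    print(heading, sorted(body))
```

In this toy run the result consists of three relations, one for each kind of proposition that actually occurs in the sample data, rather than one for every conceivable combination of attributes.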


Further Reading

The Final NULL in the Coffin: A Relational Solution to Missing Data

Missing Data series 1

Missing Data series 2

Read My Lips If There's NULLs, It's Not Relational

NULL Value is a Contradiction in Terms

Relation Proliferation

Nobody Understands the Relational Model series

Relations, Database Relations and Tables


