Saturday, December 11, 2021

Nobody Understands the Relational Model: Semantics, Closure and Database Correctness Part 4



with David McGoveran

(Title inspired by Richard Feynman)

In Parts 1 and Part 2 we provided some clarifications following a discussion on LinkedIn about our contention that, conventional wisdom notwithstanding, database relations -- distinct from mathematical relations -- are by definition not just in 1NF, but also in 5NF, as a consequence of which the RA, as currently defined for 1NF closure, produces what the industry calls "update anomalies" and, thus, is not a proper algebra. In Part 3 we used that information to debunk some leftover misunderstandings in the discussion.

We conclude in Part 4 with comments on a private exchange that followed the public one on LinkedIn regarding the difference between the McGoveran (DMG) and Date and Darwen's (TTM)
interpretations of the RDM, which can be summarized as follows:

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

- 12/05 TYFK: How Not to Explain the Relational Model

- 11/25 Nobody Understands the Relational Model: Semantics, Closure and Database Correctness Part 3

 - 11/19 OBG: The Fate of Fads -- XML DBMS

- 11/11 Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2

- 11/05 OBG: Database Consistency and Physical Truth

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

TTM:

  • Mathematical relations (1NF, not necessarily 5NF); and
  • RA 1NF closure; and,
  • "Update anomalies"; and,
  • Single relation results.

DMG:

  • Database relations (1NF+5NF); and,
  • RA 5NF closure; and,
  • Guaranteed correctness; and;
  • Multi-relation results.

These are critical differences, but there are others (to be specified in DMG's forthcoming book), some of which stem from these.

Note: McGoveran alerts tha "anomalies" in some algebra is bad terminology, because it induces confusion of errors by representation of the operands with failures of preservation of the required properties of those operands. For example, the decimal fraction representation of real numbers introduces representational errors, not errors of closure. It is preferable not to talk about "anomalies" at all, as it is just an admission of lack of understanding how they arise and their consequences, but see them as inexplicable or unavoidable and use of symbols is that ill-defined that there is no algebra at all -- just some hieroglyphics.

“My point was that a set of 5th normal form relations are not closed under joins.”
But that is precisely the problem: the DBMS cannot guarantee correctness of results, hence the anomalies and failure to fully support view updating and logical independence. That is why a RA revision for 5NF closure is necessary, which in turn requires multi-relation results.
“I understand that McGoveran's interpretations are different than TTM. I don't know why you keep trying to establish the difference. Please get out of your head the notion that I think they're the same. Why can't both exist simultaneously? Let's presume McGoveran's ideas are revolutionarily correct, and solve anomaly problems. That would be great. Can't TTM also be "correct". It's just an algebra (potentially "incomplete"), that just sometimes has these strange anomalies, just like real numbers aren't closed over square roots?”
I get flak when I claim that the plethora of misconceptions in the industry reflects poor understanding of the RDM. My interlocutor insisted that he does understand, yet he accepts "incomplete algebra" when, of course, either there is algebra or anomalies, either guaranteed correctness or lack thereof. DGM's RA is not "just a difference" from TTM grounded in the current RA -- without a RA revision under TTM correctness is not guaranteed, which means that it cannot be a valid version of RDM. Logically, if correctness requires 5NF closure, then 1NF closure under TTM is not just different, but wrong. And if you have an inconsistent algebra, what does 'repair' mean other than replacement with a consistent one?

Codd, while inventing something new and profound, understandably made some ommissions -- had he not become ill he would have potentially fixed them along DMG lines. DMG's ideas are revolutionary only in the sense that they are intended to be consistent with SST/FOPL, which most industry stuff is not, as it disregards the theory. My interlocutor is too much vested in the TTM interpretation and wants badly for it to be correct.
“There are anomalies with exponentiation and real numbers. (-1)^(0.5). So, does that mean you would say that real numbers are incorrect, because they aren't closed under exponentiation? Or that exponentiation isn't a valid operator for real numbers? ... If exponentiation was designed to be an algebra for real numbers, then it is not a proper algebra and there will be anomalies ...”
There is no such thing as "incorrect real numbers", but an algebra for them can be if it is defined such that it produces non-real numbers (i.e., anomalies), because it cannot re-operate on those and produce real numbers -- just like the current RA that produces non-database relations.

McGoveran comments:

An algebra without closure is unreliable, requiring constant re-evaluation of expressions to determine if they are meaningful -- correctness is not mechanical. For such reasons, mathematicians have modified systems to obtain closure. For example, the field of integers was modified to include zero and to be infinite so that arithmetic over the integers was closed. RA is so important that it requires just as much attention and modification regarding exactly how it is closed - exactly what characteristics of relations are to be preserved by relational operators. The RA as defined by Codd (and later by Date, et al) fails to guarantee information preservation, which seems rather obviously necessary for an algebra used to model data and its relationships.

When making a statement about closure of an algebra, it is important to qualify it -- otherwise our sloppy language leads to misunderstandings and misstatements. For example, when we say that arithmetic is closed over the reals, we mean that arithmetic operators (and only arithmetic operators) on real numbers produce real numbers. Saying that the field (e.g., real numbers) is or is not closed over the operation (e.g., square root) is confused and misleading as to what closure is and how it is obtained definitionally in an algebra (e.g., square root is, in fact, closed over the positive real numbers).

Given a definition of database relation and definitions of relational operators for a specific version of RA, ideally all the relational operators would be closed over database relations. Notice there are two types of definitions here for theorists to work with: those of operators and those of operands. Historically, the definition of a relation was the original mathematical one, which has not taken into account semantics. Indeed, an explicit treatment of semantics was not introduced by Codd until 1979 and then only as a more-or-less orthogonal (i.e., independent) addition to the model rather than being integrated into the definitions. An explicit and integrated treatment of semantics is necessary to address the issue of preservation of information by relational operators. It is mere historical accident that any relational operator -- primarily join -- can produce a result that does not preserve information. That such results were called "anomalies" instead of what they are (the failure mentioned above) has been outrageously misleading. Assertions that normalization is anything other than further requirements on the definition of a relation so as to preserve information -- i.e., the semantic consistency of the RA expressions! -- is simply absurd. (The most repugnant absurdity is the assertion that normalization's purpose is to reduce physical, or even logical, redundancy!)

Conclusion

Relations, like numbers, are mathematical abstractions that represent nothing. Database relations, on the other hand, represent entity groups and, thus, are designed and constrained to represent within-group and between-groups relationships, including dependencies. For correctness RA should preserve that information, which requires 5NF closure: operation on database relations that produces database relations. The current RA has only 1NF closure, can produce only single relations, loses information and, thus, cannot guarantee correctness.




No comments:

Post a Comment

View My Stats