Thursday, November 25, 2021

Nobody Understands the Relational Model: Semantics, Closure and Database Correctness Part 3



(Title inspired by Richard Feynman)

In Parts 1 and 2 we provided some clarifications following a discussion on LinkedIn about our contention that, conventional wisdom notwithstanding, database relations -- distinct from mathematical relations -- are by definition not just in 1NF, but also in 5NF, as a consequence of which the relational algebra (RA), as currently defined for 1NF closure, produces update anomalies and, thus, is not a proper algebra. In this third part we will use that information to debunk some leftover misunderstandings in the discussion.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

- 11/19 OBG: The Fate of Fads -- XML DBMS

- 11/11 Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2

- 11/05 OBG: Database Consistency and Physical Truth

- 10/27 Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 1

- 09/19 TYFK: Calculated Attributes -- Redundancy, Full Normalization and Relational Theory

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

“Let's presume that there's a functional dependency

{ZIP} -> {CITY,STATE}

So, if there's a relation variable of type

ADDRESS {STREET,CITY,STATE,ZIP}

and there was a [relation] value in that relvar, I claim it is in 1NF and it is not in 5NF because of the functional dependency.

You insist that this value isn't a relation. You also don't like me calling it a table. But C.J. Date calls it a relvar on p.150 of "Database in depth". So it seems to me like you want to invent some new semantics. What would you call such a thing?

I claim it's a relation. I believe C. J. Date would too. It is a set of tuples. It is not normalized ({ZIP} -> {CITY,STATE}FD). So it is not in 5NF, but it meets 1NF.

Prove: 5th normal form isn't required for relations.
Given: There exists ADDRESS as defined above
Given: There exists ZIP as defined above
1. Both ADDRESS and ZIP are relations (as per "a database relation is" at https://www.dbdebunk.com/2017/11/what-relations-really-are-and-why-it-is.html)
2. ADDRESS ⋈ ZIP is a relation (relational closure, ibid)
3. ADDRESS ⋈ ZIP is not in 5NF (functional dependency (ZIP) -> (CITY))
∴ Relations aren't definitionally in 5NF (logical consequence of 2, 3)

Again, I admit that this proof might be incorrect. If so, please point to the line that's wrong. And preferably state why.”
As we already explained, we distinguish between relations -- as in mathematical relational theory in which the RDM is grounded -- and database relations, their RDM adaptation to database management. While the latter are, mathematically, relations, the inverse is not true: the former are not necessarily database relations.

Be that as it may, proving that ADDRESS (I prefer plural ADDRESSES, to focus on a relation being a set representing an entity group) sidesteps what according to my interlocutor himself is the much bigger problem than update anomalies per se -- what causes the anomalies: an "inconsistent algebra". Closure means that the class of the operands and results is the same. 

The current RA is inconsistent because it operates on 5NF relations and produces 1NF relations which, if re-operated upon, can produce anomalies. Those are, thus, not "bugs", but the direct consequence of the inconsistency -- the RA fails to preserve all information about the conceptual model relationships, particularly dependencies. Consequently, the DBMS cannot guarantee correctness when re-operating on those non-5NF results -- it befalls to users/applications, which defeats the purpose of not just the RDM, but of database management per se.

Under the McGoveran (DMG) interpretation of the RDM 

(1) the RA re-defined for 5NF closure; and,

(2) database design adherent to the three principle 

ensures database (5NF) relations, avoiding anomalies and enabling the DBMS to guarantee correctness. This very different from either Codd's original efforts or TTM, whereby users (and application developers) must always be aware of and defend against lurking anomalies, and have no system guarantee of logical independence.

Note: Date and Darwen use relvar for relation variable and relation value for relation. While, of course, that is the case "under the covers", there can be no relvars (in fact, no "variables" of any kind without going well beyond FOPL and inducing many undesirable logical properties), only relations in SST/FOPL, which is why we prefer Codd's "time-varying relations".

“The question you asked was for me to find a major difference between TTM-style and McGoveran-style relational, in your most recent post: https://www.dbdebunk.com/2021/05/tyfk-data-model-logical-model-and-schema.html You also said it was pretty explicit. But I claim there isn't any such explicit differences. You do talk about 3-4 different model representations ("conceptual", "logical", "physical", and "data"), where TTM really only focuses on the difference between "Model vs Implementation".”
There are many differences between what we call the DMG and TTM interpretation of the RA, which are beyond the scope of this discussion (and which will have to wait for the publication of the former). But the former's proper algebra with 5NF closure operating on database relations without anomalies vs. the latter's current RA and 1NF closure certainly qualifies as a critical difference.

Note: The LI discussion was four months ago -- sometime in June-July. The link is to a post I published in May, so it couldn't have been my most recent then -- it looks like a wrong link, which explains why differences are not discernible. A June or July post is more likely and I suspect it's either this one, or probably this one,  the reader is invited to identify differences. For example, DMG has rejected TTM reliance on unique names of relations, domains and attributes beyond their utility as mnemonics for the predicates which uniquely define every relational object type -- relations, constraints, attributes, domains, and databases. By contrast, TTM simply asserts that a different name implies a different type, without providing any effective (computable) procedure for insuring that property at design time, let alone preserving it under relational operators. Two immediate consequences are the possibilities of inconsistent semantics and of irresolvable ambiguities during updates of derived relations (e.g., views).

In fact, some of the differences originate precisely with TTM's focus on "Model [logical] vs Implementation [physical]", eschewing the conceptual. Codd's 1969-70 RDM had little consideration, if any, of the conceptual and his orientation away from predicates/constraints and towards the tabular visual introduced problematics. TTM also eschews the conceptual and thus inherits those. DMG's focus being on the logical deriving, as it does, from the conceptual and on predicates avoids the problematics, a major source of differences.

“Having said this, none of this dbdebunk post of yours talks about set theory or predicates, and that's what you keep insisting I'm missing). And if you want to do that, great. I'm still interested, but if so, "relations can be manipulated mathematically as sets by the relational algebra (RA) with relations as results (relational closure)." (What Relations Really Are and Why It Is Important) is not something that is logical. Such "Pascal-relations" are not closed under joins. I think what you're saying is that the result of a join of two Pascal-relations can very well be a "multi-relation result" (I am still curious about the semantics), but that's not relational closure. Nor is it "with relations as results".”
Points arising:

  • When I refer to fundamentals, I don't mean just SST/FOPL per se, but also conceptual semantics, the three design principles and so on. Other pragmatic [adaptations] follow from the theory. Indeed, hence DMG's database relations and proper RA vs TTM's relations and inconsistent RA.
  • As to multi-relation results, as I illustrated here, they do not require definition: instead of a set of one 1NF relation, there is simply an equivalent set of n 5NF relations, which can be presented by applications to users as a single table, or as multiple tables (it may help recalling that full normalization of a non-database relation produces a set of database relations).


(Continued in Part 4)




No comments:

Post a Comment

View My Stats