Thursday, May 28, 2020

No Such Thing As "Current Relational Data Models"



“... the concept of a state group is indeed a missing modeling concept in relational/current data models...”

Thus in a LinkedIn exchange. I don't know what a "state group" is, but I spent almost six decades debunking the misuses of data model in general and the abuses of the RDM in particular and I smell them from miles away. While the time when lack of foundation knowledge shocked me is long gone, practitioners' total unawareness of and indifference to it, and poor reasoning in a field founded on logic never ceases to amaze me.

What exactly are "relational/current data models"?


--------------------------------------------------------------------------------------------------
SUPPORT THIS SITE
Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 that website was discontinued. The content of this site is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. Thank you.

DATA FUNDAMENTALS 

The industry is chockful of misconceptions due to lack of foundation knowledge. Corrections them are dismissed as "theory that is not practical", misinterpreted as "ad-hominem attacks", or ignored altogether, regardless of the amount and quality of reasoning and supporting evidence. Most practitioners -- be it user or vendor personnel -- cannot discern fallacies and do not realize the practical implications thereof and, thus, cannot associate problems with their real causes., hence the industry's "cookbook approach" and succession of fads.

What about you? Are you just a practitioner, or a thinking professional? 

TYFK (Test Your Foundation Knowledge) posts will each present and debunk a pronouncement containing one or more misconceptions. First try to detect them, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs. 

LATEST UPDATES
  • 01/14/20 Updated the LINKS page
  • 01/04/20 Updated the POSTS page with the 2020 posts
  • 12/08/19 Added two educational references on set theory to the LINKS page.

LATEST PUBLICATIONS (order PAPERS and BOOKS) 

USING THIS SITE 
  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
  • Following the discontinuation of AllAnalytics site, the links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.
SOCIAL MEDIA
 

I deleted my Facebook account. You can follow me: 
  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict. 
--------------------------------------------------------------------------------------------------

What A Data Model Is


A data model is formal, has structure/integrity and manipulation components, and is used to formalize conceptual models of reality understood semantically by users as logical models for database representation that are "understood" algorithmically by the DBMS and implementable physically. To that end a DBMS language must include a data sublanguage that concretizes (expresses the components of) a data model. If it does not, the functionality is relegated to application programs, defeating the concept of database management[1].

By this definition, the only complete data model with a sound theoretical foundation that has been specified formally is the RDM: database relations (structure) constrained semantically by several types of constraints (integrity) and relational algebra (manipulation). All its advantages, foremost among them system-guaranteed and by-design correctness[2], derive from the dual theoretical foundation -- simple set theory (SST) expressible in first order predicate logic (FOPL) -- which materialize if and only if (1) the DBMS has a relational data sublanguage and (2) database design adheres to three core principles required by the theory[3].


A Contradiction


Here's how the author clarified what he meant:
“Well, I do not think the RDM is the only one [formally specified], but it certainly is still the de-facto standard. What I meant are data models that are based on/inspired by the RDM and that largely follow the same semantics. Lots of tools have this kind of data modeling built-in, that is what I meant with 'current' data models.”
All sorts of "data models" other than the RDM have been thrown around in the industry (XML, graph, NoSQL, document), though I dare you to extract their components and theoretical foundation from those who refer to or promote them -- I have tried for decades to no avail, and for good reason[4]. But data models (plural!) based on the RDM, itself a data model??? Of course, a misuse of the term data model, just as I expected[5]: what he really meant is that current logical models are formalized (i.e., databases are designed) using the RDM.

But by his own account:
“...the basic relational model is nicely rooted in set theory. The problem is nobody actually uses that. Take NULL values for example: a major problem in databases, allowing for a lot of denormalization. There are more, like allowing multiple columns with the same name, allowing multiple rows with all of the same values. So, while theoreticaly appealling on paper, there is no practical implementation with the same qualities ... Non-functional system id's that are unrelated to the data represented are in widespread use; a horrible idea, making any data model a parody of anything actually useful.”
Note: I get a whiff of "theory is not practical", but I will ignore that.

NULLs, identically named columns and duplicate rows are SQL features that violate the RDM:
  • A NULL is a mark for the absence of a value ("NULL value" is a contradiction in terms[6]), a violation of Codd's core Information Principle (IP) mandating that all information be represented exclusively and in only one way: as attribute values (drawn from domains) in relations[7].
  • Identically named columns and duplicate rows violate Codd's Guaranteed Logical Access rule mandating that every value is logically accessible by relation name, attribute name and primary key value[7,8].
  • "Non-functional system id's" are not relational keys and, therefore, also violations of the RDM[9].
For these and other reasons SQL tables (not to be confused with R-tables that visualize relations on physical media) are not relations[10,11].
How are "current logical models based on the RDM" if practitioners design non-relational SQL databases???? This is usually the result of poor foundation knowledge and familiarity with the history of the field.

While IBM's original version was inspired by the RDM, for a variety of reasons[12] SQL is not the concretization thereof[13] -- a relational data sublanguage[14]:

  • From the start it had non-relational features and non-database functionality [15], and a plethora more were added over time.
  • The specifications of the ANSI/ISO standard have been shown to be flawed, calling into question whether it has a consistent/valid data model at all[16] (as I showed myself, some of the definitions are circular).
  • Practitioners use commercial dialects, each with its proprietary variations and extensions and, thus, different (with good reasons to assume even less sound) "data models".
  • Name notwisthanding, it is not really structured, not just for queries, and neither a full-fledged, nor a well designed language [17,18,19].

But having been sold as relational, in the absence of foundation knowledge SQL is universally considered the "de-facto standard relational language". In other words, practitioners base their logical models on unspecified and questionable "SQL data models" that violate the RDM which they confuse with the RDM. Most of them are oblivious or indifferent to SQL relational violations, but as this example demonstrates, even the author of the comment who is aware of them fell into the trap. Aside from relational advantages not materializing, SQL deficiencies are imputed to the RDM. In essence, the RDM is being blamed for the costs of the failure to implement it, which is upside down an backwards and impedes implementation.



Yet Another "Data Model"


Instead, the industry practices what I call a "fad-to-fad cookbook approach" promoting various ad-hoc, poorly defined "data models" devoid of a theoretical foundation[4].
“I can point to Alan which is a data model at its core, formally and fully specified within Alan-meta, which is defined within itself. If I am not mistaken we are still able to map that to EBNF notation (it might miss some of the nuances then, though). But maybe you are thinking of specific requirements of that formal specification that I was not taking into account. But to answer your question: yes it is possible to specify the Alan data structure and manipulation very formally. And it has a couple of appealing qualities. Like not allowing null values. Disclaimer: I am very much involved with Alan myself.”
In my five decades of debunking industry misconceptions I have constantly demonstrated that most practitioners -- be their vendor or user personnel -- lack foundation knowledge[20]. The chance of some of these "data models" being the real thing, let alone superior or even equivalent to the RDM is so close to nil that so is the return on the investment of time and effort in investigating them.

We have already seen confusion of logical with data models and of RDM with SQL; NULLs have nothing to do with normalization or denormalization (see next), and tables play no part in the RDM. Then there's the following:

“...it is hard to define mutually exclusive, combined exhaustive attritubes in an RDM. Say: I have a order. When it is placed, it has an order date and a due date. When it is delivered, it has a delivery date and actual delivery address. When it is cancelled, it has a cancellation date and reason. Moving them to separate relations (tables) does not make it clear that the delivered and cancelled states should not occur at the same time. Keeping them in one table, you cannot model this either. Thus you need to use NULL values. And open a can of worms.”
I refer the reader to our proposed relational NULLless solution to missing and inapplicable data (the two are distinct)[21].

My response (slightly rephrased):

“You've got it upside down and backwards. The reason [the theory is not used is because the RDM] is not implemented, [which in turn is] due to ignorance in the industry [vendors, users, experts and trade media] of foundation knowledge and [lack of appreciation thereof]. Everybody is talking about "data models" that are nothing of the kind [let alone relational] because nobody understands what a data model [in general, nor the RDM in particular, are].”
got the following reaction:
“I agree there is a lot of ignorance around. But major rdbms suppliers should be able to know about this, but they still allow for and even promote these practices.”
But the point is why and how should vendors know, let alone bother to implement true RDBMSs if together with users they believe that's what they already have in SQL DBMSs?

So, no, after a glance at the Alan page, I will not make an effort, thanks. You try to figure out (1) if there is a data model in there (2) what are its structure/integrity and manipulation components (3) what is its theoretical foundation and (3) if it is at least beneficially equivalent, if not superior, to the RDM.

Good luck and let me know if and when you do. 

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] Pascal, F., What Is a Data Model and What It Is Not

[2] Pascal, F., Logical Validity and Semantic Consistency

[3] Pascal, F., What Is a True Relational System (and What It Is Not)

[4] Pascal, F., Understanding Data Modeling series

[5] Pascal, F., Data Model: Neither Business, Nor Logical, Nor Physical Model

[6] Pascal, F., NULL Value is a Contradiction in Terms

[7] Pascal, F., Interpreting Codd: The 12 Rules

[8] Pascal, F., Duplicates: Stating the Same Fact More Than Once Does Not Make it Truer, Only Redundant

[9] Pascal, F., The Key to Relational Keys -- A New Understanding: Primary Keys

[10] Pascal, F., What Relations Really Are and Why They Are Important

[11] Pascal, F., Understanding Relations series

[12] Darwen, H., Why Are There No Relational DBMSs

[13] Pascal, F., SQL Sins

[14] Pascal, F., Data Sublanguage

[15] Pascal, F., DISTINCT and ORDER BY Are Not Relational

[16] Date, C. J. with Darwen, H., A GUIDE TO THE SQL STANDARD, 4th Ed.

[17] Pascal, F., To Really Understand Integrity, Don't Start with SQL

[18] Pascal, F., Precision, Procedurality and SQL

[19] Pascal, F., Language Redundancy and DBMS Performance: A SQL Story

[20] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER

[21] Pascal, F., The Final NULL In the Coffin: A Relational Solution to Missing Data



No comments:

Post a Comment

View My Stats