Friday, June 12, 2020

Semantics and the Relational Model



“The RDM is semantically weak ... struggles with consistent granularity and has limitations at the property level... it has no concept of data flow ... it is an incomplete theory. Great for its time but needs something better now ... it uses ill defined and linguistically suspect labels ... it has no rules for semantic accuracy ... this just makes the RDM 1% of the truth ... the RDM should have solved this all by now ... but it has clearly not. You fail to see the reality of the failure of RDM in the real world ... this is your choice. I understand why you cling to it ... it is a most excellent theory that I respect greatly ... [but o]pen minds make progress...” 
Thus in a LinkedIn exchange. Criticism of the relational data model (RDM) almost always reflects poor foundation knowledge and lack of familiarity with the history of the field and, as we shall see, this instance is no different. It is often triggered by what I call the "fad-to-fad cookbook approach", one of the latest fads being the industry's revelational "discovery" of semantics.

Note: Speaking of which, the LI exchange was in response to an article that stated:
“The "Intention vs Extension" dichotomy is a modeling technique that was developed following the observation of the difficulty of bringing out a "common core" between different ecosystem maps even though it would have been entirely logical to see one emerge!”
"Ecosystem maps" looked like graphic displays of conceptual models to me, so I alerted the article's author to extension and intenSion (not intention) being basic concepts in set theory/RDM with very specific meanings, to which he replied (italics ours):
“I wasn't aware about the use of «intension» and «extension» terms in the context of set theory. So, sorry if you think that I have  used «bad» terms ... but I think that if you read the article you can still grasp «what I mean» with the underlying concepts:
- #INT is related to what a thing «DOES» (It’s the «DO» aspect/factet of things)
- #EXT is related to what a thing «IS» (It’s the «BE» aspect/factet of things).”

Practicing data management while unaware of fundamental concepts at the logical level, and assigning them a different meaning at the conceptual level, is common in the industry[1]. End Note
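To make the set-theoretic sense of the two terms concrete, here is a minimal Python sketch of ours (it is not taken from the article or the exchange). The intension of a set is its membership condition; the extension is the collection of members that satisfy it. The predicate `is_even_digit` and the finite range of candidates are hypothetical choices made purely so that the example can run.

```python
# Intension: the defining condition (predicate) that members must satisfy.
def is_even_digit(n: int) -> bool:
    return 0 <= n <= 9 and n % 2 == 0

# Extension: the members themselves, here drawn from a finite universe
# of candidates (an assumption made only so the example is runnable).
universe = range(-20, 21)
extension = {n for n in universe if is_even_digit(n)}

# The same set given directly by enumeration of its members.
enumerated = {0, 2, 4, 6, 8}

print(extension == enumerated)
```

Loosely speaking, in database terms the predicate of a relation plays the role of the intension, and the set of tuples recorded at a given time plays the role of the extension.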

I referred the RDM critic to several references which call his claims into question and did not get a substantive response, so let's debunk them individually.

“Has no concept of data flow”
The RDM explicitly should not. Relational advantages accrue to data representation, which is orthogonal to data processing (ETL and the like, which we assume is what is meant here)[2].
“Struggles with consistent granularity ... it is an incomplete theory”
Not clear what this means.
“Uses ill defined and linguistically suspect labels”
Not clear what this refers to either, but we point out that Nijssen's Information Analysis Method (NIAM) -- the predecessor of Object-Role Modeling (ORM) and forefather of fact-based modeling (FBM) -- is rooted in linguistics, and the book introducing it was titled CONCEPTUAL SCHEMA AND RELATIONAL DATABASE DESIGN[3] -- without any problems.

Note: While Codd did a pretty good job in 1969-70, he didn't always separate conceptual, logical and physical terms -- the levels of representation emerged much later in the mid-80s (when he became ill)[4]. Date and Darwen were more careful, but it is hard and tedious to do this right and some of the terminology is unfortunate and misleading. On the other hand, criticism like this is often horrendously biased by programming terms. End Note

“Has limitations at the property level”
We have written about McGoveran's identification of this issue -- it is fixable and a core objective in his forthcoming book -- but we suspect the critic's perspective is not consistent with ours[5].
“Has no rules for semantic accuracy”
This is simply false -- that is what constraints are for. Most practitioners miss that the RDM is applied theory -- simple set theory (SST) expressible in first order predicate logic (FOPL), adapted for database management. One of the adaptations is that database relations are individually and collectively constrained to be consistent with the semantics of the conceptual model represented by the database, hence semantic constraints[6,7].
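As a rough illustration only, the sketch below declares a few constraints and lets the DBMS, rather than application code, reject data that is inconsistent with the business rules. It uses Python's built-in sqlite3 module simply because it runs anywhere; SQL is at best an approximation of the RDM, so this should not be read as a relational implementation. The department and employee tables, their columns, the rules ("every employee belongs to an existing department", "salaries are positive") and the helper `try_insert` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce referential constraints unless asked to.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id),
        salary  INTEGER NOT NULL CHECK (salary > 0)
    );
    INSERT INTO department VALUES (1, 'Research');
""")

def try_insert(emp_id, dept_id, salary):
    """Attempt to record an employee; report whether the constraints allow it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee VALUES (?, ?, ?)",
                (emp_id, dept_id, salary),
            )
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"

print(try_insert(100, 1, 5000))   # consistent with all declared rules
print(try_insert(101, 2, 4000))   # refers to a department that does not exist
print(try_insert(102, 1, -5))     # violates the salary rule
```

The point of the sketch is where the rules live: they are declared once, as part of the database definition, and every attempted update is checked against them.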
“Is semantically weak ... the RDM should have solved this all by now ... but it has clearly not”
First, practitioners are unaware that Codd himself published, in 1979, a semantically extended version of the RDM referred to as RM/T[8]. We draw your attention to the following from its conclusion:
“We have attempted to define an extended relational model that captures more of the meaning of the data. Meaningful units of information larger than the individual n-ary relation have been introduced in such a way that apparently competing semantic approaches recorded elsewhere may all be represented therein or translated thereto. The result is a model with a richer variety of objects than the original relational model, additional insert-update-delete rules, and some additional operators that make the algebra more powerful (and unfortunately more complicated). We reiterate that incorporation of larger meaningful units is a never-ending task, and therefore this model is only slightly more semantic than the previous one.”
(italics ours). You are excused if an important aspect escaped you, but if you read Codd's original 1969-70 papers and the RM/T paper you can't help realizing that the single mild sentence we emphasized in the quote is a significant understatement: while the extended version is "only slightly more semantic", it is much more complex. Recall, though, that a core motivation for the RDM was simplicity[9].

But, second, there is a more profound issue to which most practitioners -- lacking a grasp of fundamentals -- are oblivious. A conceptual model is the semantics (meaning) of the data recorded in the database and the RDM is used to formalize conceptual models of reality as logical models for database representation[10]. The hallmark of a formal language -- which is what logic is -- is independence of semantics[11]:

“...logic -- is an analytical theory of the art of reasoning whose goal is to systematize and codify principles of valid reasoning. It has emerged from a study of the use of language in argument and persuasion and it is based on the identification and examination of those parts of language which are essential for these purposes. It is formal in the sense that it lacks reference to meaning. Thereby, it achieves versatility: it may be used to judge the correctness of a chain of reasoning (in particular, a "mathematical proof") solely on the basis of the form (and not the content) of the sequence of statements, which make up the chain.”
In other words, the richer the semantics, the less general the data model is likely to be. That is why some even advocate semantic reduction in explicit opposition to Codd's extension[12]:
“...systems of operations on data are most effective when they are formalisms, in which semantic considerations are unimportant until the formalism is applied to some specific application. In this way, database processing can join the ranks of successful mathematical abstractions. Differential equations, for instance, can be applied to situations ranging from orbit calculations to the quantum mechanics of the atom. The semantics of each application is unique to that application, but the formalism of differential equations is common. The power of the formalism lies in its abstraction from issues of meaning.”
In other words, semantic richness must be balanced against generality and simplicity at the logical level -- the tradeoff must be optimized.
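The following minimal Python sketch, offered under stated assumptions, shows what abstraction from meaning looks like in practice: a single join operator, defined purely on form (attribute names and values), is applied unchanged to two unrelated subject areas. The representation of relations as sets of frozensets, the helpers `relation` and `natural_join`, and the sample data are all hypothetical and are not taken from any of the cited sources.

```python
def relation(*rows):
    """Build a relation as a set of tuples; each tuple is a frozenset
    of (attribute, value) pairs, so duplicates and ordering vanish."""
    return {frozenset(row.items()) for row in rows}

def natural_join(r, s):
    """Combine every tuple of r with every tuple of s that agrees
    on all attributes the two have in common."""
    joined = set()
    for t in r:
        for u in s:
            t_map, u_map = dict(t), dict(u)
            common = t_map.keys() & u_map.keys()
            if all(t_map[a] == u_map[a] for a in common):
                joined.add(frozenset({**t_map, **u_map}.items()))
    return joined

# Application 1: suppliers and parts located in cities.
suppliers = relation({"supplier": "S1", "city": "London"},
                     {"supplier": "S2", "city": "Paris"})
parts = relation({"part": "P1", "city": "London"},
                 {"part": "P2", "city": "Oslo"})

# Application 2: planets and the stars they orbit.
planets = relation({"planet": "Earth", "star": "Sun"},
                   {"planet": "Mars", "star": "Sun"})
stars = relation({"star": "Sun", "class": "G"})

for result in (natural_join(suppliers, parts), natural_join(planets, stars)):
    for t in result:
        print(sorted(t))
```

Nothing in `natural_join` knows what a supplier or a planet is; the meaning enters only through the relations supplied to it and the constraints declared on them.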
“Is 1% of the truth”
The RDM has provided much, much more in the way of solutions than has been implemented for use -- a huge gap from the first day a so-called "RDBMS" shipped. If use grew from the first SQL DBMSs (Ingres, Oracle, SQL/DS) to enormous proportions even with their low relational fidelity[13], imagine what it would have been with true RDBMSs.
“You cannot blame 99% of the population for the limitations of the theory ... even if 1% got it all right ... others would have followed”
But we can blame them for not learning what the RDM is and insisting that it is something it isn't; and we certainly blame those who should have known better, but mislead. It is naively optimistic to underestimate the nefarious influence of the products and programmers that rule over data management, an influence far too powerful for even that 1% to emerge in the absence of proper education[14]. We express this opinion with great disappointment, based on too many years of practical experience vs. theorizing. We have spent 5+ decades documenting poor knowledge and practices in the industry and the regress due to disregard of theory. It's not the RDM that failed, it's the real world's practitioners -- and that's the industry's loss.
“Was great for its time ... but needs something better now ... I understand why you cling to it ... it is a most excellent theory that I respect greatly ...[but o]pen minds make progress...”
Good science is based on constant improvement, and that has been the case with the RDM too (just read the posts and sources on this site). Note very carefully, though, that a new theory must "reproduce all the successes of the currently prevailing theory -- your new idea must succeed in all the places where the prior one succeeds", which, in the database management context, means that it must confer all the advantages of the RDM for its valid range[15]. 

As scientists, we are inclined to improve rather than discard without strong reasons to do so. Data management research and practice have done nothing but regress since the RDM: we see no alternative that would make us even consider replacing it, certainly not with so-called data models that are nothing of the kind[16,17], or that lack a theoretical foundation (or are falsely claimed to have one)[18]. Lack of foundation knowledge and of familiarity with the history of the field should not be confused with open-mindedness.

The RDM is at least a good starting point, far superior in terms of extension opportunities to any "alternative" we've seen (and we have seen and reviewed many). Hence McGoveran's work on refining, re-interpreting, extending and further formalizing the RDM[19].


Here's a conclusion that I lift from my first paper[20]:

Any data management technology claimed to be superior to the RDM
must be based on a data model that: 

  • Has a formal theoretical foundation as sound as SST expressible in FOPL;
  • Has a real world interpretation;
  • Is as complete in terms of:
    - Structure
    - Manipulation
    - Integrity;
  • Is more general, simpler, or both.

Thanks to David McGoveran for reviewing and improving a draft of this post.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored. End Note


References

[1] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models

[2] Pascal, F., Data Sublanguage series

[3] Nijssen, S., CONCEPTUAL SCHEMA AND RELATIONAL DATABASE DESIGN

[4] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation

[5] Pascal, F., Properties-object Modeling

[6] Pascal, F., Integrity Constraints

[7] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency

[8] Codd, E. F., Extending the Database Relational Model to Capture More Meaning

[9] Pascal, F., Simplicity: Forgotten, Misunderstood, Underrated Relational Objective

[10] Pascal, F., Understanding Conceptual vs. Data Modeling Parts 1-4

[11] Stoll, R. R., SET THEORY AND LOGIC

[12] Merrett, T. H., Extending the Relational Algebra to Capture Less Meaning

[13] Darwen, H., Why Are There No Relational DBMSs

[14] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER

[15] Pascal, F., Science, "Data Science", and Database Science

[16] Pascal, F., No Such Thing As "Current Relational Data Models"

[17] Pascal, F., What Is a Data Model, and What It Is Not

[18] Pascal, F., "Multi-model DBMSs" Is an Empty Set

[19] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming (draft chapters)

[20] Pascal, F., Business Modeling for Database Design: Formalizing the Informal

 

 

 

 
