Sunday, November 25, 2018

Data and Meaning Part 1: The RDM Is Applied Theory

“Fabian - With respect, maybe it's time to 'shake the formal foundations' of data management, especially given the rising costs and increasing segregation of silos.”
“John, if I were to say what I really think, I would be accused of insulting, so I won't. You don't need to respect me, but you better respect formal foundations. Since they are what gives SOUNDNESS to data management practice, what you are really saying is that you don't care about soundness -- do you really intend to take this position? I would not be surprised, because the industry has long "shook" the formal foundations and lack of soundness is precisely what characterizes it. But because there is no longer proper education, practitioners are totally unaware of the relationship between formal foundations and soundness, everything is ad-hoc and arbitrary, yet they fail to recognize the consequences.”[1]
Thus went an exchange with John Gorman on LinkedIn, in which he posed several questions (which I answered in last week's post[2]), the subject being the importance of not confusing levels of representation and, more specifically, of avoiding conceptual-logical conflation (CLC)[3].

Somebody posted a link to my answers on LinkedIn and, in a comment on it, John linked to a Richard Feynman YouTube lecture on "the general differences between the interests and customs of the mathematicians and the physicists". I responded that my very point is that, just as physics is not the mathematics used to describe it (a central issue in quantum mechanics), conceptual modeling is not data modeling: the latter is the representation of the former in the database, and the two are distinct[2]. This brought to mind some older columns I published on the All Analytics website, which no longer exists; this series is a revision thereof.




Mathematical relations are abstractions (i.e., devoid of any real world meaning), and, thus, can contain arbitrary data, and we can arbitrarily apply any operation of the relational algebra (RA) to them. For example, given the two relations A and B:
 A:
 100   26150 ...
 110   38170 ...
 120   37950 ...
 130   33800 ...
 140   35420 ...
 150   30280 ...
 160   27250 ...
 290   15340 ...
 310   15900 ...

 B:
 ... 100   06-19-1980 ...
 ... 110   05-16-1958 ...
 ... 120   12-05-1963 ...
 ... 130   07-28-1971 ...
 ... 140   12-15-1976 ...
 ... 150   02-12-1972 ...
 ... 160   10-11-1977 ...
 ... 290   05-30-1980 ...
 ... 310   09-12-1964 ...

some subset of the Cartesian product of A with the projection of B on the second attribute -- the product being all possible combinations of each tuple of A with every tuple of the projection of B -- yields a relation whose attributes are those of A plus the second attribute of B, and whose tuples are a subset of the tuples of the cross product. In mathematics the result is meaningless with respect to the real world.
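The operation can be sketched in a few lines of Python, purely as an illustration. The sketch assumes relations are modeled as sets of tuples, uses only the first three tuples of A and B, and drops the attributes elided ("...") above; the helper names project and cartesian_product are hypothetical, not part of any DBMS or library.

```python
from itertools import product

# Visible values of the first three tuples of A and B.
# Attributes elided ("...") in the text are left out (an assumption of this sketch).
A = {(100, 26150), (110, 38170), (120, 37950)}
B = {(100, "06-19-1980"), (110, "05-16-1958"), (120, "12-05-1963")}


def project(relation, *positions):
    """Projection: keep only the attributes at the given positions.

    The result is a set, so duplicate tuples disappear automatically.
    """
    return {tuple(t[i] for i in positions) for t in relation}


def cartesian_product(r, s):
    """Cartesian product: every tuple of r concatenated with every tuple of s."""
    return {t + u for t, u in product(r, s)}


# Projection of B on its second (visible) attribute, then the product with A.
B_second = project(B, 1)
A_times_B_second = cartesian_product(A, B_second)

# Any subset of the product is itself a relation over the same attributes.
some_subset = {t for t in A_times_B_second if t[0] == 100}
```

Nothing in the computation refers to what the values stand for: as pure mathematics, every one of these combinations is equally admissible.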

But the RDM is applied relation theory: simple set theory (SST) expressible in first order predicate logic (FOPL), adjusted for applicability to database management. Database relations preserve the mathematical properties but, unlike mathematical relations, are not abstract; they represent in the database sets of facts about real world entities identified during conceptual modeling:

  • Tuples of base relations represent axioms about entities (facts assumed to be true);
  • Tuples of RA derived relations represent theorems (i.e., logical conclusions inferred from the axioms);
  • A DBMS and database constitute a logical inference (i.e., deduction) engine that derives theorems from axioms.
In other words, database relations have real world interpretations (i.e., they carry the meaning specified by the conceptual models that databases represent)[4,5]. Thus, conceptual and data modeling are distinct, which is why we advocate terminology that prevents confusion[6].
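The axiom/theorem reading can be illustrated with a small Python sketch. It is not how a DBMS is implemented; it assumes a hypothetical base relation EMP_SALARY with a made-up predicate, and the helpers restrict and project are illustrative names only.

```python
# Hypothetical predicate (meaning) of the base relation; the wording and the
# attribute names emp and salary are assumptions made for this sketch.
BASE_PREDICATE = "Employee {emp} earns a salary of {salary}."

# Tuples of the base relation: each one asserts a fact assumed to be true (an axiom).
EMP_SALARY = {(100, 26150), (110, 38170), (120, 37950)}


def restrict(relation, condition):
    """Restriction: keep the tuples that satisfy the condition."""
    return {t for t in relation if condition(t)}


def project(relation, *positions):
    """Projection: keep only the attributes at the given positions."""
    return {tuple(t[i] for i in positions) for t in relation}


# Derived relation: employees whose salary exceeds 35000.
high_paid = project(restrict(EMP_SALARY, lambda t: t[1] > 35000), 0)

# Axioms are the predicate instantiated with base tuples;
# theorems are what follows logically from them via the RA expression.
axioms = sorted(BASE_PREDICATE.format(emp=e, salary=s) for e, s in EMP_SALARY)
theorems = sorted(f"Employee {e} earns a salary above 35000." for (e,) in high_paid)

for line in axioms + theorems:
    print(line)
```

Each derived tuple is true only because the base tuples are: the restriction and projection add no facts, they merely deduce consequences of the ones already asserted.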

The data must be consistent with the conceptual model of reality intended by the modeler, which means that neither (1) the data in, nor (2) the RA operations applicable to, database relations can be arbitrary -- both are constrained by conceptual modeling and the mathematics of SST. If A and B were database relations representing facts about employee compensations and project assignments, the result of the above Cartesian product (combining each salary with every start date) wouldn't have a "sensible meaning", as a reader put it (i.e., the operation would not correspond to a meaningful query). As another commented, "Most of the real work in any query is planning out what you are asking, how you are asking it, and the meanings assigned." This is another way of saying that, in order to query the database meaningfully, users must understand the semantics (meaning) of the data specified in the conceptual model by the modeler/database designer (who, in turn, must model in accordance with user perceptions of the world).
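A rough Python sketch makes the difference visible. It assumes, hypothetically, that the first visible value in each relation is an employee number shared by both, that A carries salaries and B carries assignment start dates, and it uses only the first three tuples; the names SALARY and ASSIGNMENT are invented for the illustration.

```python
from itertools import product

# Assumed interpretation (not stated attribute-by-attribute in the text):
# SALARY(emp, salary) and ASSIGNMENT(emp, start_date).
SALARY = {(100, 26150), (110, 38170), (120, 37950)}
ASSIGNMENT = {(100, "06-19-1980"), (110, "05-16-1958"), (120, "12-05-1963")}

# The operation from the text: SALARY times the projection of ASSIGNMENT on start date.
start_dates = {(d,) for _, d in ASSIGNMENT}
cross = {s + d for s, d in product(SALARY, start_dates)}

# A combination consistent with the assumed meaning: match tuples on employee number.
joined = {(e, sal, d) for (e, sal) in SALARY for (e2, d) in ASSIGNMENT if e == e2}

# Tuples of the product that pair one employee's salary with another's start date.
unsupported = cross - joined

print(len(cross), len(joined), len(unsupported))
```

Under these assumptions, most of the product's tuples pair an employee's salary with somebody else's start date, so they correspond to no fact the base relations assert.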

While it may be clear in this simple example that the operation makes no sense, this is often not the case in practice, as we shall demonstrate in Part 2.


[1] Software Wasteland: How the Application-Centric Mindset is Hobbling our Enterprises.

[2] Pascal, F., Conceptual Modeling Is Not Data Modeling.

[3] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[4] Pascal, F., What Relations Really Are and Why They Are Important.

[5] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency.

[6] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.
