Sunday, November 25, 2018

Data and Meaning Part 1: The RDM Is Applied Theory




“Fabian - With respect, maybe it's time to 'shake the formal foundations' of data management, especially given the rising costs and increasing segregation of silos.”
“John, if I were to say what I really think, I would be accused of insulting, so I won't. You don't need to respect me, but you had better respect formal foundations. Since they are what gives SOUNDNESS to data management practice, what you are really saying is that you don't care about soundness -- do you really intend to take this position? I would not be surprised, because the industry has long 'shaken' the formal foundations and lack of soundness is precisely what characterizes it. But because there is no longer proper education, practitioners are totally unaware of the relationship between formal foundations and soundness, everything is ad-hoc and arbitrary, yet they fail to recognize the consequences.”[1]
--LinkedIn.com
So went an exchange with John Gorman on LinkedIn, in which he posed several questions (which I answered in last week's post[2]) on the importance of not confusing levels of representation and, more specifically, of avoiding conceptual-logical conflation (CLC)[3].

Somebody posted a link to my answers on LinkedIn, and in a comment on it John linked to a YouTube lecture by Richard Feynman on "the general differences between the interests and customs of the mathematicians and the physicists". I responded that this was my very point: just as physics is not the mathematics used to describe it (a central issue in quantum mechanics), conceptual modeling is not data modeling. The latter is the representation of the former in the database; they are distinct[4]. This brought to mind some older columns I published on the AllAnalytics website, which no longer exists, so this series is a revision of them.



Mathematical relations are abstractions (i.e., devoid of any real-world meaning); thus they can contain arbitrary data, and any operation of the relational algebra (RA) can be applied to them arbitrarily. For example, given the two relations A and B:
A:
  100   26150        ...
  110   38170        ...
  120   37950        ...
  130   33800        ...
  140   35420        ...
  150   30280        ...
  160   27250        ...
  290   15340        ...
  310   15900        ...

B:
  ...   100   06-19-1980   ...
  ...   110   05-16-1958   ...
  ...   120   12-05-1963   ...
  ...   130   07-28-1971   ...
  ...   140   12-15-1976   ...
  ...   150   02-12-1972   ...
  ...   160   10-11-1977   ...
  ...   290   05-30-1980   ...
  ...   310   09-12-1964   ...
the Cartesian product of A with the projection of B on its second attribute yields a relation whose attributes are those of A plus the second attribute of B, and whose tuples are all the possible combinations of each tuple of A with every tuple of the projection of B. In mathematical relation theory, the result is meaningless with respect to the real world.
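To make the mechanics concrete, here is a minimal Python sketch of the operation just described, with each relation represented as a set of tuples. It assumes that only the attributes visible in the tables matter (the elided "..." attributes are dropped), and the helper names project and cartesian_product are hypothetical, not part of any DBMS. Note that nothing in the computation consults what the values mean.

```python
from itertools import product

# Relation A: only the two visible attributes from the table are kept;
# the elided "..." attributes are dropped (an illustrative assumption).
A = {
    (100, 26150), (110, 38170), (120, 37950),
    (130, 33800), (140, 35420), (150, 30280),
    (160, 27250), (290, 15340), (310, 15900),
}

# Relation B: likewise reduced to its two visible attributes (number, date).
B = {
    (100, "06-19-1980"), (110, "05-16-1958"), (120, "12-05-1963"),
    (130, "07-28-1971"), (140, "12-15-1976"), (150, "02-12-1972"),
    (160, "10-11-1977"), (290, "05-30-1980"), (310, "09-12-1964"),
}


def project(relation, positions):
    """Keep only the attributes at the given positions (a set, so no duplicates)."""
    return {tuple(t[i] for i in positions) for t in relation}


def cartesian_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return {t + u for t, u in product(r, s)}


# Project B on its second attribute (the date), then take the product with A.
B_dates = project(B, [1])
result = cartesian_product(A, B_dates)

print(len(result))  # 9 tuples of A x 9 distinct dates = 81
```

Sets rather than lists are used because a relation has neither duplicate tuples nor a tuple order.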

But the relational data model (RDM) is applied relation theory: simple set theory (SST), expressible in first-order predicate logic (FOPL), adjusted for applicability to database management. Database relations preserve the mathematical properties of relations but, unlike mathematical relations, they are not abstract: they represent in the database sets of facts about real-world entities, as perceived and identified during conceptual modeling:

  • Tuples of base relations represent axioms (facts assumed to be true);
  • Tuples of derived relations represent theorems (i.e., logical conclusions inferred from the axioms);
  • A DBMS and database constitute a logical inference (i.e., deduction) engine that derives theorems from axioms.
or, in other words, they have real-world interpretations (i.e., they carry meaning specified by the conceptual models that databases represent)[5,6]. Thus conceptual modeling and data modeling are distinct, which is why we advocate terminology that prevents confusing them[7].
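The axiom/theorem distinction above can be illustrated with a small Python sketch. Assume, purely for illustration, that the two visible columns of A are an employee number and a salary, and that each tuple instantiates the hypothetical predicate "Employee E earns a salary of S". A restriction then derives new tuples, each of which is a conclusion that follows from the base facts plus the query condition. The predicate wording, the 35000 threshold and the helper names are all assumptions, not part of any real schema.

```python
# Base relation (axioms): assumed reading of A's two visible columns as
# employee number and salary. Each tuple asserts one fact taken to be true.
COMPENSATIONS = {
    (100, 26150), (110, 38170), (120, 37950),
    (130, 33800), (140, 35420), (150, 30280),
    (160, 27250), (290, 15340), (310, 15900),
}


def predicate(emp, salary):
    """Hypothetical predicate that each base tuple is assumed to instantiate."""
    return f"Employee {emp} earns a salary of {salary}."


def restrict(relation, condition):
    """Restriction: keep only the tuples that satisfy the condition."""
    return {t for t in relation if condition(*t)}


# Derived relation (theorems): each tuple is inferred from the axioms
# together with the (arbitrary, illustrative) condition salary > 35000.
HIGH_EARNERS = restrict(COMPENSATIONS, lambda emp, salary: salary > 35000)

for emp, salary in sorted(HIGH_EARNERS):
    print(f"Derived: employee {emp} earns more than 35000, since: {predicate(emp, salary)}")
```

The derived tuples are true only insofar as the base tuples are; the DBMS does not know what they mean, it merely preserves truth from axioms to theorems.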

The data must be consistent with the conceptual model of reality as perceived by the modeler, which means that neither (1) the data content of database relations nor (2) the RA operations applicable to them can be arbitrary: both are constrained by conceptual modeling. If A and B were database relations representing facts about employee compensations and project assignments:

COMPENSATIONS {EMP#,SALARY,...}
ASSIGNMENTS {...,EMP#,START_DATE,...}
the result of the above Cartesian product (which combines each salary with every start date) would not have a "sensible meaning", as a reader put it (i.e., the operation would not correspond to a meaningful query). As another reader commented, "Most of the real work in any query is planning out what you are asking, how you are asking it, and the meanings assigned." In other words, in order to query the database meaningfully, users must understand the semantics (meaning) of the data, as specified in the conceptual model by the modeler/database designer.
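One way to see why the product has no sensible meaning is to read its tuples with the only predicate they seem to suggest and check that reading against the recorded facts. The Python sketch below does this under stated assumptions: both relations are reduced to the attributes visible in the tables, read as COMPENSATIONS {EMP#, SALARY} and ASSIGNMENTS {EMP#, START_DATE}, and the "naive reading" of the result is our hypothetical interpretation, not one specified by any conceptual model.

```python
from itertools import product

# Assumed reductions of the two relations to their visible attributes.
COMPENSATIONS = {  # (EMP#, SALARY)
    (100, 26150), (110, 38170), (120, 37950),
    (130, 33800), (140, 35420), (150, 30280),
    (160, 27250), (290, 15340), (310, 15900),
}
ASSIGNMENTS = {  # (EMP#, START_DATE)
    (100, "06-19-1980"), (110, "05-16-1958"), (120, "12-05-1963"),
    (130, "07-28-1971"), (140, "12-15-1976"), (150, "02-12-1972"),
    (160, "10-11-1977"), (290, "05-30-1980"), (310, "09-12-1964"),
}

# Cartesian product of COMPENSATIONS with the projection of ASSIGNMENTS on START_DATE.
start_dates = {(date,) for _, date in ASSIGNMENTS}
combined = {c + d for c, d in product(COMPENSATIONS, start_dates)}

# Naive reading of a result tuple (EMP#, SALARY, START_DATE):
# "employee EMP# earns SALARY and started on START_DATE".
# Keep only the tuples whose reading is supported by ASSIGNMENTS.
supported = {t for t in combined if (t[0], t[2]) in ASSIGNMENTS}

print(len(combined), len(supported))  # 81 9
```

Only the 9 tuples that happen to pair an employee's salary with that same employee's start date survive the check; the other 72 would be false statements under that reading. Because the projection discarded EMP# from ASSIGNMENTS, the result itself carries nothing that distinguishes the two kinds of tuple.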

While it may be clear in this simple example that the operation makes no sense, this is often not the case in practice, as we shall demonstrate in Part 2.


References

[1] McComb, D., Software Wasteland: How the Application-Centric Mindset Is Hobbling Our Enterprises.

[2] Pascal, F., Conceptual Modeling Is Not Data Modeling.

[3] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[4] Pascal, F., Conceptual Modeling Is Not Data Modeling.

[5] Pascal, F., What Relations Really Are and Why They Are Important.


[6] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency.

[7] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.








