Sunday, December 2, 2018

What Is a Data Model, and What It Is Not

“The term data model is used in two distinct but closely related senses. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain, for example the customers, products, and orders found in a manufacturing organization. At other times it refers to a set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity-relationship "data model". This article uses the term in both senses.”
--Data Model, Wikipedia

What a True Data Model Is

Few practitioners realize that Codd invented the Relational Data Model (RDM) as the first exemplar of a data model, a concept that he formalized in 1980 as follows:

Sunday, November 25, 2018

Data and Meaning Part 1: The RDM Is Applied Theory

“Fabian - With respect, maybe it's time to' shake the formal foundations' of data management, especially given the rising costs and increasing segregation of silos.”
“John, if I were to say what I really think, I would be accused of insulting, so I won't. You don't need to respect me, but you better respect formal foundations. Since they are what gives SOUNDNESS to data management practice, what you are really saying is that you don't care about soundness -- do you really intend to take this position? I would not be surprised, because the industry has long "shook" the formal foundations and lack of soundness is precisely what characterizes it. But because there is no longer proper education, practitioners are totally unaware of the relationship between formal foundations and soundness, everything is ad-hoc and arbitrary, yet they fail to recognize the consequences.”[1]
Thus an exchange with John Gorman on LinkedIn, in which he posed several questions (that I answered in the last week's post[2]), the subject being the importance of not confusing levels of representation, and, more specifically, avoiding conceptual-logical conflation (CLC)[3].

Somebody posted a link to my answers on Linkedin and in a comment on it John linked to a Richard Feynman YouTube lecture on "the general differences between the interests and customs of the mathematicians and the physicists". To which I responded that my very point is that, just like physics is not the mathematics used to describe it (a central issue in quantum mechanics), conceptual modeling is not data modeling, the latter is the representation of the former in the database -- they are distinct[4]. This brought to mind some older columns I published on the All Analytics website that no longer exists, so this series is a revision thereof.

Saturday, November 10, 2018

Conceptual Modeling Is Not Data Modeling

“Ok, now that we have those two (Parts 3 and 4 of your series) 'on the table' so to speak, perhaps you would address these questions...
1. Would it be safe to say that facts expressed in a Conceptual model should be verifiable in reality?

2. Are the following facts logically equivalent or are they different:

a) The car with license number 62-JZK-6 has the color aquamarine blue
b) De auto met kenteken 62-JZK-6 heeft de kleur aquamarijnblauw

3. If a previously true fact is found in reality to be verifiably false, would that mean the Conceptual model is wrong or the Logical model, or reality?”

“I'm going to add another:

4. How does RDM handle temporal changes to the 'truth' of statement 2a) when:

a) The owner of the car paints it black.
b) The owner of the license plate legally transfers it to a truck.
c) The owner of the car replaces every single part except the chassis.”

John O'Gorman asked me these questions in a LinkedIn exchange[1] in response to my comments in another exchange on modeling[2], where I alerted to the confusion of levels of representation common in the industry, particularly conceptual-logical conflation(CLC)[3]: calling conceptual modeling data modeling both reflects and induces it.

Online exchanges are not a proper vehicle for learning, particularly foundation knowledge. Which is why I publish free blog posts, and papers and books, to which to refer interested serious data professionals. It just so happened that my just posted four-part series covers the subject at hand[4], so I referred to it, as well as other writings (the answers are already there if one cares to read them). I will not discuss the whole exchanges -- read them and judge for yourself -- but I promised to answer the questions here, where I can do them justice.

John raises primarily conceptual, not data model issues -- the latter are subservient to decisions in the former -- but then asks "how does RDM handle..." From experience, I recognize implicit doubts that the RDM can. As far as we know there is no formal data model[5] that is a superior alternative to the RDM with respect to "handling" conceptual issues (in fact, there is no other formal data model -- i.e., that satisfies Codd's definition -- period).

Since most of the issues involved are covered by McGoveran's work in progress[6] (in which my multi-part series is rooted), to ensure consistency with it I passed the questions by him. As he too pointed out, "Answers that work in all situations require highly complicated discussions and lots of time, and trying to teach someone without proper experience and educational background would be very cumbersome, or an oversimplication via online exchanges." 

Here's what's possible within the constraints of a blog post -- the serious reader is referred to our writings.

Saturday, November 3, 2018

Understanding Conceptual vs. Data Modeling Part 4: Property-Entity Modeling

In Part 1  and Part 2  we explained that when the RDM (1969-70) and the E/RM (1976) were introduced, no clear distinction was made between an informal conceptual level as we now understand it, and a formal logical level. In 1980 Codd gave the first definition of a formal data model, and in the later 80s the conceptual-logical-physical distinction of levels of representation emerged.  If the definition is applied to the E/RM and the RDM, only the latter explicitly satisfies it at the logical level. In Part 3 we presented a typical case of conflation, common in the industry, of the conceptual and logical levels, and confusion of types of model (conceptual, logical, physical, and data).

While the E/RM can be used for conceptual modeling, its weaknesses as such have been thoroughly discussed elsewhere[1], and we will not repeat them here. As promised, we outline a new conceptual modeling approach that makes a different ontological commitment than all modeling to date, and requires RDM extensions for consistency with it.

Sunday, October 28, 2018

Understanding Conceptual vs. Data Modeling Part 3: Don't Conflate Reality and Data

In Part 1 and Part 2  we explained that between 1975-81, when the E/RM and RDM were introduced, there was no distinction between an informal conceptual and a formal logical level. In 1980, however, Codd defined a formal data model and in the later 80s the conceptual-logical-physical levels of representation emerged. If applied to the two models:

  • Only the RDM satisfies the definition;
  • The E/RM can be used at the conceptual level to model reality, the latter can be used to model data at the logical level (i.e., formalize conceptual models as logical models for database representation).
Current practitioners, however, continue to confuse levels of representation and confuse/conflate types of model. So much so, that in my presentations I used to draw an imaginary line dividing the room into two sections, and move to the right section to discuss one level/model, and to the left section to discuss another.

Consider the question "does data modeling slow down an application development process?". I will set aside the notion of "speeding up" application development by skipping altogether "data modeling" (whichever way it is meant), and focus on the response.

Saturday, September 29, 2018

Understanding Conceptual vs.Data Modeling Part 2: E/RM Models Reality, RDM Models Data

Re-write 10/17/18
Revised 11/1/18

In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later on the three-fold trinity of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?

Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model E/RM is incomplete:

“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”
--David McGoveran
Contrary to claims, Date does not exactly say that the E/RM is a data model:
“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]
Note that even if, charitably, the E/RM is considered a data model, it is not up to the RDM.