Thursday, March 12, 2020

Muddling Modeling Part 2: An Example

In an old article I used a Hay-Ross exchange to illustrate how disregard for fundamentals and the associated name proliferation -- which underlies the industry's fad-to-fad tradition -- cause confusion that inhibits understanding of conceptual modeling for database design. A recent LinkedIn exchange -- hardly unique -- showed the article to be as relevant today as it was two decades ago, prompting me to bring it up to date.

In Part 1 we reiterated pertinent fundamentals. Here is the rewritten article -- try to apply the fundamentals from Part 1 before you proceed with our debunking.



Conceptual Modeling Is Fact Modeling

“Now, in his article "What are Fact Models and Why Do You Need Them?" Ron Ross publishes the "insight" that "the primary audience for the Data Model is the System Designers and the DBAs". As an alternative, he proposes the "Fact Model" that is "part of the Business Model" and is for "Business Analysts and Subject Matter Experts".”
--David Hay
Familiarity with the data model never hurt anybody interacting with databases -- it is actually quite desirable -- but the "primary audience" for it is database designers who use it to formalize conceptual models as logical models for database representation[1]. "System designers" -- DBAs and application developers -- interact with the logical models: the former to implement them physically, the latter to enable application data access and manipulation on behalf of users, so it looks like Ross uses data model to mean logical model.  

Facts are statements about reality that are either true, or false, for example:

Customer identified by CustomerID is named Maria Anders, resides in Berlin, has phone number 030-0074321.

is a fact about (individual properties of) an entity of type customer. 

Entities are distinguishable in the real world (otherwise we would not be able to tell them apart), so another fact is the relationship -- uniqueness -- among entities in the customers group: 

CustomerIDs are unique.

which is a collective property of customers as a group (i.e., a group property)[2].

Conceptual modeling identifies and formulates in natural language facts about (properties of) entities and groups thereof in the real world that are of interest. In other words, conceptual models are fact models[3].
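To make the distinction between the two kinds of facts concrete, here is a minimal Python sketch. It is only an illustration under our own assumptions, not part of any conceptual modeling method: the names `Customer`, `state_fact` and `ids_are_unique` are hypothetical, and the CustomerID value 1 is borrowed from the R-table shown later in the post. An individual fact is a natural-language statement about one entity; the group property is a statement about the customers group as a whole.

```python
from dataclasses import dataclass

# Hypothetical sentence template for facts about entities of type customer.
CUSTOMER_STATEMENT = (
    "Customer identified by {cid} is named {name}, "
    "resides in {city}, has phone number {phone}."
)


@dataclass(frozen=True)
class Customer:
    """Individual properties of one customer entity (illustrative only)."""
    cid: int
    name: str
    city: str
    phone: str


def state_fact(customer: Customer) -> str:
    """Formulate the individual fact about one customer in natural language."""
    return CUSTOMER_STATEMENT.format(
        cid=customer.cid,
        name=customer.name,
        city=customer.city,
        phone=customer.phone,
    )


def ids_are_unique(customers: list[Customer]) -> bool:
    """Check the group property 'CustomerIDs are unique' for a group."""
    ids = [c.cid for c in customers]
    return len(ids) == len(set(ids))


maria = Customer(1, "Maria Anders", "Berlin", "030-0074321")
print(state_fact(maria))        # the individual fact
print(ids_are_unique([maria]))  # the group property holds for this group
```

Note that the uniqueness check takes the whole group as its argument: it cannot be evaluated by looking at a single customer, which is what makes it a collective rather than an individual property.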

"Business model" is just a more "informal" name for conceptual model. It follows that a fact (aka conceptual aka business) model for "Business Analysts and Subject Matter Experts" is not an alternative to either the data model, or to a logical model produced using it.

Data Modeling Is Database Design

“[According to Ron Ross] data modelers usually try to accomplish two goals at once -- often unknowingly: On the one hand, they attempt to use the data model to explore business requirements with users, while at the same time, to develop system requirements and database designs. He correctly asserts that, to the extent that one does this, it doesn’t work very well.”
--David Hay
If a data model is used to formalize conceptual models as logical models, it cannot also be used "to explore business requirements with users" (i.e., conceptual modeling)[4]. For example, given the conceptual model in our example, if we use the RDM, facts about individual entities (collections of their defining properties) formalize as tuples of relations, and facts about groups (collections of relationships as defining collective properties of a group) formalize as constraints on relations. The R-table
CID   NAME           CITY     PHONE#
1     Maria Anders   Berlin   030-0074321
visualizes a relation representing the customers group, which has CID as primary key (PK), and the uniqueness relationship is represented by a PK constraint (not visible) on the relation.
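The formalization just described can be sketched in Python with the standard `sqlite3` module. This is a hedged illustration under stated assumptions: a SQL table is at best an approximation of a relation, the table and column names (`customers`, `cid`, `name`, `city`, `phone`) and the helper `try_insert` are our own, and every row other than the Maria Anders one is placeholder data. The point is only that the individual fact becomes a row, while the uniqueness relationship becomes a PK constraint that does not appear in the displayed table yet is enforced by the DBMS.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The PRIMARY KEY clause formalizes the group property "CustomerIDs are unique".
conn.execute(
    """
    CREATE TABLE customers (
        cid   INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        city  TEXT NOT NULL,
        phone TEXT NOT NULL
    )
    """
)

# The individual fact about Maria Anders formalizes as one row.
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    (1, "Maria Anders", "Berlin", "030-0074321"),
)


def try_insert(row: tuple) -> bool:
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Placeholder rows: the first reuses CID 1 and so violates the PK constraint.
duplicate_accepted = try_insert((1, "Placeholder Name", "Placeholder City", "000-0000000"))
distinct_accepted = try_insert((2, "Placeholder Name", "Placeholder City", "000-0000000"))

print(duplicate_accepted, distinct_accepted)
```

The constraint is declared once, in the design, rather than checked by each application, which is the sense in which the logical model (not the applications) represents the group property.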

This formalization using a data model is data modeling, which we know as logical database design[5]. Thus, conceptual (fact) models are models of some specific reality, while the logical models that represent them formally in the database are models of specific data[6] (see conclusion).

We have identified and documented the "dual" practice described by Ross as "using the logical [not data!] model to explore business requirements with users" [i.e., conceptual modeling] as part of conceptual-logical conflation (CLC)[7]. You typically encounter it when a practitioner presents a bunch of tables (that may or may not be R-tables) and asks (online, rather than asking his users) whether his design (i.e., logical model) is correct[8]. No wonder the practice does not work, which is further evidence that by data model Ross must mean logical model (why?).

“The problem is that [Ross] then proposes "to stop using the data model for developing business requirements". He then fails to make a very convincing argument for doing so, however. Based on my experience, a preferable strategy would be to stop trying to use the database process to develop system requirements and database designs. In point of fact, his "fact models" that he proposes as an alternative to data models are almost exactly what I produce when I am producing what I call a data model.”
--David Hay
First, we have just explained why neither the data model, nor a logical model -- what Ross means -- can be used for "developing business requirements" (i.e., conceptual modeling) -- that's backwards -- but without a proper grasp of fundamentals such confusion is common.

Second, what does "stop using the database process to develop database designs" mean? Database design is a database process and thus part of "system requirements".

Third, Hay refers to Ross's fact model (i.e., conceptual model) as data model. Thus, while Ross's use of the term data model makes sense only if it means logical model, Hay's use of it makes sense only if it means conceptual model, when in reality it means neither[9].

Fourth, given a fact (i.e., conceptual) model, "developing a database design" (i.e., formalizing it as a logical model) is precisely what a data model is for![1]


The four types of models are fundamental -- both necessary and sufficient for data management -- and should be kept distinct in one's mind, not confused. We have just illustrated yet again what happens when this is not the case: Ross and Hay use data model to mean logical and conceptual model respectively, inhibiting understanding of conceptual modeling for database design[1]. In the absence of foundation knowledge, the proliferation of names (fact model, business model) only exacerbates the confusion.

That is why we recommend a three-fold terminology -- conceptual modeling, logical database design (better than data modeling), and physical implementation -- that helps avoid confusion[10], but it does not obviate the need for education on fundamentals.

Here's what is correct based on that knowledge: Conceptual models are fact models produced by exploring business requirements with users. Database designers usually try to accomplish two goals at once, often unknowingly: at the same time as they create a logical model (i.e., design a database), they also attempt to use it to come up with the conceptual model that it is supposed to represent, which they ought to stop doing because it cannot work.

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


[1] Pascal, F., Business Modeling for Database Design: Formalizing the Informal 

[2] Pascal, F., Relationships and the RDM series 

[3] Pascal, F., Fact Modeling 

[4] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models 

[5] Pascal, F., Database Design: What It Is and Isn't

[6] Pascal, F., Conceptual Modeling Is Not Data Modeling

[7] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion

[8] Pascal, F., Databases Representing What? 

[9] Pascal, F., Data Model Neither Business, Nor Logical, Nor Physical Model

[10] Pascal, F., Conceptual Modeling, Logical Design and Physical Implementation
