Thursday, March 12, 2020

Muddling Modeling Part 2: An Example




In an old article I used a Hay-Ross exchange to illustrate how disregard for fundamentals and the associated name proliferation -- which underlies the industry's fad-to-fad tradition -- cause confusion that inhibits understanding of conceptual modeling for database design. A recent LinkedIn exchange -- hardly unique -- showed the article to be as relevant today as it was two decades ago, prompting me to bring it up to date.

In Part 1 we reiterated pertinent fundamentals. Here is the rewritten article -- try to apply the fundamentals from Part 1 before you proceed with our debunking.



Conceptual Modeling Is Fact Modeling

“Now, in his article "What are Fact Models and Why Do You Need Them?" Ron Ross publishes the "insight" that "the primary audience for the Data Model is the System Designers and the DBAs". As an alternative, he proposes the "Fact Model" that is "part of the Business Model" and is for "Business Analysts and Subject Matter Experts".”
--David Hay
Familiarity with the data model never hurt anybody interacting with databases -- it is actually quite desirable -- but the "primary audience" for it is database designers who use it to formalize conceptual models as logical models for database representation[1]. "System designers" -- DBAs and application developers -- interact with the logical models: the former to implement them physically, the latter to enable application data access and manipulation on behalf of users, so it looks like Ross uses data model to mean logical model.  

Facts are statements about reality that are either true, or false, for example:

Customer identified by CustomerID is named Maria Anders, resides in Berlin, has phone number 030-0074321.

is a fact about (individual properties of) an entity of type customer. 

Entities are distinguishable in the real world (otherwise we would not be able to tell them apart), so another fact is the relationship -- uniqueness -- among entities in the customers group: 

CustomerIDs are unique.

which is a collective property of customers as a group (i.e., a group property)[2].

Conceptual modeling identifies and formulates in natural language facts about (properties of) entities and groups thereof in the real world that are of interest. In other words, conceptual models are fact models[3].

"Business model" is just a more "informal" name for conceptual model. It follows that a fact (aka conceptual aka business) model for "Business Analysts and Subject Matter Experts" is not an alternative to either the data model, or to a logical model produced using it.


Data Modeling Is Database Design

“[According to Ron Ross] data modelers usually try to accomplish two goals at once -- often unknowingly: On the one hand, they attempt to use the data model to explore business requirements with users, while at the same time, to develop system requirements and database designs. He correctly asserts that, to the extent that one does this, it doesn’t work very well.”
--David Hay
If a data model is used to formalize conceptual as logical models, it cannot be used "to explore business requirements with users" (i.e., conceptual modeling)[4]. For example, given the conceptual model in our example, if we use the RDM, facts (collections of defining properties of individual entities) formalize as tuples of relations, and facts (collections of relationships as defining collective properties of a group) formalize as constraints on relations. The R-table
CUSTOMERS
=============================================
CID   NAME              CITY     PHONE#
====-----------------------------------------
 1    Maria  Anders     Berlin   030-0074321
...
=============================================
visualizes a relation representing the customers group, which has CID as primary key (PK), and the uniqueness relationship is represented by a PK constraint (not visible) on the relation.
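
The point that the PK constraint is part of the design even though the picture of the R-table does not show it can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS, and the usual caveat applies: an SQL table is not necessarily an R-table. The column names follow the R-table above; the second customer (Someone Else, Paris) is hypothetical data invented for the illustration.

```python
import sqlite3

INSERT = "INSERT INTO customers (cid, name, city, phone) VALUES (?, ?, ?, ?)"


def build_customers(conn):
    """Declare the table with CID as primary key and record the example fact."""
    conn.execute(
        """
        CREATE TABLE customers (
            cid   INTEGER PRIMARY KEY,  -- represents "CustomerIDs are unique"
            name  TEXT NOT NULL,
            city  TEXT NOT NULL,
            phone TEXT NOT NULL
        )
        """
    )
    conn.execute(INSERT, (1, "Maria Anders", "Berlin", "030-0074321"))


def try_insert(conn, row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
build_customers(conn)

# Hypothetical rows: the first reuses CID 1, the second uses a new CID.
duplicate_accepted = try_insert(conn, (1, "Someone Else", "Paris", "000-0000000"))
distinct_accepted = try_insert(conn, (2, "Someone Else", "Paris", "000-0000000"))
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(duplicate_accepted, distinct_accepted, row_count)
```

Nothing in a display of the rows reveals the constraint; it shows up only when an assertion that would violate the uniqueness relationship is rejected.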

This formalization using a data model is data modeling, which we know as logical database design[5]. Thus, conceptual (fact) models are models of some specific reality, logical models representing them formally in the database are models of specific data[6] (see conclusion).
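
The mapping from facts to tuples and from the group property to a constraint can also be sketched without any DBMS. The following is an illustration under stated assumptions, not an implementation of the RDM: the names Customer, cids_are_unique and assert_fact are hypothetical, a relation is approximated by a Python frozenset of named tuples, and the Hamburg tuple is invented to trigger the violation.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """One fact about an individual customer, formalized as a tuple."""
    cid: int
    name: str
    city: str
    phone: str


def cids_are_unique(relation):
    """The group property 'CustomerIDs are unique', formalized as a constraint."""
    cids = [t.cid for t in relation]
    return len(cids) == len(set(cids))


def assert_fact(relation, fact):
    """Return the relation extended with the fact, or raise if the constraint fails."""
    candidate = relation | {fact}
    if not cids_are_unique(candidate):
        raise ValueError(f"CID {fact.cid} would violate the uniqueness constraint")
    return candidate


customers = frozenset()
customers = assert_fact(
    customers, Customer(1, "Maria Anders", "Berlin", "030-0074321")
)

# Hypothetical conflicting fact: same CID, different city.
try:
    assert_fact(customers, Customer(1, "Maria Anders", "Hamburg", "030-0074321"))
    violation_rejected = False
except ValueError:
    violation_rejected = True

print(len(customers), violation_rejected)
```

The sketch starts from facts already stated in natural language; it formalizes a conceptual model and is no substitute for producing one.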

We have identified and documented the "dual" practice described by Ross as "using the logical [not data!] model to explore business requirements with users" [i.e., conceptual modeling] as part of conceptual-logical conflation (CLC)[7]. You encounter it when a practitioner typically presents a bunch of tables (that may or may not be R-tables) and asks (online, rather than asking his users) whether his design (i.e., logical model) is correct[8]. No wonder the practice does not work, which is further evidence that by data model Ross must mean logical model (why?).
“The problem is that [Ross] then proposes "to stop using the data model for developing business requirements". He then fails to make a very convincing argument for doing so, however. Based on my experience, a preferable strategy would be to stop trying to use the database process to develop system requirements and database designs. In point of fact, his "fact models" that he proposes as an alternative to data models are almost exactly what I produce when I am producing what I call a data model.”
--David Hay
First, we have just explained why neither the data model nor a logical model -- what Ross means -- can be used for "developing business requirements" (i.e., conceptual modeling) -- that's backwards -- but without a proper grasp of fundamentals such confusion is common.

Second, what does "stop using the database process to develop database designs" mean? Database design is a database process and thus part of "system requirements".

Third, Hay refers to Ross's fact model (i.e., conceptual model) as data model. Thus, while Ross's use of the term data model makes sense only if it means logical model, Hay's use of it makes sense only if it means conceptual model, when in reality it means neither[9].

Fourth, given a fact (i.e., conceptual) model, "developing a database design" (i.e., formalizing it as a logical model) is precisely what a data model is for![1]

Conclusion


The four types of models are fundamental -- both necessary and sufficient for data management -- and should be kept distinct in one's mind, not confused. We have just illustrated yet again what happens when this is not the case: Ross and Hay use data model to mean logical and conceptual model respectively, inhibiting understanding of conceptual modeling for database design[1]. In the absence of foundation knowledge, proliferation of names (fact model, business model) only exacerbates the confusion.

That is why we recommend a three-fold terminology -- conceptual modeling, logical database design (better than data modeling), and physical implementation -- that helps avoid confusion[10], but it does not obviate the need for education on fundamentals.

Here's what is correct based on that knowledge: Conceptual models are fact models produced by exploring business requirements with users. Database designers usually try to accomplish two goals at once, often unknowingly: at the same time as they create a logical model (i.e., design a database), they also attempt to use it to come up with the conceptual model that it is supposed to represent, which they ought to stop doing because it cannot work.



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.



References

[1] Pascal, F., Business Modeling for Database Design: Formalizing the Informal 

[2] Pascal, F., Relationships and the RDM series 

[3] Pascal, F., Fact Modeling 

[4] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models 

[5] Pascal, F., Database Design: What It Is and Isn't

[6] Pascal, F., Conceptual Modeling Is Not Data Modeling

[7] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion

[8] Pascal, F., Databases Representing What? 

[9] Pascal, F., Data Model Neither Business, Nor Logical, Nor Physical Model

[10] Pascal, F., Conceptual Modeling, Logical Design and Physical Implementation



