Sunday, March 1, 2020

Muddling Modeling Part 1: Fundamentals




“Data modelling, star schema, snow flakes, data vault. Implementing virtual data warehouses (many stage to modify relationships). Normalisation (using a lot of surrogate keys) all for the sake of business reporting analytics. Reason a SQL DBMS approach columns rows is mandatory.”
--LinkedIn

This recent "comment" reminded me of a decades-old article I published in response to a critique by David Hay of the "fact model" then newly proposed by Ron Ross as an "alternative to the data model". In a Letter to the Editor, Hay correctly observed:
“In our industry, there is a strong desire to put names on things. This is natural enough, given the amount of information that we have to classify and deal with in our work. To give something a name is to gain control over it, and this is not necessarily a bad thing. The problem is when the name takes the place of true understanding of the thing named. Discourse tends to be the bantering of names, without true understanding of the concepts involved.”
of which the above comment is an exquisite example.


---------------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE
Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 that website was discontinued. The content of this site is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. Thank you. 

DATA FUNDAMENTALS 

The industry is chockful of misconceptions due to lack of foundation knowledge. Corrections them are dismissed as "theory that is not practical", misinterpreted as "ad-hominem attacks", or ignored altogether, regardless of the amount and quality of reasoning and supporting evidence. Most practitioners -- be it user or vendor personnel -- cannot discern fallacies and do not realize the practical implications thereof and, thus, cannot associate problems with their real causes., hence the industry's "cookbook approach" and succession of fads.

What about you? Are you just a practitioner, or a thinking professional? 

TYFK (Test Your Foundation Knowledge) posts will each present and debunk a pronouncement containing one or more misconceptions. First try to detect them, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.
LATEST
  • 01/14/20 Updated the LINKS page
  • 01/04/20 Updated the POSTS page with the 2020 posts
  • 12/08/19 Added two educational references on set theory to the LINKS page.

LATEST PUBLICATIONS (order PAPERS and BOOKS)

USING THIS SITE
  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search. 
  • Following the discontinuation of AllAnalytics site, the links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

SOCIAL MEDIA 

I deleted my Facebook account. You can follow me:
  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

 ---------------------------------------------------------------------------------------------------------------------------

Some of the names are just re-labelings -- graph (old hierarchic and network) DBMSs come readily to mind[1,2]. Here's another classic example:
“I was amused to read in [Ralph Kimball's] article that my own suppliers and parts database design was "a perfect, beautiful star schema!" When I first learned the term "star schema", my reaction was that a properly designed star schema would be nothing neither more, nor less than a properly designed schema per se (in other words, one that did obey those scientific principles of relational design that do exist). So to see RK say that my schema was in fact a star schema reminded me (I’m afraid) of Peter Chen’s original E/R paper, in which -- among other things -- he reinvented the concept of domains, but called them value sets, and then went on to analyze the relational model in terms of his own ideas and said “Look, domains are just value sets!”
--C. J. Date
Note: Kimball's "star schema" is, of course, not a relational schema, quite "the opposite" (an avoidance of normalization[3]), but we shall ignore that for our purpose here).

Name proliferation is rooted in -- and exploited by the industry's long and profitable tradition of migrations from fad to fad -- poor grasp of data fundamentals. My old article used the Hay-Ross exchange to illustrate the confusion that inhibits understanding of data management even by those who ought to know better. It is instructive with respect to conceptual modeling and database design [4] and even more relevant today as it was originally, confirming my claim of regress and prompting me to bring it up to date.

In this first part we reiterate fundamentals. In the forthcoming second part we will show how poor grasp thereof induce confusion and inhibit understanding. 


Kinds of Models in Data Management


As we have explained so many times, there are four kinds of models in data management. Three correspond to levels of representation:
  • Conceptual models -- each a model of some segment of reality of interest -- are expressed in real world terms (entity groups, entities, properties, relationships) in natural language, and are understood semantically by users;
  • Logical models -- each a formal logical database representation of some specific conceptual model -- are expressed in formal logic and mathematical terms, albeit adapted for and applied to database management (domains, relations, tuples, attributes, constraints) in a formal data sublanguage, and are also "understood" algorithmically by DBMSs;
  • Physical models -- each an implementation in hardware (files, indexes, hashing)and software (DBMS) of some specific logical model[5].

A formal data model is used to formalize conceptual models as logical models for database representation[6,7] (data theory would have been more accurate, but was avoided due to aversion to it in the industry -- a root problem in itself). 


The proliferation of "models" notwithstanding, to date only one data model has been fully formally defined -- the RDM.

(Continued in Part 2)



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] Pascal, F.,  Pascal, F., OO/UML and Graph Data Models

[2] Pascal, F., Graph Databases: They Who Forget the Past...

[3] Pascal, F., Data Warehouses and the Logical-Physical Confusion

[4] Pascal, F., Models, Models Everywhere, Nor Any Time to Think

[5] Pascal, F., Levels of Representation Conceptual Modeling, Logical Design and Physical Implementation

[6] Pascal, F., What Is a Data Model, and What It Is Not

[7] Pascal, F., Data Model: Neither Business, Nor Logical, Nor Physical Model




No comments:

Post a Comment

View My Stats