Sunday, March 1, 2020

Muddling Modeling Part 1: Fundamentals

“Data modelling, star schema, snow flakes, data vault. Implementing virtual data warehouses (many stage to modify relationships). Normalisation (using a lot of surrogate keys) all for the sake of business reporting analytics. Reason a SQL DBMS approach columns rows is mandatory.”

This recent "comment" reminded me of a decades-old article I published in response to a critique by David Hay of the "fact model" then newly proposed by Ron Ross as an "alternative to the data model". In a Letter to the Editor, Hay correctly observed:
“In our industry, there is a strong desire to put names on things. This is natural enough, given the amount of information that we have to classify and deal with in our work. To give something a name is to gain control over it, and this is not necessarily a bad thing. The problem is when the name takes the place of true understanding of the thing named. Discourse tends to be the bantering of names, without true understanding of the concepts involved.”
of which the above comment is an exquisite example.


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

-12/24/20: Added 2021 to the
POSTS page

-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.


Some of the names are just re-labelings -- graph (old hierarchic and network) DBMSs come readily to mind[1,2]. Here's another classic example:
“I was amused to read in [Ralph Kimball's] article that my own suppliers and parts database design was "a perfect, beautiful star schema!" When I first learned the term "star schema", my reaction was that a properly designed star schema would be nothing neither more, nor less than a properly designed schema per se (in other words, one that did obey those scientific principles of relational design that do exist). So to see RK say that my schema was in fact a star schema reminded me (I’m afraid) of Peter Chen’s original E/R paper, in which -- among other things -- he reinvented the concept of domains, but called them value sets, and then went on to analyze the relational model in terms of his own ideas and said “Look, domains are just value sets!”
--C. J. Date
Note: Kimball's "star schema" is, of course, not a relational schema, quite "the opposite" (an avoidance of normalization[3]), but we shall ignore that for our purpose here).

Name proliferation is rooted in -- and exploited by the industry's long and profitable tradition of migrations from fad to fad -- poor grasp of data fundamentals. My old article used the Hay-Ross exchange to illustrate the confusion that inhibits understanding of data management even by those who ought to know better. It is instructive with respect to conceptual modeling and database design [4] and even more relevant today as it was originally, confirming my claim of regress and prompting me to bring it up to date.

In this first part we reiterate fundamentals. In the forthcoming second part we will show how poor grasp thereof induce confusion and inhibit understanding. 

Kinds of Models in Data Management

As we have explained so many times, there are four kinds of models in data management. Three correspond to levels of representation:
  • Conceptual models -- each a model of some segment of reality of interest -- are expressed in real world terms (entity groups, entities, properties, relationships) in natural language, and are understood semantically by users;
  • Logical models -- each a formal logical database representation of some specific conceptual model -- are expressed in formal logic and mathematical terms, albeit adapted for and applied to database management (domains, relations, tuples, attributes, constraints) in a formal data sublanguage, and are also "understood" algorithmically by DBMSs;
  • Physical models -- each an implementation in hardware (files, indexes, hashing)and software (DBMS) of some specific logical model[5].

A formal data model is used to formalize conceptual models as logical models for database representation[6,7] (data theory would have been more accurate, but was avoided due to aversion to it in the industry -- a root problem in itself). 

The proliferation of "models" notwithstanding, to date only one data model has been fully formally defined -- the RDM.

(Continued in Part 2)

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


[1] Pascal, F.,  Pascal, F., OO/UML and Graph Data Models

[2] Pascal, F., Graph Databases: They Who Forget the Past...

[3] Pascal, F., Data Warehouses and the Logical-Physical Confusion

[4] Pascal, F., Models, Models Everywhere, Nor Any Time to Think

[5] Pascal, F., Levels of Representation Conceptual Modeling, Logical Design and Physical Implementation

[6] Pascal, F., What Is a Data Model, and What It Is Not

[7] Pascal, F., Data Model: Neither Business, Nor Logical, Nor Physical Model

No comments:

Post a Comment

View My Stats