Chris Date once published an article at the old DBDebunk titled “Models, Models, Everywhere, Nor Any Time to Think”. If you want to grasp what he meant then, you ought to search for the title now and see what you get.
The continuous proliferation of models is an indication and measure of the industry's disregard for, if not outright hostility to, sound theoretical foundations. It keeps reminding me of a decades-old piece I posted in response to David Hay's critique of Ron Ross's then-new proposal of a “fact model” (yet another one) as an alternative to the data model. It is more relevant than ever, which is why I decided to bring it up to date. The problem is so entrenched and widespread that even those who try to address it fail to realize that they are its victims too.
Hay correctly observed:
“In our industry, there is a strong desire to put names on things. This is natural enough, given the amount of information that we have to classify and deal with in our work. To give something a name is to gain control over it, and this is not necessarily a bad thing. The problem is when the name takes the place of true understanding of the thing named. Discourse tends to be the bantering of names, without true understanding of the concepts involved.”
In this industry, many names are mere re-labelings of existing concepts, whether they fit or not. Here are a couple of exquisite examples of both cases:
“I was amused to read in [Ralph Kimball's] article that my own suppliers and parts database design was ‘a perfect, beautiful star schema!’ When I first learned the term ‘star schema’, my reaction was that a properly designed star schema would be nothing more nor less than a properly designed schema per se (in other words, one that did obey those scientific principles of relational design that do exist). So to see RK say that my schema was in fact a star schema reminded me (I’m afraid) of Peter Chen’s original E/R paper, in which—among other things—he reinvented the concept of domains, but called them value sets, and then went on to analyze the relational model in terms of his own ideas and said ‘Look, domains are just value sets!’” --C. J. Date
Note: Kimball's "star schema" is, of course, not a relational schema but rather an attempt to avoid one, stemming from a failure to distinguish application views of the database from the database schema.
Re-labeling things to create an impression of innovation is part of the vendors’ long and profitable tradition of migrating from fad to fad (the industry operates like the fashion industry), a tradition that exploits the disregard for, hostility to, and poor understanding and appreciation of sound theoretical foundations.
Kinds of Models in Data Management
There are four kinds of models in database management, three of which correspond to levels of representation.
· A conceptual model (CM) is a model of information about some segment of reality of interest. It consists of business rules (BRs)—statements in "structured" natural language that define object (entity, group, multigroup) types by specifying the defining and required properties of objects of those types. For example, the BR:
Customer identified by CustomerID (CID) has first name (FNAME), has last name (LNAME), resides in city (CITY), has phone# (PHONE#)
defines a customer type of entities that form a group, every entity of which is defined by a fact (a statement that specifies the entity's property values and is asserted true), for example:
Customer identified by CustomerID 1008 has first name Maria, has last name Anders, resides in Berlin, has phone# 030-0074321.
· A logical model (LM) is created by assigning the meaning of terms in a CM (properties, entities, groups, multigroups) to symbols of a formal logical theory, by which the theory acquires an interpretation; this process is known as logical database design (LDD). If/when the theory is the relational data model (RDM), the symbols of the theory symbolize sets (domains/attributes, tuples, relations, database). In a LM, BRs are represented by predicates, which are versions of the BRs in first-order predicate logic (FOPL), and facts by propositions asserted true (instantiations of predicates).
Each CM is a source of meaning as understood semantically by users of the corresponding LM, and each LM is a formal symbolic representation of the CM as "understood" algorithmically by a DBMS. Note carefully that a LM is a model of the theory—an application of the theory to the subject matter modeled by the CM.
· A physical model (PM) is an implementation of a LM in hardware (records, files, indexes, hashes, and so on).
· A data model is none of the three above, but the formal logical theory to which the meaning of CMs is assigned to produce LMs for database representation and manipulation (data theory would have been a more accurate and less confusing name, but was avoided due to the industry's aversion to theory).
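The distinction between a BR (which defines a type) and a fact (an instance asserted true), and between their LM counterparts, predicate and proposition, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the `Predicate` class and its `instantiate` method are hypothetical, and `PHONE` stands in for PHONE# because `#` is not valid in Python identifiers. It only shows that a proposition is a predicate with every free variable replaced by a value; a DBMS manipulates such predicates formally, without understanding their English wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical model of a predicate: a statement with free variables."""
    template: str     # wording of the BR, with {placeholders} for the variables
    variables: tuple  # names of the free variables (the attributes)

    def instantiate(self, **values) -> str:
        """Replace every free variable with a value, yielding a proposition."""
        if set(values) != set(self.variables):
            raise ValueError(f"expected values for exactly {self.variables}")
        return self.template.format(**values)


# The customer BR from the CM, recast as a predicate (PHONE stands in for PHONE#).
CUSTOMER = Predicate(
    template=("Customer identified by CustomerID {CID} has first name {FNAME}, "
              "has last name {LNAME}, resides in {CITY}, has phone# {PHONE}"),
    variables=("CID", "FNAME", "LNAME", "CITY", "PHONE"),
)

# Instantiating the predicate with the example fact's values yields a proposition.
fact = CUSTOMER.instantiate(CID=1008, FNAME="Maria", LNAME="Anders",
                            CITY="Berlin", PHONE="030-0074321")
print(fact)
```

Running it prints the customer fact from the CM example; supplying values for only some of the variables raises an error, because a partially instantiated predicate is not a proposition.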
Producing a LM is what is known as logical database design (LDD). However, very few, if any, know or realize what LDD really means: the assignment of the meaning of CM terms to symbols of a data model/theory (see WHAT MEANING MEANS: BUSINESS RULES, PREDICATES, CONSTRAINTS, AND SEMANTIC CONSISTENCY).
In our example, the meaning of the CM term ‘customers group’ is assigned to a relation symbol of the theory (RDM), and similarly for all other groups, to produce the LM in which the groups are represented formally by relation symbols. For customers:
CUSTOMERS
===========================================
CID   FNAME   LNAME    CITY     PHONE#
=====--------------------------------------
1008  Maria   Anders   Berlin   030-0074321
...
===========================================
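In the RDM the CUSTOMERS relation is a set of tuples, each representing a proposition, and the BR clause “identified by CustomerID” becomes a key constraint. A minimal Python sketch under those assumptions: the `Customer` tuple type and the `key_holds` helper are hypothetical, `PHONE` again stands in for PHONE#, and the conflicting tuple used to show a violation is invented for illustration, not part of the example data.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """One tuple of the CUSTOMERS relation (hypothetical Python rendering)."""
    CID: int
    FNAME: str
    LNAME: str
    CITY: str
    PHONE: str  # stands in for PHONE#


def key_holds(relation, key):
    """True if no two distinct tuples agree on all the key attributes."""
    seen = set()
    for t in relation:
        k = tuple(getattr(t, a) for a in key)
        if k in seen:
            return False
        seen.add(k)
    return True


# A relation is a set: stating the same tuple twice does not add a second fact.
CUSTOMERS = frozenset({
    Customer(1008, "Maria", "Anders", "Berlin", "030-0074321"),
    Customer(1008, "Maria", "Anders", "Berlin", "030-0074321"),
})
print(len(CUSTOMERS))                  # → 1
print(key_holds(CUSTOMERS, ("CID",)))  # → True

# Hypothetical conflicting tuple: same CID, different customer.
BAD = CUSTOMERS | {Customer(1008, "Hans", "Meier", "Berlin", "030-0074321")}
print(key_holds(BAD, ("CID",)))        # → False
```

Using a set rather than a list is the design point: duplicate tuples collapse, just as asserting the same proposition twice adds no information, whereas two distinct tuples with the same CID would assert contradictory facts about one customer.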
Conceptual Model and “Fact Model”
“Now, in his article "What are Fact Models and Why Do You Need Them?" Ron Ross publishes the "insight" that "the primary audience for the Data Model is the System Designers and the DBAs." As an alternative, he proposes the "Fact Model" that is "part of the Business Model" and is for "Business Analysts and Subject Matter Experts".” --David Hay
So if by "data model" Ross meant data theory, the claim that its primary audience is database designers would be trivial, hardly an insight. (But did he? Read on.)
As to the proposed “alternative”: on the one hand, we have seen that a CM consists of BRs that define fact types, so it is, in this sense, a fact model; on the other hand, a CM does not include the facts themselves (the propositions, the data), so in this sense it is not. Whether or not it is a fact model, a CM is not “part of a CM”!
No Logical Database Design without Data Model
“[According to Ron Ross] data modelers usually try to accomplish two goals at once—often unknowingly: On the one hand, they attempt to use the data model to explore business requirements with users, while at the same time, to develop system requirements and database designs. He correctly asserts that, to the extent that one does this, it doesn’t work very well.” --David Hay
The term 'data model' has been so utterly corrupted that it has been emptied of meaning. It is used to mean conceptual, logical, or physical model, often simultaneously, in an almost universal effort to avoid theory in general, and RDM in particular.
Hay is referring here to what I call conceptual-logical conflation (CLC), but it is not due to “using the data model for both” in the correct sense of the term. Rather, it is due to the absence of the data model/theory from the LDD process altogether, which leads to the conflation.
Note: One example of CLC is a practitioner presenting one or more tables and asking if “the design is correct” or if they are normalized, questions that cannot be answered from the tables alone, without the CM they are supposed to represent.
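Whether a table is normalized cannot be read off the table itself: it depends on the functional dependencies (FDs) that hold, and those derive from the BRs of the CM. A minimal Python sketch of a Boyce-Codd normal form (BCNF) test illustrates this: the same CUSTOMERS heading passes or fails depending on which FDs are assumed. Both FD sets are hypothetical (in particular, the rule that a phone# determines the city is invented for illustration), the `closure`, `is_bcnf`, and `fd` helpers are illustrative names, and `PHONE` stands in for PHONE#.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(heading, fds):
    """BCNF: the determinant of every non-trivial FD is a superkey."""
    for lhs, rhs in fds:
        if rhs <= lhs:  # trivial FD, ignore
            continue
        if closure(lhs, fds) != set(heading):
            return False  # lhs does not determine the whole heading
    return True


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


HEADING = {"CID", "FNAME", "LNAME", "CITY", "PHONE"}

# Hypothetical CM A: the only rule is that CID identifies a customer.
CM_A = [fd("CID", "FNAME LNAME CITY PHONE")]

# Hypothetical CM B: additionally, each phone# belongs to exactly one city.
CM_B = CM_A + [fd("PHONE", "CITY")]

print(is_bcnf(HEADING, CM_A))  # → True
print(is_bcnf(HEADING, CM_B))  # → False (PHONE is not a superkey)
```

Checking only the listed FDs suffices for a single relation schema; testing the components of a decomposition would also require the FDs projected onto each component.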
“The problem is that [Ross] then proposes "to stop using the data model for developing business requirements". He then fails to make a very convincing argument for doing so, however. Based on my experience, a preferable strategy would be to stop trying to use the database process to develop system requirements and database designs. In point of fact, his "fact models" that he proposes as an alternative to data models are almost exactly what I produce when I am producing what I call a data model.” --David Hay
Because the data model/theory is used to produce LMs from CMs, “using the data model to explore business requirements with users” cannot work, which is why I suspect that when Ross writes about such uses of the data model, he does not mean data theory. And whatever a “database process” is, I doubt it should be used for LDD either.
Of course, Hay produces what he calls “data models”, but certainly not a data model: that was produced by Codd, and as far as I know only David McGoveran is working on revising/extending it.
From their other writings, I would guess that while both use the term data model, Hay probably means CM and Ross LM. Properly used, it means neither.
Conclusion
The continuous proliferation of “new models” indicates a lack of reliance on formal foundations and, thus, a lack of appreciation of the soundness they confer on database management.
The four kinds of model are fundamental (both necessary and sufficient for database management) and should be kept distinct, not confused, in one's mind. I have illustrated what happens when they are not. To help prevent this, I advocate the threefold terminology:
· conceptual modeling;
· logical database design (reserving ‘modeling’ for conceptual);
· physical implementation.
· CMs are the source of meaning of LMs; LMs are formal logical representations of CMs.
· Assigning the meaning of CMs to a data model/theory produces LMs: this is LDD.
Practitioners disregard, and poorly understand, data fundamentals. They:
· Use the models interchangeably;
· Confuse levels of representation and the four kinds of model;
· Avoid the theory and thus lack understanding of the LDD process, which leads to conflation and/or arbitrary LMs.