Saturday, May 11, 2019

Understanding Data Modeling Part 5: Conclusions



In Part 1 we presented some foundation knowledge with which to debunk the misconceptions lurking in the industry's "data modeling" mess that Friesendal has tried to catalog, and argued that this knowledge can help overcome the mess. In Part 2 we applied this knowledge to the first two industry "data models" considered by Friesendal -- the E/RM and RDM. In Part 3 we applied it to OO/UML and the (as yet formally undefined) GDM, and in Part 4 to Fact Modeling (FM).

Here we apply it to Friesendal's conclusions.



Core Components of "Data Models"


“My focus is on the core components of data models. And if you scrape off all the DBMS-related technicalities, what remains is (mostly) the conceptual structure and meaning that describes a business context.”

Points arising:
  • The industry's messy concept of "data model" lumps together conceptual modeling, logical database design, physical implementation, and often even application development (see Part 1);
  • "Scraping database-related technicalities" (i.e., the logical and physical) leaves you, obviously, with the conceptual. Concluding from this that "conceptual structure is the common component of data models" obscures more than it enlightens, induces confusion/conflation of levels of representation and types of models[1], and inhibits understanding;
  • Conceptual modeling structures reality, not data[2], into facts of different types expressed in natural language understood semantically by users. A data model in the Codd sense (e.g., the RDM) is used to formalize conceptual models as computable logical models for database representation, expressed in a formal data sublanguage understood algorithmically by a DBMS[3,4] and implemented physically. In this view, conceptual structure is not a "common component of data models", but the information (semantics, meaning) that logical models (data) represent formally in the database[5]; a data model is what is used for the formalization;
  • Models at each level have their own essential components, provided by the corresponding conceptual, logical, and physical approach employed to create them (e.g., domains and attributes for logical models created with the RDM). Lumping them all together makes matters less, not more, comprehensible.
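
The difference between the conceptual and logical levels can be sketched roughly as follows, in Python with the standard sqlite3 module. The business rule, the table names and the column names are hypothetical and are not taken from Friesendal's article; SQL is also only an approximation of a relational data sublanguage, so this is an illustration under those assumptions rather than a definitive rendering of the RDM.

```python
import sqlite3

# Conceptual level: a (hypothetical) business rule in natural language,
# understood semantically by people, not by the DBMS.
BUSINESS_RULE = "Each employee is assigned to exactly one department."

# Logical level: the same rule formalized as constraints that the DBMS
# can enforce algorithmically. All names here are illustrative assumptions.
DDL = """
CREATE TABLE department (
    dept_no INTEGER PRIMARY KEY
);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    dept_no INTEGER NOT NULL REFERENCES department (dept_no)
);
"""

def build_database() -> sqlite3.Connection:
    """Create an in-memory database holding the two tables above."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn

def try_insert(conn: sqlite3.Connection, emp_no, dept_no) -> bool:
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee (emp_no, dept_no) VALUES (?, ?)",
            (emp_no, dept_no),
        )
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    db = build_database()
    db.execute("INSERT INTO department (dept_no) VALUES (10)")
    print(try_insert(db, 1, 10))    # accepted: consistent with the rule
    print(try_insert(db, 1, 10))    # rejected: same employee recorded twice
    print(try_insert(db, 2, 99))    # rejected: no such department
    print(try_insert(db, 3, None))  # rejected: employee without a department
```

The DBMS never sees the sentence in BUSINESS_RULE; it sees only the key, NOT NULL and referential constraints that stand in for it.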

“In this article you have already seen that both concept maps (cf. the concept map of the property graph concepts above) and fact modeling use metaphors derived from linguistic structures, i.e. things that resemble nouns (concepts, entity types) and some things (relationships) that resemble verbs (linking phrases in concept maps and fact type readings in fact models) ... "Nouns" and "verbs" and "networks" are core constituent parts of data models. As the fact modelers put it, it is all about communication oriented modeling of information.”

Points arising:
  • Nijssen was explicit about the linguistic roots of FM: FM-based conceptual models are expressed in natural language -- nouns and verbs -- understood semantically by humans. DBMSs, however, have algorithmic, not human, "understanding", for which reason conceptual models are not "computable" (i.e., recordable directly in databases). Hence the need for symbolic formalization as logical models expressed in a formal data sublanguage and, thus, for a data model in the Codd sense that formalizes conceptual models as computable logical models and is, thus, distinct from them[6];
  • E/RM and FM are conceptual modeling approaches. The RDM is a data model advantageous for non-network applications; a GDM -- if and when one is formally specified -- is a data model necessary for network applications. The former was introduced because it is neither sound nor practical to impose a GDM on non-network applications[7].

“Another important constituent part of data models is keys. One of the common mistakes in modeling is that primary keys (and by definition then also foreign keys) are single-valued. In many cases, though, that is not the case. Concatenated, complex keys are common at the business level.”

Points arising:
  • Keys are a formal, logical feature of the RDM. "Keys at the business level" is a conceptual-logical conflation (CLC)[8] -- at the logical level, keys represent names of collections of properties that we refer to as entities[9,10];
  • Since a relation is by definition in 1NF, relational PKs and FKs -- like all attributes -- are defined on simple domains, the values of which are treated as atomic by the relational data sublanguage[11]. It's the "concatenated, complex keys" that are the mistake: they violate 1NF and, thus, the RDM[12].
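
As a rough illustration of the point about atomic values, the following Python sketch (standard sqlite3 module) assumes that "concatenated" means several components packed into a single key value that applications must take apart. That reading, the order-line example and all table and column names are assumptions made for illustration only; they do not come from Friesendal's article or from the formal definition of the RDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table whose key values pack two components into one
-- string, e.g. '1001-3' for line 3 of order 1001.
CREATE TABLE order_line_packed (
    line_id TEXT PRIMARY KEY,
    qty     INTEGER NOT NULL
);
-- Hypothetical alternative: each component is its own attribute, and
-- the data language treats every value as atomic.
CREATE TABLE order_line (
    order_no INTEGER NOT NULL,
    line_no  INTEGER NOT NULL,
    qty      INTEGER NOT NULL,
    PRIMARY KEY (order_no, line_no)
);
""")
conn.executemany(
    "INSERT INTO order_line_packed VALUES (?, ?)",
    [("1001-3", 5), ("1001-12", 2), ("1002-1", 7)],
)
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1001, 3, 5), (1001, 12, 2), (1002, 1, 7)],
)

def lines_of_order_packed(order_no: int) -> list:
    """Application code must know the packing format and parse every value."""
    rows = conn.execute("SELECT line_id, qty FROM order_line_packed").fetchall()
    result = []
    for line_id, qty in rows:
        order_part, line_part = line_id.split("-")
        if int(order_part) == order_no:
            result.append((int(line_part), qty))
    return sorted(result)

def lines_of_order(order_no: int) -> list:
    """The DBMS answers directly; no knowledge of value internals is needed."""
    return conn.execute(
        "SELECT line_no, qty FROM order_line WHERE order_no = ? ORDER BY line_no",
        (order_no,),
    ).fetchall()

if __name__ == "__main__":
    print(lines_of_order_packed(1001))
    print(lines_of_order(1001))
```

Both functions return the same rows, but in the packed version the structure of the key lives in application code, where the DBMS cannot use or enforce it.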

“Let us try to gather the constituent parts, which we identified as being frequently used across the various Data Modeling paradigms over the last 40+ years[:]
...Now, in conclusion, we see that what really makes data models tick.”


Do we, really? Friesendal identifies the following constituent components:
  • Concepts;
  • Object types;
  • Properties;
  • Functional dependencies;
  • Intra-object relationships;
  • Cardinalities;
  • Associated key (single identity, combined set of uniqueness criteria);
  • type (name);
  • a direction (from/to);
  • Uniqueness criteria (identifying property (key), or a list of concatenated properties);
  • Identity (single identifying property (key), surrogate key);
  • Data types.

We contend this confirms the messy nature of the industry's "data modeling": how are these combined in your modeling practice? (Exercise for the reader: which are conceptual and which are logical?)

Compare this with our version of separate, distinct conceptual modeling and logical database design[13,14]: which makes data management sounder and more comprehensible and useful?


Conclusion


In this series we provided some basic foundation knowledge, which, when applied to the messy modeling practice that Friesendal targeted for analysis, identified various misconceptions and confusions. We contend that in the absence of such knowledge, they inhibit understanding and limit the usefulness of such efforts[14]. We believe the outcome of his analysis confirms our position, but the reader should judge for her/himself.

Redoing Friesendal's effort with the benefit of full foundation knowledge is well beyond the scope of this series. The purpose of all our writings -- particularly our more in-depth publications -- is to provide that knowledge, the lack of which is responsible for the modeling mess in the first place[15], as well as examples of its application in practice. Otherwise put, we do not give away fish, we teach how to fish.


We do, however, emphasize that the following distinctions are critical:
  • Data management vs. application development;
  • Network vs. non-network applications;
  • Conceptual vs. logical vs. physical level of representation;
  • Conceptual modeling vs. logical database design vs. physical implementation;
  • Theory-based formal data model vs. applications thereof (i.e., logical models).


References

[1] Pascal, F., Levels of Representation Conceptual Modeling, Logical Design and Physical Implementation.

[2] Pascal, F., Don't Conflate Reality and Data.

[3] Pascal, F., What Is a Data Model, and What It Is Not.

[4] Pascal, F., Data Model Neither Business, Nor Logical, Nor Physical Model.

[5] Pascal, F., Data and Meaning Parts 1-3.

[6] Pascal, F., Natural, Programming, and Data Language.

[7] Pascal, F., Graph Databases: They Who Forget the Past...

[8] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[9] Pascal, F., The Key to Relational Keys: A New Understanding.

[10] Pascal, F., Property-Entity Modeling.

[11] Pascal, F., First Normal Form in Theory and Practice Parts 1-3.

[12] Pascal, F., Outsmarting the DBMS.

[13] Pascal, F., Conceptual Modeling Is Not Data Modeling.

[14] Pascal, F., Understanding Conceptual vs. Data Modeling Parts 1-4.

[15] Pascal, F., Database Management: No Progress Without Data Fundamentals.
