Sunday, April 14, 2019

Understanding Data Modeling Part 1: Models, Models Everywhere, Nor Any Time to Think




“... I needed to know what the constituent parts of data models really are. Across the board, all platforms, all models etc. Is there anything similar to atoms and the (chemical) bonds that enables the formation of molecules? My concerns were twofold ... I wanted a simple, DIY-style, metadata repository for storing 3-level data models -- what would the meta model of such a thing look like? -- [where] atomicity is of essence ... I took a tour (again) in the Data Modeling zone, trying to deconstruct the absolutely essential metadata, which data modelers cannot do without.”
--Thomas Friesendal, The Atoms and Molecules of Data Models, Dataversity.com

All data models? 3-level data models? Platforms? Hhhmmmm!


------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE 

Up to 2018, DBDebunk maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 the website was been discontinued. The dbdebunk content is not available anywhere else, and if you deem it useful, particularly if you are a regular reader, please ensure its continuation and free availability by supporting it with as much as you can afford via purchases of publications, or donations. Thank you.

3/29/19 Updated the LINKS page:

LATEST PUBLICATIONS (order PAPERS and BOOKS)

USING THIS SITE

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled. As other older posts are rewritten, they will also be relabeled. Use Blogger search for the rest. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics may work.

SOCIAL MEDIA 

I deleted my Facebook account. You can follow me:
  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
  • 3/29/19 REVISED! @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • 3/29/19 REVISED! @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

Data Models in Industry Practice


The subtitle of this first part in a series was the title of an article by Date published at the old DBDebunk. This site has documented for decades, among the many misconceptions about data fundamentals[1], confusion of different kinds of models and levels of representation.

“In the industry a "model" or "data model" (used interchangeably) is a methodology for capturing conceptual and technical knowledge used not only for logical database design, but also for conceptual modeling, physical implementation, programming applications (e.g., processs, user interfaces). Practitioners employ a small group of such "models" -- E/RM, "relational" (by which they mean SQL), ORM, UML, property graph.”

“This entangles the definition of model -- let alone data model -- with (1) notation (i.e., the symbols used to capture knowledge -- e.g., diagrams), (2) methodologies (what you start with, the definitions and relationships between conceptual, logical, and physical, and how you move from one to the other, how you refine, etc), as well as (3) specific applications of these "data models". The available tools implementing one or more of these methodologies are often platform dependent, and limit the targets (DBMS, language, etc.) they can be implemented with.”

“A messy "Tower of Babel" results from these ad-hoc, poorly defined and understood "data models": multiple methodologies for each, tools that are interpretations of those "models", or are based on no "data model", try to integrate features of other tools, attempt to serve all of conceptual, logical, and physical purposes without clear differentiation, and so on. This explains why, instead of soundness and functional completeness, methodologies and tools are chosen for "ease of use", "broad support", and to avoid failure to meet some specific requirements.”

--David McGoveran
Friesendal attempts to catalog the essential components of these industry "data models". The intention is laudable, but even with strong knowledge this is extremely difficult, let alone in the absence of foundation knowledge -- which is why the mess exists in the first place.

In this series we debunk the misconceptions about data and relational fundamentals in the article, and show how foundation knowledge helps make sense of the mess[2,3].


Formal Data Model


In 1969-70 Codd introduced the relational data model (RDM) -- an adaptation and application of simple set theory (SST) expressible in first order predicate logic (FOPL) to database management[4]. A data model in this sense is used to formalize enterprise-specific conceptual models as logical models for database representation[5,6] and, thus, is distinct from the logical models created using it, as well as from conceptual and physical models[7]. In 1981 Codd specified structure, integrity, and manipulation as the essential components of a formal data model[8] -- in effect, outlining a meta-model of data models, all of which have these three components[9] (only outlining, because no meta-language expressing it was specified).

All this notwithstanding, many practitioners refer to "conceptual, logical, and physical data models". That is why we recommend distinct terminology for these three types of modeling that correspond to the three levels of representation recognized in the 80s: conceptual modeling, logical database design, and physical implementation
[10]. 

Clearly, a data model in the Codd sense is distinct from its applications -- the logical models created with it. The same distinction exists at the conceptual level between something equivalent to the data model, and its applications -- the conceptual models created with it (and, of course, at the physical level too). Unfortunately,  there is no such distinction in the industry: as we shall see in Part 2, practitioners  use the term 'entity-relationship model' to refer to both the E/RM modeling approach, as well as to applications thereof -- the conceptual models created with it -- which explains, in part, the poor understanding of Codd's concept of data model, and the modeling mess.

RDM "Atoms" and "Molecules"


Currently the RDM is the only Codd-compliant formal data model with components well defined: relations, several types of constraints, and the relational algebra (RA). (If there are others, what are -- precisely, please! -- their components and formal theoretical foundation?).

In chemical science, atoms (primitive elements) in a particular relationship (chemical bonds), and molecules (derived elements) interact to combine into new molecules. In the RDM, relations are defined in terms of tuples, which are defined in terms of domains and attributes. If the atoms and molecules metaphor is applied to the RDM:

“Domains and attributes are the "atoms" that combine to form tuples and relations ("molecules"). Any RA expression can be understood as a relation (the "result" or derived relation). Constraints are RA expressions and can be understood the same way. Relations are, thus, RA expressions that interact -- combine via relational operations -- to form new expressions/relations.”
--David McGoveran
(Continued in Part 2)



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.
 

[2] Pascal, F., Database Management: No Progress Without Data Fundamentals.

[3] Pascal, F., Industry Practice Is No Substitute for Foundation Knowledge.

[4] Codd, E. F., Relational Model of Data for Large Shared Data Banks.

[5] Pascal, F., What Is a Data Model, and What It Is Not.

[6] Pascal, F., Business Modeling for Database Design: Formalizing the Informal.

[7] Pascal, F., Data Model: Neither Conceptual, Nor Logical, Nor Physical Model.

[8] Codd, E. F., Data Models in Database Management, Workshop on Data Abstraction, Databases and Conceptual Modelling 1980: 112-114.


[9] Pascal, F., Structure, Integrity, Manipulation: How to Compare Data Models.

[10] Pascal, F., Levels of Representation, Conceptual Modeling, Logical Design, and Physical Implementation.




No comments:

Post a Comment