Wednesday, November 8, 2017

Data Model: The RDM Is, the E/RM Isn't

Minor touches, 11/17/17.

Note: This is an 11/08/17 rewrite of two older posts to bring them in line with McGoveran's formalization and interpretation [1] of Codd's true RDM.

Here's what is wrong with the picture of two weeks ago, namely:

"ERM is a data model -- So says Date, Chen, etc. So says the majority of current industry experts. Refer to Date 6th edition p347. With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and ERM -- but clearly believes both are data models (even allowing for the charitable comment). If we take a RDB as the ultimate target implementation of data, and an ERM (or extended) can correctly design all the artifacts that are implemented, this means it is modelling [sic] the data. Granted, an ERM does not explicitly model some of the non-structural aspects of the original Codd definition.

Out of interest, is there a common Relational Modelling tool, that is not also an ERM tool and models the full Codd definition? There are also several other methods of modeling data -- ERM is more a mechanism to represent the data. If ERMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all build wrongly? Regardless of success? Is the inferred conclusion that only the RM models data, and ERM, [or] any other techniques do not? [If so] that is a little limiting."
Chen's E/RM [2] (1976) preceded Codd's definition of a formal data model [3] (1980) and it would be unfair to hold it to that definition. But even if Chen used "unified view of data" in his paper's title, there is no excuse for "the majority of industry experts" claiming, post-1980, that the E/RM is a data model.


Date does not:
"[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested)." [4,5]

but E/RM is not "the process of deciding the structure of the database". 

E/R Modeling Is Conceptual

Rather, it models (structures) a segment of the physical world as entities and relationships (the relationships are actually among entity types) and, thus, it is a conceptual modeling approach. As such, it is quite clear that it is not a data model (i.e., it is a view of reality, not data) and claiming otherwise amounts to conceptual-logical conflation (CLC).

Conceptual modeling is, in fact, data model agnostic (i.e., E/R conceptual models can be represented in databases relationally, or using any data model other than the RDM). Problem is, there is no data model other than the RDM that is "a formal system involving structural, integrity, and manipulative aspects", with a theoretical foundation that yields the same advantages.

Note: The only other two candidates that staked data model claims are the hierarchic (HDM) and the network (NDM), but they do not fully qualify due to their incompleteness and physical contamination. Moreover, the simple set theory and first order predicate logic (FOPL) underlying the RDM deliver advantages that the directed graph theory on which HDM and NDM rest cannot:
  • System-guaranteed logical validity and semantic correctness;
  • Declarative, decidable data sub-language;
  • Physical and logical independence;
for which reason even SQL DBMSs, with their poor relational fidelity, proved superior to complex and inflexible HDBMSs and NDBMSs (CODASYL). 

If a conceptual model produced by E/RM "does not explicitly model some of the non-structural aspects", it does not qualify as a data model on incompleteness grounds. But there is more to it than that.

Note: E/RM is used for both the modeling approach and the models of reality it produces (e.g., "the ERMs"), which can be confusing. Better to use E/R conceptual models for the latter.

A Data Model Is Formal

"When we say that something is formal, we mean that it is constructed and defined using a known, explicitly systematic methodology, and preferably that system includes a sub-methodology for improving upon or correcting deficiencies if any are discovered. By contrast, when we say that something is informal, we mean it has not been constructed and defined using any known, explicitly systematic methodology.  Thus, something could be informal either because we do not know any methodology by which it was (or could have been) constructed and defined, or no such methodology exists. For example, when we say a term is informal, we mean that it has only an intuitive definition, one that arises from common usage. Any dictionary definitions are merely examples of synonyms or phrases that can substitute for the term in some context. Such definitions are ambiguous and often even circular." --David McGoveran

E/R conceptual models produced by E/RM are informal and require formalization for computable database representation. A formal data model is used to perform this task (e.g., informal entity types (object groups) and relationships among them are formalized using the RDM as formal relations and constraints). 
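
To illustrate what formalization involves, here is a minimal sketch in Python, assuming a toy EMPLOYEE entity type. The names (Employee, key_is_unique, salary_is_positive, is_consistent) and the two business rules are hypothetical, introduced only for illustration. It is not McGoveran's formalization, just the general idea that a relation is a set of tuples and that constraints are truth-valued conditions on it.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """Heading of a toy EMPLOYEE relation (hypothetical attributes)."""
    emp_no: int
    name: str
    salary: int


def key_is_unique(relation):
    """Key constraint: no two tuples share the same emp_no."""
    keys = [t.emp_no for t in relation]
    return len(keys) == len(set(keys))


def salary_is_positive(relation):
    """Tuple constraint: every salary is greater than zero."""
    return all(t.salary > 0 for t in relation)


# The constraints that, together with the heading, formalize the entity type.
CONSTRAINTS = (key_is_unique, salary_is_positive)


def is_consistent(relation):
    """A relation value is acceptable only if it satisfies every constraint."""
    return all(constraint(relation) for constraint in CONSTRAINTS)


# A relation value is a set of tuples: no duplicates, no ordering.
employees = frozenset({
    Employee(1, "Ann", 50000),
    Employee(2, "Bob", 42000),
})

# A candidate update that reuses key value 2 and so violates the key constraint.
candidate = employees | {Employee(2, "Eve", 39000)}

print(is_consistent(employees))
print(is_consistent(candidate))
```

The informal notion "employees are identified by their number and are paid something" becomes two checkable conditions. In a DBMS it is the system, not the application, that would enforce them.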

Confusing Levels of Representation

A database is not a "target implementation of data" -- an example of the common and entrenched logical-physical confusion (LPC). Rather, it is a formal logical data representation of a conceptual model that is implemented physically in the storage supported by a specific DBMS.

A design tool can have:

  • A component that aids with capturing a conceptual model (e.g., entities and relationships) using some notation;
  • A component that attempts to convert the captured information into a logical model (e.g., relations and constraints);
  • A component for creating a corresponding physical model;
but that is not the same as a "relational modeling tool that is also an E/RM tool", which smacks of conceptual-logical conflation (CLC).

My recommendation [6] to use the three-fold terminology

  • Conceptual modeling;
  • Database design;
  • Physical implementation;
is precisely to prevent confusion -- CLC, LPC, or language such as "if ... an ERM (or extended) can correctly design all the artifacts that are implemented, this means it is modeling the data". 


If the distinctions between levels of representation and between conceptual modeling and database design are properly understood and success is measured by taking soundness and optimality into consideration, the E/RM can be used for conceptual modeling. It is adequate for capturing the information necessary to design databases in 1NF-3NF, provided
  • Entity types (not entities!) are mapped to relations;
  • M:1 relationships are mapped to referential constraints via foreign keys (FK);
  • M:N relationships are mapped to "associative" relations and FKs.
Chen long ago developed an automated tool for producing 3NF logical models from E/R conceptual models captured in enhanced E/RM notation, but the mapping relies on the RDM. The approach is copied in most modeling tools with minor variations and enhancements, but that does not make E/RM a data model. That is determined by whether it satisfies the definition of a data model as defined in 1980, which it does not. A data model is neither a conceptual, nor a logical model, but a means to formalize the former as the latter for database representation [7]. Here's a metaphor that helps understand the difference between the three: the E/RM is to conceptual models and a data model is to logical models what a programming language is to programs.
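
To make the three mapping rules listed above concrete, here is a minimal sketch in Python. It assumes single-attribute keys with distinct names and uses SQLite's SQL only because it is readily runnable, not because it is relationally faithful. The names (EntityType, Relationship, to_ddl) and the sample entity types are hypothetical, and this is not Chen's tool or any product's algorithm.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class EntityType:
    name: str
    key: str                 # assumption: one key attribute per entity type
    attributes: tuple = ()   # non-key attributes


@dataclass
class Relationship:
    name: str
    source: str              # the "many" side for M:1 relationships
    target: str
    many_to_many: bool = False


def to_ddl(entity_types, relationships):
    """Apply the three mapping rules and return CREATE TABLE statements."""
    by_name = {e.name: e for e in entity_types}
    # Rule 1: each entity type (not each entity) maps to one relation.
    columns = {
        e.name: [f"{e.key} TEXT PRIMARY KEY"] + [f"{a} TEXT" for a in e.attributes]
        for e in entity_types
    }
    associative = []
    for r in relationships:
        src, tgt = by_name[r.source], by_name[r.target]
        if r.many_to_many:
            # Rule 3: M:N maps to an "associative" relation with two FKs.
            associative.append(
                f"CREATE TABLE {r.name} ("
                f"{src.key} TEXT REFERENCES {src.name}({src.key}), "
                f"{tgt.key} TEXT REFERENCES {tgt.name}({tgt.key}), "
                f"PRIMARY KEY ({src.key}, {tgt.key}))"
            )
        else:
            # Rule 2: M:1 maps to a referential constraint via an FK.
            columns[src.name].append(
                f"{tgt.key} TEXT NOT NULL REFERENCES {tgt.name}({tgt.key})"
            )
    base = [f"CREATE TABLE {e.name} ({', '.join(columns[e.name])})" for e in entity_types]
    return base + associative


entity_types = [
    EntityType("DEPARTMENT", "dept_no", ("dept_name",)),
    EntityType("EMPLOYEE", "emp_no", ("emp_name",)),
    EntityType("PROJECT", "proj_no", ("proj_title",)),
]
relationships = [
    Relationship("WORKS_IN", "EMPLOYEE", "DEPARTMENT"),
    Relationship("ASSIGNED_TO", "EMPLOYEE", "PROJECT", many_to_many=True),
]

ddl = to_ddl(entity_types, relationships)

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
for statement in ddl:
    conn.execute(statement)
```

Note that every output of the sketch is a relation or a constraint: the E/R description supplies the information, but the target into which it is formalized is the RDM (here, its SQL approximation).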

E/RM suffices for about as much relational fidelity as SQL DBMSs can support. It has been criticized [8] for its semantic poverty (which limits integrity). Neither Chen's ideas nor his notation lets one capture all that is desirable, nor is there a methodology for improvement. NIAM [9] and its extended ORM version [10] are superior, but with SQL DBMSs most of the benefits that their added complexity is intended to provide are lost.

"It [is] a mistake to criticize E/RM based on RDM, or to treat them as competitors; and to treat the process of capturing a conceptual model (i.e., one's understanding of some portion of the physical world) as something formal -- it is inherently a matter of judgment (e.g., regarding what entities, properties, and relationships should or should not be included, or when we are "done") and approximation (e.g., the level of detail in differentiating values of properties), a process of creativity which involves insight and often intuition. We can certainly establish and follow good, sound practices and a methodology that lets us improve on the results over time given some criteria of what we consider useful and correct, but we cannot make it mechanical in the way that a logical model and its uses must be." --David McGoveran
For how to do it properly and take advantage of the full benefits of the true RDM, semantically expanded beyond Codd's preliminary effort [11], you will have to wait for [1].


[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Chen, P., The Entity-Relationship Model -- Toward a Unified View of Data

[3] Codd, E. F., Data Models in Database Management, Workshop on Data Abstraction, Databases and Conceptual Modeling, 1980: 112-114.

[4] Date, C. J., AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed. (Pearson, 2003).

[5] Date, C. J., Entity/Relationship Modeling and the Relational Model, InfoDB 5, No. 2, Summer 1990; republished in RELATIONAL DATABASE WRITINGS 1989-1991 (Addison Wesley, 1992).

[6] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.

[7] Pascal, F., Data Model: Neither Conceptual, Nor Logical, Nor Physical.

[8] Nijssen, G. M., Duke, D. J., Twine, S. M., The Entity-Relationship Data Model Considered Harmful, Empirical Foundations of Information and Software Science V, pp 109-130.


[10] Terry Halpin's Object Role Modeling.

[11] Codd, E. F., RM/T: Extending the Database Relational Model to Capture More Meaning, ACM Trans. Database Syst., 4(4): 397-434 (1979).
