Saturday, September 29, 2018

Understanding Conceptual vs.Data Modeling Part 2: E/RM Models Reality, RDM Models Data

Re-write 10/17/18
Revised 11/1/18

In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later on the three-fold trinity of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?

Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model E/RM is incomplete:

“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”
--David McGoveran
Contrary to claims, Date does not exactly say that the E/RM is a data model:
“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]
Note that even if, charitably, the E/RM is considered a data model, it is not up to the RDM.


I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 


NEW: The Key to Relational Keys: A New Perspective


I deleted my Facebook account. You can follow me on Twitter:

  • @dbdebunk: will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back.
  • @ThePostWest: will contain evidence for, and my take on the spike in Anti-semitism that usually accompanies existential crises. The current one is due to the decadent decline of the West and the corresponding breakdown of the world order.


  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 

E/R Conceptual Modeling, Relational Data Modeling

The E/RM can be used for conceptual modeling. Here's the gist of it that is common to most of its uses as such in the industry:
“Let e denote an entity which exists in our minds. Entities are classified into different entity sets such as EMPLOYEE, PROJECT, and DEPARTMENT ... A relationship is an association among entities. For instance, "father-son" is a relationship between two "person" entities. A relationship set is a mathematical relation among n entities, each taken from an entity set:
{[e1, e2, ..., en] | e1 ϵ E1, e2 ϵ E2, ..., en ϵ En}
and [each association of] entities, [e1, e2, ..., en] is a relationship. Note that the sets Ei in the above definition may not be distinct. For example, a "marriage" is a relationship between two entities in the entity set PERSON.”
“The information about an entity or a relationship ... is expressed by a set of attribute-value pairs ... An attribute is a function which maps from an entity set or a relationshi set into a value set or a Cartesian product of value sets:
f: Ei or Ri →  Vi or Vi1 x Vi2 x ... x Vin.”
“Note that relationships also have [attributes]. Consider the relationship set PROJECT-WORKER. The set PERCENTAGE-OF-TIME, which is the portion of time a particular employee is committed to a particular project, is an attribute defined on the relationship set PROJECT-WORKER. It is neither an attribute of EMPLOYEE nor an attribute of PROJECT, since its meaning depends on both the employee and project involved.”[3]

e is an entity;
E is an entity set;
V is an attribute (i.e., property);
ϵ symbolizes 'belongs to', or 'is member of'.

To put it less formally, the objects in E/RM are:

  • Entities with properties;
  • Entity sets -- groups of entities of the same type;
  • Relationships -- associations among entities;
  • Sets of relationships.
A relationship can be an association of entities of distinct types (e.g., employees and projects), or of the same type (persons) in different roles (e.g., fathers and sons), and is represented by a combination of the identifiers of the associated entities and properties of the relationship, if any. For example, a relationship between employees and projects is represented by {employee-no, project-no,% of time}.   

The E/RM can be used at the conceptual level to model reality, the RDM at the logical level to model data -- formalize conceptual models produced with the E/RM as "computable" logical models for database representation that are consistent with reality. This is possible because, having the same formal theoretical roots, they make the same ontological commitment. In Part 4 we will outline a new conceptual modeling approach that makes a different commitment, which requires adjustments to the formal theory and, thus, to the RDM. It has advantages over the E/RM and the RDM as currently defined.

We are now in a position to address the issues raised by the original comments and questions, namely:

“E/RM is a data model -- So says Date, Chen, etc. So says the majority of current industry experts ... With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and E/RM -- but clearly believes both are data models (even allowing for the charitable comment). If we take a RDB as the ultimate target implementation of data, and an E/RM (or extended) can correctly design all the artifacts that are implemented, this means it is modeling the data. Granted, an E/RM does not explicitly model some of the non-structural aspects of the original Codd definition.”
“Out of interest, is there a common Relational Modeling tool, that is not also an E/RM tool and models the full Codd definition? There are also several other methods of modeling data -- E/RM is more a mechanism to represent the data. If E/RMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all build wrongly? Regardless of success? Is the inferred conclusion that only the RM models data, and ERM, [or] any other techniques do not? [If so] that is a little limiting.”

  • We have explained why the E/RM is not a data model, and proper experts, including Date and McGoveran, don't say that it is. Nor is it a "mechanism to represent data". It can be used to model and represent reality; the RDM can be used to model data.
  • Many modeling tools are variations on/enha ncements of an automated tool developed by Chen for producing (via the RDM) 3NF logical modelsfrom conceptual models captured in enhanced E/RM notation. They are not necessarily grounded in a correct understanding of the E/RM, and are designed for mapping to SQL (note: not relational[4]!) databases. While the E/RM may suffice for as much as can be captured and enforced by SQL databases and DBMSs, neither Chen's ideas, nor his notation capture all that is desirable[5]. Nissjen's NIAM[6], Halpin's ORM[7], and tools based on them[8] do better, but their added richness is lost on SQL. Indeed, many are used -- especially by programmers -- for their compatibility with OOP.
  • In general, "wrong" conceptual models and database designs are due to combinations of (1) poor knowledge of the reality to be modeled, (2) lack of foundation knowledge[9], and (3) poor tools.

Before we outline the new conceptual modeling approach in Part 4, we will offer in Part 3  a recent example of conceptual-logical conflation (CLC) common in the industry.


[1] Codd, E. F., Data Models in Database Management.

[2] Date, C. J., AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed. (Addison Wesley, 2003).

[3] Chen, P., The Entity-Relationship Model - Toward a Unified View of Data.

Pascal, F., SQL Sins.

[5] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[7] Object Role Modeling.

[8] NORMA - The Software!


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

No comments:

Post a Comment

View My Stats