Sunday, September 15, 2013

Entity-Relatonship Model: Not a Data Model

This is an improved version of an earlier post.

One of the core advantages of the relational data model is its generality-to-simplicity ratio: with just one structural feature it can represent any reality, relationships included.

In a previous post mentioning this advantage, I made a reference to the "entity-relationship model" (E/RM) and linked to an old paper by Shir Nijssen that argues it is harmful. In 2005 I posted at my old DBDebunk site comments by C. J. Date and myself words to that effect that are worth repeating, revised to reflect some additional wisdom acquired since then.

A database is a formal logical representation of some real world segment of interest, namely an informal business model, consisting of several types of business rules (see paper #1, Business Modeling for Database Design). I refer to mapping the business model to its database representation, the logical model,  as formalizing the informal.

Informal refers to business models' capture of semantics--the meaning as understood by users (properties, entities, attributes, classes). A DBMS cannot understand these concepts, all it can do is manipulate abstract symbols mathematically. That is why formalization is necessary, to "strip the meaning" off: the DBMS can then manipulate symbols mathematically, with results that do not depend on the interpretation (meaning) assigned them by users, only on their form--hence, formal. Think of algebraic manipulation of abstract variables: results do not depend on what the variables stand for. We can manipulate them mathematically, regardless of what they represent in the real world, that is, any particular interpretation.

To map informal business models to formal logical models requires a data model--some formal structure (the equivalent to variables) and manipulation (the equivalent to algebra). The relational data model (RM) offers the R-table and the relational algebra, respectively. Codd exploited the equivalence of set theory and first order predicate logic (FOL) by applying it to database management. The relational algebra ("table arithmetic", if you will) operates on database R-tables to produce R-tables as results. The rows in the results represent true propositions that are logical implications (inferences) of the true propositions represented by the rows recorded in the database. These results are independent of what the R-tables represent in the real world--the meaning of any particular business model.

The term "entity-relationship model" is very common in the industry--where does it belong in this context? Here's C. J. Date:
[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested)...

The "thin layer on top of the relational model" consisted of certain referential actions--cascade delete, etc.--that can certainly be (and usually are) implemented in today's SQL systems, but are not part of the relational model as originally formulated.

The detailed arguments were first written down in a much earlier paper (Entity/Relationship Modeling and the Relational Model, in InfoDB 5, No. 2, Summer 1990, republished in my book RELATIONAL DATABASE WRITINGS 1989-1991)--so they are hardly new, and were almost certainly in print at the time the subject paper was first written.
I do not deem the E/R model a data model. Entity and relationship are real world (business) modeling concepts, not formal logic and mathematical concepts. But even if the E/RM were a data model, it would be inferior to the RM:
  • Less parsimonious and more complex: it requires two core structural concepts to represent reality, while the RM only one, the R-table;
  • Incomplete: most manipulation and integrity features are missing.
  • Lacks a sound theoretical foundation.

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment