Sunday, April 14, 2013

Un-muddling Modeling, Part I: Relations & Relationships

This is a 06/18/17 rewrite of a 04/14/13 to bring it in line with the McGoveran interpretation of Codd's RDM[1].

Here's what's wrong with last week's picture, namely:

"This confusion of entities vs. attributes has been with us a long time ... a paper discussed this dilemma in 1939 [and] proposed calling the thing, which we could not determine whether it was an entity or an attribute, an "entribute ... William Kent's DATA AND REALITY (1978) is devoted to "attributes" and (in my words) he confesses that he can not distinguish between "relationships" and "attributes". Thus, the later might be completely redundant.
I have since come to realize that you must have the relationship first--the notion of an attribute presumes a relationship, so we must define that first. In fact we must also define the exclusivity/multiplicity of that relationship (in both directions for a binary relationship) before we can determine how to put the information into tables. In reality, we are not modeling objects/entities/attributes, [ X ], [ A ], etc. at all in the relational model, we are modeling a bunch of relationships, say [ X | A ], [ X | B ], etc. mashed together into a table [ X | A | B | ...], hence perhaps Codd was correct in calling it a "relation", a bunch of relationships. We would also have to define and enforce referential integrity everywhere an A appeared. All of this is handled explicitly and correctly in ORM--we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes. As I said before, an attribute is an object playing a role in a relationship with another object.
Furthermore, the relational model has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes." Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not. All of this is handled explicitly and correctly in ORM--we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes. As I said before, an attribute is an object playing a role in a relationship with another object ... Looking back into original relational model (by Codd, Date, etc.) is'nt it funny, that the term relation is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a relationship in our conventional data modeling is represented by a foreign key in a table (and combining both points together)--should a table (relation) consists only of foreign keys?"

Levels of Representation

As I have argued so many times, it is critical to keep levels of representation distinct in one's mind, to avoid conceptual-logical conflation (CLC) and logical-physical confusion (LPC), the former of which characterizes the above comments (e.g., attributes--logical--represent properties--conceptual). I keep them distinct by using the following terminology[2]:
  • Conceptual modeling: Informal language and real world terms (e.g., groups, properties, objects);
  • Logical database design: Formal language and database terms (e.g., relations, domains/attributes, tuples);
  • Physical implementation: Formal language and physical terms (e.g., storage and access methods);
Note: We prefer 'object' over 'entity' (conceptual) because of the latter's connotation with 'record' (physical) in the pre-relational days (PLC), but we certainly don't mean it in the OO sense.

At the conceptual level confusion between objects and properties arises only under the assumption that whether something is an object or a property is some absolute trait of reality, which it isn't. Conceptual models are based on subjective perceptions of reality--there is no scientific basis on which to prefer one perception over another--and modeling choices are pragmatic, determined by application/user needs. For example, a budget can be modeled as a property of a department object by some and as an object with properties of its own by others. Often both perceptions are useful and can be accommodated in conceptual models[3]

Conceptual Modeling: In the Eyes of the Users

Similarly, as we have explained, some object group properties arise from relationships among individual properties of its members and/or among the members themselves[4]. And a relationship can be modeled as an object, with properties of its own. For example, the relationship between a supplier and a part can be modeled as an object--a relationship object, with quantity as a property. On the other hand, a relationship can also be perceived as a property. For example, the relationships between supplies and suppliers and parts can be modeled with referencing properties. The distinction that the Entity/Relationship Model (E/RM) approach makes is unnecessary and contributes to the confusion[5]. Therefore, Kent should not be interpreted to mean that modeling choices, once made, are confusing, but rather that they are "in the eye of the beholder", so to speak--there is usefulness, not correctness. (Kent's book is a must-read Recommended Book). Otherwise put, make the most useful choice and strive for parsimony, explicitness, well-definedness, and consistency.

It is absolutely true that we must have a complete and well-defined conceptual model before we can represent it as relations (not tables!) That means specification of all groups of objects and properties, some of which arise from relationships. But there is really no "first", conceptualization is sort of "simultaneous"--groups, objects, properties and relationships are defined mutually in terms of each other.

The rest of the comments expose the total lack of understanding of data and relational fundamentals common in the industry.

Modeling groups of objects with properties (not attributes, and objects are entities) is precisely what we do at the conceptual level, which does, indeed, involve relationships: within groups among properties and objects and among groups. The RDM comes into play only at the logical database design level, where we represent the informal groups, properties and objects as formal relations, attributes and tuples and informal business rules as formal constraints that a RDBMS can enforce for database consistency with the rules[6]. If there is a "bunch of relationships" in the real world, what else should relations represent in the database? There is no "mashing", whatever that means, there is formal database representation faithful to the conceptual model, which benefits from the various practical advantages of the RDM, including system-guaranteed logical and semantic correctness[7]

Old Misconceptions

The last comments are utter nonsense.
  • The RDM is nothing but relationships! Domains and attributes are relationships between values. A tuple is a relationship between attribute values. A relation is a relationship among attributes and tuples. A relational database is relationships between relations.
  • Foreign keys are referencing attributes, but a relationship between the referencing and referenced relations is represented by the referential constraint on the two relations, not by the FK--it constrains FK values to match those of the target PK. This is only one type of relationship the RDM represents (see Part II) and Codd never "decreed" anything different[8].
  • What "handled explicitly and correctly in ORM" means escapes me. The ORM is a conceptual modeling methodology that is an expanded version of NIAM and if you want proof, here's the title of the book by its author: CONCEPTUAL SCHEMA AND RELATIONAL DATABASE DESIGN (another recommended classic). Being at a different level of representation, it cannot "handle" anything that the RDM does. You can use the ORM--which is superior to the E/RM--to produce conceptual models of reality and the RDM to represent them logically in databases. What you cannot and should not do is conflate, or functionally compare the two[9].
  • That relational comes from "relationships between tables"--the term comes from relations, which are mathematical sets adapted for database management--and that relational databases consist of tables--they consist of relations that can be displayed as R-tables for visualization--are two very old misconceptions, the persistence of which only demonstrates the lack of foundation knowledge and progress in the industry[10].

[Continued in Part II]


[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming.

[2] Pascal, F., Levels of Representation Conceptual Modeling, Logical Design and Physical Implementation.

[3] Pascal, F., Formalizing the Informal: Business Modeling for Database Design.

[4] Pascal, F., The Interpretation and Representation of Database Relations.

[5] Pascal, F., Entity/Relatonship Model Not a Data Model.

[6] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency.

[7] Pascal, F., What Is a True Relational System (and What It Is Not).

[8] Pascal, F., Foreign Keys -- Are They Really Necessary?

[9] Pascal, F., E/RM Not a Data Model.


Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment