Sunday, June 25, 2017

Relations & Relationships Part I

Note: This is a 10/17 rewrite of a 04/13 post to bring it in line with the McGoveran formalization and interpretation [1] of Codd's true RDM. Some of the references have also been rewritten for the same reason and it is recommended that you re-read them.

There is much confusion, misuse and abuse of terminology and outright nonsense in the wrong picture of two weeks ago, all rooted in poor grasp of data and relational fundamentals. I debunk as much of each of the three paragraphs as is amenable to reason.

Levels of Representation

It is critical to keep levels of representation distinct in one's mind and avoid conceptual-logical conflation (CLC) and logical-physical confusion (LPC) [2], both of which are reflected in the comments. The following terminology [3] helps:
  • Conceptual modeling: Informal language and real world terms (object groups, properties, objects business rules);
  • Logical database design: Formal language and database terms (relations, domains/attributes, tuples, constraints);
  • Physical implementation: Formal language and physical storage and access terms (e.g., files, indexes);

Conceptual Modeling: In the Eyes of the Users

"This confusion of entities vs. attributes has been with us a long time ... a paper discussed this dilemma in 1939 [and] proposed calling the thing, which we could not determine whether it was an entity or an attribute, an "entribute ... William Kent's DATA AND REALITY (1978) is devoted to "attributes" and (in my words) he confesses that he cannot distinguish between "relationships" and "attributes". Thus, the later might be completely redundant."
First, entities are conceptual, attributes are logical (CLC). So the referenced confusion is actually entities vs. properties, the conceptual element to which logical attributes correspond (Historically, entity had a connotation with physical record, for which reason we prefer object -- though certainly not in the OO sense -- to avoid conceptual-physical confusion.

Second, difficulty to distinguish arises only when seeking absolute entities, relationships and attributes inherent in the real world. Modeling choices are based on subjective user perceptions of reality and pragmatic (‪i.e., driven by how well they satisfy specific application/user needs). There are no scientific grounds for preferring one perception/choice over another. For example,
  • A budget can be modeled either as a property of a department object, or as an type of object with properties of its own by financial management. Often both perceptions are useful (e.g., the former for the department staff, the latter for finanacial management) and can be accommodated within a model [4];
  • A many-to-many (M:N) relationship between two object groups can be modeled as an associative object group;
  • Some object group properties arise from relationships (a) between individual properties shared by, and (b) among all group members [5].
The difficulty is eased once it is realized that relationships and properties are "in the eye of the users": the modeler chooses based on their perceptions and needs discovered via requirements analysis. What is required is that choices are well defined, explicit and, once made, used consistently (Kent's is a classic, must read Recommended Book).

A well-defined and complete conceptual model is essential for a database representation -- business rules that specify the required (1) First order individual properties shared by members (2) Second order properties (2OPs) arising from relationships among those properties (3) Third order collective properties (3OPs) arising from relationships among all the group members  and (4) Collective properties of the set of groups as a whole [5].
"I have since come to realize that you must have the relationship first -- the notion of an attribute presumes a relationship, so we must define that first. In fact we must also define the exclusivity/multiplicity of that relationship (in both directions for a binary relationship) before we can determine how to put the information into tables. In reality, we are not modeling objects/entities/attributes, [ X ], [ A ], etc. at all in the relational model, we are modeling a bunch of relationships, say [ X | A ], [ X | B ], etc. mashed together into a table [ X | A | B | ...], hence perhaps Codd was correct in calling it a "relation", a bunch of relationships. We would also have to define and enforce referential integrity everywhere an A appeared. All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes. As I said before, an attribute is an object playing a role in a relationship with another object."
Much of this paragraph is gibberish that I cannot hope to make sense of.
  • We don't put information in tables, we put data in relations; we visualize relations as R-tables on physical media (e.g., paper, screen);
  • A relation is and has always been "a bunch of relationships" [6] -- formal logical representation of real world relationships among user-defined
(a) object properties
(b) objects and
(c) object groups
Codd did not name it, it's been a special kind of set in set theory that way preceded Codd -- he only borrowed, adapted it and applied it to database management. It is testament to the poor education and foundation knowledge in the industry that practitioners either still don't know it, or only realize it now, five decades after the RDM was introduced [7].
  • Object Role Modeling (ORM) is a conceptual modeling methodology expanded from NIAM that is superior to E/RM (though not faultless [8]). You can use it to produce conceptual models of reality and the RDM to represent them formally in logical databases -- conflating the two is CLC. If you want proof, consider the title of NIAM author's book: CONCEPTUAL SCHEMA AND RELATIONAL DATABASE DESIGN (another Recommended classic).
  • A referential constraint is the formal database representation of a real world relationship between two object groups expressed by a business rule. It is -- and should be -- as explicit at the logical level as the business rule specifying the relationship is at the conceptual level.
"Furthermore, the relational model has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes." Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not.  As I said before, an attribute is an object playing a role in a relationship with another object ... Looking back into original relational model (by Codd, Date, etc.) isn't it funny, that the term relation is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a relationship in our conventional data modeling is represented by a foreign key in a table (and combining both points together)--should a table (relation) consists only of foreign keys?" --Gordon Everest,
Well, which is it -- a bunch of relationships, or no relationships?

Modeling reality as groups of objects with properties is what we do at the conceptual level and business rules specify several types of relationship -- within groups, among properties and objects; and among groups -- a
ll supported by the RDM at the logical level.

FKs are indeed attributes involved in the representation of inter-group relationships [9] -- one of the adaptations of abstract set theory for database management, such that relations could be assigned real world meaning as representing object groups [10]. But it is not the FK that represents the relationship -- the referential constraint on both the referenced and referencing relations that enforces FK-PK values matching does. 

The old misconception that relational comes from "relationships between tables" -- it actually comes from relations -- gave rise to another that referential rules are the only kind of relationship that the RDM represents. Codd has "decreed" nothing of the sort -- stay tuned for Part 2.

The paragraph ends with more gibberish that does not merit attention.


[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F.,
The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[3] Pascal, F., Levels of Representation Conceptual Modeling, Logical Design and Physical Implementation.

[4] Pascal, F., Formalizing the Informal: Business Modeling for Database Design.

[5] Pascal, F., The Interpretation and Representation of Database Relations.

[6] Pascal, F., Relational Data Model: It's All Relationships.


[8]  Nijssen, G. M., Duke, D. J., Twine, S. M., The Entity-Relationship Data Model Considered Harmful, Empirical Foundations of Information and Software Science V 1990, pp 109-130, 1988. 

[9] Pascal, F., Foreign Keys -- Are They Really Necessary?

[10] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency.

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment