Sunday, July 21, 2013

Relational Data Model: It's All Relationships

Note: Minor, but important revision (7/28/13).

In Un-muddling Modeling Parts I and II  I debunk some of the comments in a LinkedIn exchange, but there is another important aspect that merits attention.
AT: ... William Kent and his book "Data and Reality" [1978] ... is devoted to "attributes", and (in my words) William confesses that he can not distinct [sic] between "relationships" and "attributes". Thus, the later might be completely redundant.
GE: This confusion of entities vs. attributes has been with us a long time. Several years [student] ... discovered a paper ... that discussed this dilemma. He proposed calling the thing, which we could not determine whether it was an entity or an attribute, an "entribute." Beautiful, eh? That was in 1939!

Kent's is one of the recommended books (available via my Home page) and one that, in my opinion, no data management professional should be without. Consider, however, the following quote from an equally essential and recommended book (emphasis mine):
"… singular names, property terms and class terms correspond to common sense concepts of individual beings, traits or activities of individual beings and sets of beings who share some common traits or activities … This three-fold distinction is an important part of our Western intellectual heritage and is deeply embedded not only in our thought habits, but also in our language."
Substitute entities for 'singular names' and attributes for 'property terms' and they are very basic concepts--how we inherently view the world; and intuitive--do not have/need explications. Keep in mind that entities, attributes (and classes) are elements of a conceptual model--the informal representation of some aspect of the real world.

Conceptual modeling is a choice of specific entities and attributes, based on subjective perceptions of  reality. They are in the eye of the beholder: one's entity is another's attribute and often there is justification for both representations in the same business model. There are no theoretical grounds on which to prefer one perception over another, although more often than not pragmatic considerations induce certain preferences and consensus can be achieved. Inherently subjective and ad-hoc choices cannot be avoided or worked around at the conceptual level. All that can be required is that a specific conceptualization is internally consistent and, once adopted, is used consistently. This is precisely why (1) conceptual modeling requires thorough familiarity with the reality being represented and (2)  formalization to a logical model is indispensable for database representation.
AT: Looking back into original relational model (by Codd, Date, etc.) is not it funny, that the term "relation" is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a "relationship" in our conventional data modelling is represented by a "foreign key" in a table (and combining both points together) - should a table (relation) consists only of "foreign keys"?
A relation is defined on domains, so it is a relationship--Cartesian product--of those domains, but it is a mathematical abstraction. An 5NF R-table is a relation applied to database management and, as such, it represents a real world relationship between the attributes of one class of entities. It is a very specific kind of within-class relationship--a functional dependency (FD) on the identifier of the other attributes. The FD is represented in the database by a PK constraint.

Another type of real world relationship is cross-class and exists between an attribute of one class referencing an identifier of another class, to which we refer as referential and which is represented by an FK constraint.

We say that the "PK and FK's represent these relationships", but that is an informal shorthand. Strictly speaking, PK's and FK's are columns that participate in the relationships. In other words, the attributes in the real world participate in relationships, they are not themselves relationships.
GE: ...In reality, we are not modeling objects/entities/attributes ... at all in the relational model, we are modeling a bunch of relationships, say [ X | A ], [ X | B ], etc. mashed together into a table [ X | A | B | ...], hence perhaps Codd was correct in calling it a "relation", a bunch of relationships. Furthermore, the relational model has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes." Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not. But then we have to pick the identifier--try to do that without knowing all the functional "dependencies."
The first part of the first statement and the last statement are correct, as I just explained: entities are "a bunch of mashed FD's" represented by R-table rows and a PK constraint and cross-class referential relationships are represented by PK constraints.

Note: Objects are, at best, an application, not database concept--there is no "object data model". If there is, what are--precisely, please!--its logical structure, manipulation and integrity components and on what theoretical foundation it rests?

It follows that:
  • "The relational model has no relationships" is, obviously, bunk; 
  • Attributes are not relationships, they participate in relationships;
  • Codd did not "decree that all relationships must be represented by foreign keys": relational theory requires all information--including relationships--to be represented by values in R-tables, which is what FK's achieve; FD's are not represented by FK's;
  • Codd did not coin 'relation', it is a formal mathematical term which, I suspect, is rooted in the fact that a relation is a relationship of domains; when "people think of relationships as the distinguishing characteristic" of the RM, they are mistaking one relationship--the mathematical Cartesian product of domains--for another--the database referential between R-tables.
The most important point to be taken from this is that the RM does not require anything other than R-tables (and domains) to represent reality: relationships are implicitly included (incidentally, that is one of the reasons why the E/R Model [Is] Considered Harmful).

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment