Sunday, April 14, 2013

Un-muddling Modeling, Part I

I've been asked to respond to some comments on an online exchange, Data (and other) models and aesthetics and I can do it more justice here. In what follows I am careful to keep
  • informal conceptual language e.g. property, entity, attribute;
  • formal abstract mathematical language e.g. domain, relation;
  • formal language applied to databases e.g. domain, R-table, row, column;

GE: This confusion of entities vs. attributes has been with us a long time ... a paper published in an Indian journal of Library Science discussed this dilemma [and] proposed calling the thing, which we could not determine whether it was an entity or an attribute, an "entribute." Beautiful, eh? That was in 1939!

AT: [Re:] William Kent and his book DATA AND REALITY (1978). Section 5 (in an edition I have access to) is devoted to "attributes", and (in my words) William confesses that he can not distinguish between "relationships" and "attributes". Thus, the later might be completely redundant.
As I explain in Conceptual Modeling for Database Design, conceptual models are informal--based on subjective perceptions of reality--and there is no theoretical basis on which to prefer one perception over another; rather, modeling choices are pragmatic. For example:
  • A budget can be perceived as an attribute of a department entity by some and as an entity with attributes of its own by others.  Often both perceptions are useful and accommodated in conceptual models.
  • If every employee is assigned to only one department, department number is an employee attribute representing his assignment; but if he is assigned to multiple departments, then the assignment is an associative entity that is a relationship between employee and department entities.
Therefore, Kent should not be interpreted to mean that such choices are not distinguishable once made, but rather that the choices themselves are "in the eye of the beholder", so to speak--there is no correctness, only usefulness. (Kent's book is a highly recommended classic, available via the Home page).

That is why, given linguistic constraints, it is imperative to strive for conceptual parsimony and use of terms that is explicit, well-defined, systematic and consistent.
GE: I have since come to realize that you must have the relationship first--the notion of an attribute presumes a relationship, so we must define that first. In fact we must also define the exclusivity/multiplicity of that relationship (in both directions for a binary relationship) before we can determine how to put the information into tables. In reality, we are not modeling objects/entities/attributes, [ X ], [ A ], etc. at all in the relational model, we are modeling a bunch of relationships, say [ X | A ], [ X | B ], etc. mashed together into a table [ X | A | B | ...], hence perhaps Codd was correct in calling it a "relation", a bunch of relationships.
There is really no "first" here--conceptualization is "simultaneous"--entities, attributes and relationships are defined mutually in terms of each other.

A relation on domains is a mathematical abstraction (not coined by Codd; his relationships notion was different, see below). It is indeed binary--a relationship between two domains. However, Codd adapted it to the practical needs of database management by generalizing to the N-ary relation, in which sense it can be viewed as a relationship between N domains.
GE: Furthermore, the relational model has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes." Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not. But then we have to pick the identifier -- try to do that without knowing all the functional "dependencies."
We would also have to define and enforce referential integrity everywhere an A appeared. All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes. As I said before, an attribute is an object playing a role in a relationship with another object.

AT: Looking back into original relational model (by Codd, Date, etc.) is not it funny, that the term "relation" is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a "relationship" in our conventional data modelling is represented by a "foreign key" in a table (and combining both points together) - should a table (relation) consists only of "foreign keys"?
This is both inaccurate and poor use of language.
  • We have already seen that R-tables represent relationships between columns. There are other relationships in the relational model (see Part II).
  • I use identifier (attribute) at the conceptual level, key (column) at the logical database level. Keys are another adaptation of the theory for database management (see Part II).
  • An FK is not "exactly like" a column, it is a column, that has a specific relationship with a key column of another R-table. The relationship is represented in the database by a referential constraint. It is true that IT professionals erroneously associate 'relational' with that relationship, but that is not the one Codd meant  (see Part II).
Note: My modeling approach is more in line with the original NIAM approach than with its ORM descendant (see Nijssen's CONCEPTUAL SCHEMA AND RELATIONAL DATABASE DESIGN available via the home page).

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment