Saturday, June 2, 2018

Relationships and the RDM Part 3: Beware the Misconceptions



Note: This is the last part of a three-part re-write of two earlier posts, to bring them into line with McGoveran's formalization and re-interpretation[1] of Codd's true RDM, including his own corrections, refinements and extensions. For an in-depth treatment see [2,3].

(Continued from Part 2)

I started Part 1 with a bunch of comments revealing misconceptions (among the many rampant in the industry[4]) about RDM poor support of relationships, if any. Debunking them requires foundation knowledge that is lacking in the industry, so in Part 1 I documented the types of relationships that need to be supported by the RDM, and in Part 2 I showed that they are supported via relational integrity constraints by the RDM (though not necessarily by SQL DBMSs that are confused with RDBMSs), and pinpointed factors that, in the absence of foundation knowledge, contribute to the misconceptions.

Armed with this information I will now debunk the comments with which I started. They all exhibit confusion of levels of representation -- conceptual-logical conflation (CLC) and/or logical-physical confusion (LPC)[5] -- a common source of problems in database practice.



------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 

NEW PUBLICATIONS


NEW: SOCIAL MEDIA

The social media pages are now as follows. Please make a note of it and disseminate.
Twitter:


Facebook:


The DBDebunk pages will contain links to updates to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back.

The PostWest pages will contain links to evidence for, and my take on Evidence on Dystopian Western Decadence, The Only Acceptable Racism Left, and The Weaponized Myth of a "Palestinian Nation".

HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------
“William Kent confesses (in my words) that he can not distinguish between "relationships" and "attributes" ... the later might be completely redundant ... the notion of an attribute presumes a relationship, so we must define that first ... All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes ... an attribute is an object playing a role in a relationship with another object.”
Properties and relationships are informal elements at the conceptual level, which are represented in the RDM by formal domains/attributes and integrity constraints on relations, respectively, at the logical database level. Kent was writing in the early days of the RDM, when the only understanding was the conventional one, and the E/RM[6]was the dominant conceptual modeling approach. The E/RM recognizes only relationships among entities of different types, and properties and relationships are distinct concepts. Our modeling approach unifies them[1,2]:
  • Relationships among entity properties are individual properties;
  • Relationships among entity members of a group are collective group properties; and,
  • Relationships among members of different groups are collective multigroup properties.
Even though relationships are properties, the two are not indistinguishable.

Note: At the time of Kent's writing the RDM had only domains -- logical attributes were added later, whch explains why he used them at the conceptual level instead of properties. Object-Role Modeling (ORM) is a conceptual modeling approach superior to E/RM
[7] that, although not ideal, can better accommodate our approach.
“... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships ... Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not ... [it] has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes ...”
A good example of why I frown on "data modeling", and recommend reserving modeling for the conceptual level, and use database design at the logical level[8]. In that sense we are, indeed, not "modeling at all in the relational model", not even relationships -- we are formalizing conceptual models as logical models for database representation (i.e., doing logical database design). The "bunch of relationships" among properties, group members, and groups is modeled at the conceptual level, and formalized as integrity constraints on and among relations, which represent entity groups in the database at the logical level. As already explained, the RDM supports all of the constraints[3].
“... isn't it funny, that the term relation is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a relationship in our conventional data modeling is represented by a foreign key in a table (and combining both points together) -- should a table (relation) consists only of foreign keys? ... What [other] type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between relations) was through explicitly stored values (i.e., attributes, foreign keys).”
  • There is nothing funny about any misconception, and confusing relations with table is a most common and entrenched one[9] -- R-tables are just visualizations of relations on some physical medium.
  • A relationship between entity groups is not represented by a foreign key (FK), but by a referential constraint on the relations representing the groups -- and it is but one kind of relationships supported by the RDM;
  • For the relationship among "properties and entity identifiers" and the corresponding general dependency constraint, see below and [1,2,3];
  • Codd's Information Principle states: "All information in a relational database is represented explicitly and in exactly one way: by values in relations". That includes information about relationships among entity members of different groups, represented by matching FK-PK values. Codd never said these are the only relationships to be supported by the RDM (McGoveran points out that PK and FK must respect consistency, but the relationship itself is not given by them).
“In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements -- I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set, or an element of a set? (Or may be a set of sets?).”
To reiterate, relationships are conceptual, sets, tuples, and relations are logical.

A mathematical relation -- an abstraction devoid of meaning -- is a subset of a Cartesian product of domains -- a set of unordered tuples, which are sets of valued drawn from the ordered domains. A database relation (logical), which carries meaning -- it represents an entity group (conceptual) -- is constrained to be consistent with the group (i.e., represent it accurately), including the relationships among (1) individual properties shared by the members and (2) members, the latter being collective group properties (conceptual)[2,10];

A tuple (logical) is a set that represents (a fact about) an entity (conceptual); a relation which is a set of tuples (and, thus, a set of sets) (logical), represents (facts about) members of an entity group.

“I argue that there is essentially no difference between relationships between entity (type tables) and between an entity and its attributes. They both represent relationships between two populations of things. Something is an attribute by virtue of there being a relationship. If relationships are represented by foreign keys and the entity tables must be in 1NF, as in the relational model, then all relationships must be at most Many-to-One (a very unnecessary limitation when modeling some user domain).”
In Part 1 I discussed the different kinds of relationships, enforced by distinct  formal constraints[3].

The relationships (1) among entity groups and (2) between properties (not attributes) and each of the entities are quite distinct -- the former is represented by a general dependency (GD) constraint, the latter by a referential constraint[2,3]-- and saying "they are both relationships obscures", rather than enlightens.

A relationship among entities of the same type that are members of a group is a collective group property we referred to as a third order property (3OP), represented in the database by a constraint. For example, a PK constraint enforces a uniqueness relationship.

A relationship among entities of different types that are members of different groups is a multigroup property we referred to as a fourth order property (4OP), represented in the database by a referential (FK) constraint;

Relations -- not "entity tables" -- are by definition in 5NF, not in 1NF[10];

There are three types of 4OPs due to multi-group relationships: one-to-one, many-to-one (M:1), and many-to many (M:N)[2] (both 1:1 and M:1 can be considered special cases of M:N, and 1:1 a special case of M:1). All are supported by the RDM with referential (FK) constraints[2,3].

Relationships are "invisible" only if it is not understood that they are what constraints on and among relations represent (i.e., absent foundation knowledge). They are further obscured by SQL implementations that fail to support true relations and all relational constraints (but are wrongly believed to be RDBMSs). This will remain so, and become worse as long as data and relational fundamentals are ignored, misused and abused, rendering database management unsound[11].



Follow: @DBDEBUNK @THEPOSTWEST


References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., Conceptual Modeling for Database Design: A New Perspective, Part 1 and 2, forthcoming.

[3] Pascal, F., Logical Database Design: Formalizing the Informal, forthcoming.

[4] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

[5] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[6] Kent, W., DATA AND REALITY (recommended classic).

[7] The ORM Foundation.

[8] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.

[9] Pascal, F., Multidimensional Relations, Flat Tables and Logical-Physical Confusion.

[10] Pascal, F., What Relations Really Are and Why They Are Important.

[11] Kerstiens, C., It's the future (for databases)
https://www.citusdata.com/blog/2018/05/08/its-the-future-for-databases/


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

No comments:

Post a Comment