Sunday, May 13, 2018

Relationships and the RDM Part 1: Kinds of Relationships


Note: This is a multi-part rewrite of two earlier posts, to bring them into line with McGoveran's formalization and interpretation of Codd's true RDM[1]. For the in-depth treatment see [2,3].

Whenever I demonstrate that data practitioners do not understand data and relational fundamentals, it is not the ample evidence that I present, but my claim that is dismissed as nonsense: how could everybody be wrong and only Fabian Pascal be right?

I dare you, then, to make sense of the following:

“William Kent confesses (in my words) that he can not distinguish between "relationships" and "attributes" ... the later might be completely redundant ... the notion of an attribute presumes a relationship, so we must define that first ... All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes ... an attribute is an object playing a role in a relationship with another object.”
“... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships ... Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not ... [it] has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes ...”
“... isn't it funny, that the term relation is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a relationship in our conventional data modeling is represented by a foreign key in a table (and combining both points together) -- should a table (relation) consists only of foreign keys? ... What [other] type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between relations) was through explicitly stored values (i.e., attributes, foreign keys).”
“... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships.”
“In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?).”
“I argue that there is essentially no difference between relationships between entity (type tables) and between an entity and its attributes. They both represent relationships between two populations of things. Something is an attribute by virtue of there being a relationship. If relationships are represented by foreign keys and the entity tables must be in 1NF, as in the relational model, then all relationships must be at most Many-to-One (a very unnecessary limitation when modeling some user domain).”
“I argue that there is essentially no difference between relationships between entities of distinct classes and between properties of the same class. They both represent relationships. A property can represent a relationship between entities of distinct classes. If such relationships are represented by foreign keys and the relations representing the classes must be in 1NF, then relational databases can represent only M:1 relationships, a very unnecessary limitation when modeling some reality of interest.”
“The entity-relationship model is essentially a directed graph model, where relationships are prominent residents. Not so in the relational model (despite the name), where relationships (between relations, mind you) are not visible and in the SQL implementations is reduced to constraints. Relationships are about structure, which is as important as meaning (the semantics of the terms used in the universe being modeled).”
There is so much wrong squeezed in these paragraphs that, as we shall see, it takes a several fold longer, multi-part series to debunk, which is really impossible without foundation knowledge. So let's have some first.

------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE


I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you.



NEW PUBLICATIONS




HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
@THE POSTWEST

  •  The Dystopia of Western Decadence, the Only Acceptable Racism, and the Myth of a “Palestinian nation”
------------------------------------------------------------------------------------------------------------------ 

Relationships and Conventional Wisdom


The notion that the RDM supports only one type of relationships (if any) -- those "between tables" -- is rooted in a combination of the conventional approach to conceptual modeling -- the "entity-relationship model" (E/RM)[4] -- the common misconception that "relational" comes from "related tables" (rather than mathematical relations) and SQL: (1) there are entities of different types and relationships among them (2) entity types are represented by "tables" in the RDM, therefore (3) the RDM supports only relationships among tables.

Relationships are an informal conceptual element. Codd's real RDM (distinct from SQL implementations, with which it is confused[5]) reserves a different term for their formal database representation[6,7] (which?). But in the absence of foundation knowledge, many practitioners are simply not aware of the correspondence to relationships, and what those relationship are.

In our approach to modeling inspired by McGoveran's study of Codd's RDM, the objects of interest are not just  entities, but also collections thereof -- entity groups, subgroups, and multi-groups -- and, significantly, there are relationships:

  • Among entity properties;
  • Among group members;
  • Among group and subgroup members;
  • Among groups.
that have been obscured by flawed industry practices, but are all supported by the RDM.

How do we recognize a desk? There is a set of required properties shared by all desks -- the defining properties of desks -- from the observed co-occurence of which we infer the desk's existence. Some properties are primitive (i.e., not further analyzable) and individual (i.e., to determine whether the property exists, it is sufficient to examine a single entity, independent of all others).

An entity type is, thus, a named set of defining properties, some of which are primitive individual properties to which we shall refer as first order properties (1OP). All entities of the type share at least the set of defining 1OPs.

For example, the set of FICO score, Balance and State required 1OPs comprise an entity type named customer; every customer is a set of values of these 1OPs, each having a last name, first name as an externally assigned name, and a customerID as an internally assigned name (for the difference between properties and names, see[2]).


Property Relationships


Properties can be derived from other properties. One type of derived properties are relationships among two or more 1OPs, which we shall refer to as second order properties (2OP). Like 1OPs, 2OPs are individual properties: to determine if one holds for a specific entity, only that entity must be examined, independent of all others (although the relationship must hold for every entity of the type).

For example, limiting balance to a maximum as a function of the FICO score, imposes a relationship between the two 1OPs. 


Member Relationships


An entity group is a collection of entities -- its members -- that (1) share the same 1OPs (and 2OPs, if any) (i.e., are of the same type) and (2) may have, jointly as a group, required collective properties that arise from relationships among group members[2]. We shall refer to these relationships as third order properties (3OP).

There are at least three types of member relationships that give rise to 3OPs[2].

Uniqueness: Group members are distinguishable by definition -- otherwise we wouldn't be able to tell them apart. This means that they are unique. Uniqueness is a required relationship among all members -- every one is distinct from all the others -- and, therefore, a defining property of groups. It is a collective property: to determine whether the relationship holds, every member must be compared to all other members.

Note: Distinguishability means that there is at least one subset of defining 1OPs that identifies (i.e., has a unique value for) every member -- we shall refer to it as the "identifying 1OP subset". Often, unique names are assigned to members that serve as shorthands for the values of the subset, such that there is a 1:1 relationship between every name and the combined value of the property subset (if there are no names, they must be generated for database representation[8]).

For example, because customers are entities of the same type, for each customerID there is exactly one value of balance, FICO score and state.

Name-properties relationships: Because group members are entities of the same type, for every member's assigned name there is exactly one value of every 1OP (including those not in the identifying property subset). A name-properties relationship (for lack of a better name) among all members is similar to uniqueness:
to determine whether it holds,  every member must be compared to all others.

For example, for every customerID there are single balance, FICO score and state values.

Aggregate relationships: An aggregate relationship is a requirement placed on an aggregate of one or more properties of group members. It is a collective 3OP: to determine if the relationship holds, all group members must be examined.

For example, a limit on total customer balances is a required relationship among all customers.

Required 1OPs, 2OPs and 3OPs are a group's defining properties and serve as the group's membership criterion that members must satisfy.


Entity Supertype-subtypes Relationships


Defining properties are required of all group members. A meaning criterion is an optional property -- only some members have it. It partitions an entity group into subgroups, one that has the property and one that has not. A group that is not so partitionable is primitive.

The defining properties of a subgroup are the group defining properties together with the meaning criterion.

An entity supertype-subtypes (ESS) relationship exists among group and subgroup members[9]: subgroup members are a subtype of the group members.

For example, some, but not all customers are subject to sales tax.


Group Relationships


Relationships may also exist among groups, in which case the groups form a multi-group. If required, intergroup relationships are defining properties of the multi-group, to which we shall refer as fourth order properties (4OP). 4OPs are properties of the multi-group: to determine that the relationships hold, multiple groups must be examined.

Depending on the number of members of each group involved, relationships that give rise to 4OPs can be:

  • One-to-one (1:1)
  • Many-to-one (M:1)
  • Many-to-many (M:N)

For example, if there is an orders entity group with defining properties order date, customerID, and order amount and assigned name Invoice#, and a limit is placed on the order amount as a function of balance owed, there is a required 1:1 relationship between customers and orders groups.

Note: M:1 and M:N relationships are left for the reader's imagination.

 Because they are informal, conceptual models are not "computable". The objects -- entities and collections thereof -- and the BRs specifying the defining 1OPs, 2OPs, 3OPs, MCs, and 4OPs (their formulation is discussed in [2]) -- must be formalized symbolically using a formal data model (e.g., the RDM) for database representation. This is the purview of logical database design, and the RDM (distinct from SQL DBMSs pretending to be relational) supports all the relationships, as demonstrated in [3]. 

(Continued in Part 2)


References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., Conceptual Modeling for Database Design: A New Perspective, forthcoming.

[3] Pascal, F., Logical Database Design: Formalizing the Informal, forthcoming.

[4] Pascal, F., Data Model: The RDM Is, the ERM Isn't.

[5] Pascal, F., To Really Understand Integrity, Don't Start with SQL.

[6] Pascal, F., Integrity Is Not Only Referential DBMS vs Application Enforced Constraints.

[7] Pascal, F., What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency.

[8] Pascal, F., The Key to Relational Keys: A New Understanding.

[9] Pascal, F., Meaning Criteria and Entity Supertype-Subtypes.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

2 comments:

  1. Do we have to pay to see Part 2? and this is just a teaser?

    ReplyDelete
  2. No, this is a three-part free post. However, posts ARE just teasers of my papers and books that are in-depth treatments and if you really want to understand the subject they are strongly recommended.

    ReplyDelete