Sunday, May 13, 2018

Relationships and the RDM Part 1: Kinds of Relationships




Note: This is a multi-part rewrite of two earlier posts, to bring them into line with McGoveran's formalization and interpretation of Codd's true RDM[1]. For the in-depth treatment see [2,3].


Revised 5/28/18

Whenever I demonstrate that data practitioners do not understand data and relational fundamentals, it is not the ample evidence that I present, but my claim that is dismissed as nonsense: how could everybody be wrong and only Fabian Pascal be right?

I dare you, then, to make sense of the following:

“William Kent confesses (in my words) that he can not distinguish between "relationships" and "attributes" ... the later might be completely redundant ... the notion of an attribute presumes a relationship, so we must define that first ... All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes ... an attribute is an object playing a role in a relationship with another object.”
“... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships ... Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not ... [it] has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes ...”
“... isn't it funny, that the term relation is implicitly mapped (in our minds) to a table of a database? If (loosely speaking) a relationship in our conventional data modeling is represented by a foreign key in a table (and combining both points together) -- should a table (relation) consists only of foreign keys? ... What [other] type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between relations) was through explicitly stored values (i.e., attributes, foreign keys).”
“... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships.”
“In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?).”
“I argue that there is essentially no difference between relationships between entity (type tables) and between an entity and its attributes. They both represent relationships between two populations of things. Something is an attribute by virtue of there being a relationship. If relationships are represented by foreign keys and the entity tables must be in 1NF, as in the relational model, then all relationships must be at most Many-to-One (a very unnecessary limitation when modeling some user domain).”
“I argue that there is essentially no difference between relationships between entities of distinct classes and between properties of the same class. They both represent relationships. A property can represent a relationship between entities of distinct classes. If such relationships are represented by foreign keys and the relations representing the classes must be in 1NF, then relational databases can represent only M:1 relationships, a very unnecessary limitation when modeling some reality of interest.”
“The entity-relationship model is essentially a directed graph model, where relationships are prominent residents. Not so in the relational model (despite the name), where relationships (between relations, mind you) are not visible and in the SQL implementations is reduced to constraints. Relationships are about structure, which is as important as meaning (the semantics of the terms used in the universe being modeled).”
There is so much wrong squeezed in these paragraphs that, as we shall see, it takes a several fold longer, multi-part series to debunk, which is really impossible without foundation knowledge. So let's have some first.

------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you.

NEW PUBLICATIONS

HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 

NEW: SOCIAL MEDIA



The social media pages are now as follows. Please make a note of it and disseminate.
Twitter:


Facebook:



The DBDebunk pages will contain links to updates to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back.

The PostWest pages will contain links to evidence for, and my take on Evidence on Dystopian Western Decadence, The Only Acceptable Racism Left, and The Weaponized Myth of a "Palestinian Nation".
------------------------------------------------------------------------------------------------------------------

Relationships and Conventional Wisdom


The misconception, among many others[4], that the RDM supports only one type of relationship "among tables" (if any) is rooted in the conventional "entity-relationship modeling" (E/RM)[5], the common fallacy that "relational" comes from "related tables" (rather than mathematical relations), and SQL: (1) there are entities of different types and relationships among them (2) entity types are represented by "tables" in the RDM, therefore (3) the RDM supports only relationships among tables.

Relationships are an informal conceptual element[2]. The RDM (distinct from SQL, with which it is being confused) uses a different term (which?) for their formal logical database representation[6,7]. But in the absence of foundation knowledge, many practitioners are unaware of its correspondence to relationships, and don't know what those relationships are.

In our approach to modeling, inspired by McGoveran's formalization and re-interpretation of Codd's true RDM that includes corrections, refinements and extensions of his own[1], objects of interest are not just individual entities, but also collections thereof -- entity groups, subgroups, and multi-groups -- and, significantly, there are relationships among:

  • Properties and entities;
  • Entity properties;
  • Entity members of the same group;
  • Entity members of a group and subgroup thereof;
  • Entity members of different groups.
that are all supported by the RDM, but unknown to practitioners due to lack of foundation knowledge, flawed industry practices, and poor DBMS implementations. While in the E/RM properties and relationships are distinct concepts, in our approach relationships among members of the same group and of different groups are collective properties of the group, and the multigroup, respectively. 


Objects and Properties


How do we recognize objects of a specific type, say, a desk? There is a set of required properties shared by all objects of the type -- their defining properties -- and when we observe a co-occurence of those properties  we infer an object of that type (a specific desk).

In other words, an object type is a named set of defining properties, the name being a shorthand for the set, and objects of the type are named sets of values of the defining properties.


Properties-Entity and Property Relationships


The simplest type of object is the entity -- a named set of individual primitive properties. Primitive means not composed of other properties. For example, co-occuring FICO Score, Balance, State, Gender, and Tax are the set of defining properties of entities of type customer.

Properties describe entities. Names are often assigned to entities as shorthands for the set of defining properties. Unlike properties, names do not describe or define, but only identify (denote, reference, point to) entities. For example, (first names, last names) and customerIDs are names externally and internally, respectively, assigned to customers[8].

There exists a relationship between every defining property and an entity of a type, another way of saying that the entity has the property (i.e., the existence of a property depends solely on a single entity). We shall refer to these properties as first order properties (1OP), and to the relationship of all the defining 1OPs with each entity of a type as a properties-entity relationship.

Note: As we shall see, the formal representation of this relationship at the logical database level is referred to as a general dependency.

A relationship among one or more 1OPs is itself an individual entity property we shall refer to as second order property (2OP). For example, if the enterprise limits Balance as a function of the FICO score, it imposes a relationship on the two 1OPs that is an individual 2OP of customers.


Relationships Among Members of a Group


An entity group is a collection of distinguishable entities -- its members -- that (1) share the same 1OPs (and 2OPs, if any) (i.e., are of the same type) and (2) may have, jointly as a group, collective properties that arise from relationships among group members[2]. We shall refer to these as third order properties (3OP). Think of entities that share 1OPs and 2OPs as "potential" group members, that can be actual members only if, together as a group, they also have the required 3OPs.

There are at least two types of member relationships that give rise to 3OPs[2].

Uniqueness: Group members are distinguishable by definition -- otherwise we wouldn't be able to tell them apart -- which means they are unique. Uniqueness is a required relationship among all members -- every one is distinct from all the others -- and, therefore, a collective defining property of an entity group. If, as members of a group, entities of the type are assigned names, there must be a 1:1 relationship between the names and the values of the identifying subset.

For example, customerIDs are unique names assigned to customers, which are  assumed to have a 1:1 relationship with the values of some subset of identifying 1OPs of customers.

Note: Because a group (by definition) has as members entities of a single type, the properties-entity and uniqueness relationships together extend the 1:1 relationships to the 1OPs not in the identifying subset. Consequently, for every name there is exactly one value of each defining 1OP of an entity member. If customerIDs are properly assigned, then for every customerID there will be exactly one value of Balance, FICO score, State, Gender, and Tax. As we shall see, the formal representation of this relationship at the logical database level is referred to as a functional dependency.

Aggregate relationships: An aggregate relationship is a requirement placed on an aggregate of one or more properties of group members. It is a collective 3OP: to determine if the relationship holds, all group members must be examined.

For example, a limit on total customer balances is a required relationship among all customers.

Required 1OPs, 2OPs and 3OPs are a group's defining properties and serve as the group's membership criterion that members must satisfy.


Entity Supertype-subtypes Relationships


Defining properties are required of all group members. A meaning criterion is an optional property -- only some members have it. It partitions an entity group into subgroups, one that has the property and one that has not. A group that is not so partitionable is primitive.

The defining properties of a subgroup are the group defining properties together with the meaning criterion.

An entity supertype-subtypes (ESS) relationship exists among group and subgroup members[9]: subgroup members are a subtype of the group members.

For example, some, but not all customers are subject to sales tax.


Relationships Among Members of Different Groups


Relationships may also exist among groups, in which case the groups form a multi-group. If required, intergroup relationships are defining properties of the multi-group, to which we shall refer as fourth order properties (4OP). 


Depending on the number of members of each group involved, relationships that give rise to 4OPs are:

  • One-to-one (1:1)
  • Many-to-one (M:1)
  • Many-to-many (M:N)

where 1:1 is a special case of M:1, which is a special case of M:N. For example, if there is an orders entity group with defining properties Order date, and Order amount and assigned name Invoice#, and a limit is placed on Order amount as a function of Balance owed, there is a  1:1 relationship between customers and orders groups.

Note: M:1 and M:N relationships are left for the reader's imagination.


Because they are informal, conceptual models are not "computable". The objects -- entities and collections thereof -- and the BRs specifying the defining 1OPs, 2OPs, 3OPs, MCs, and 4OPs (their formulation is discussed in [2]) -- must be formalized symbolically using a formal data model (e.g., the RDM) for database representation. This is the purview of logical database design, and the RDM (distinct from SQL DBMSs pretending to be relational) supports all the relationships, as will be demonstrated in [3]. 

(Continued in Part 2


 
Follow: @DBDEBUNK @THEPOSTWEST

 

References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., Conceptual Modeling for Database Design: A New Perspective, forthcoming.

[3] Pascal, F., Logical Database Design: Formalizing the Informal, forthcoming.

[4] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

 


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


2 comments:

  1. Do we have to pay to see Part 2? and this is just a teaser?

    ReplyDelete
  2. No, this is a three-part free post. However, posts ARE just teasers of my papers and books that are in-depth treatments and if you really want to understand the subject they are strongly recommended.

    ReplyDelete