Saturday, March 2, 2019

Fourth Order Properties Part 1: Association Relations vs. Foreign Keys




 “We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”

“Interesting -- I’d never heard this term before. I’ve hears it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”

“It's not really introduced because it's way more towards academic than functional.”
--Reddit.com

Such questions, and ad-hoc terms like "identifying relationships"[1] come up because practice is driven by intuition and experience (if any), without the benefit of foundation knowledge[2]. Whether practitioners know/like it or not, a database is a formal computable representation of an informal conceptual model[3] and, therefore, data modeling (i.e., logical database design)[4] is impossible without (1) a well-defined and complete conceptual model and (2) a formal data model with which to formalize it as a logical model[5]and the two should not be confused[6]. Otherwise all bets are off.

Here's how foundation knowledge should have informed modeling and design.


------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE 

Up to 2018, DBDebunk maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 the website was has been discontinued. The content is not available anywhere else, and if you deem it useful, particularly if you are a regular reader, please ensure its continuation and free availability by supporting it with as much as you can afford via purchases of publications, or donations. Thank you. 

LATEST PUBLICATIONS (order PAPERS and BOOKS)

USING THIS SITE
  • NEW! Updated the LINKS page: online logic courses.
  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me on Twitter:

  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
  • REVISED! @ThePostWest on Twitter: Evidence for #AntiZionism as component of the spike in #Antisemitism -- the only universally acceptable hatred -- as the (traditional) response to the decadence and decline existential crisis of Western (including the US) civilization (for the classic component, see ThePostWest blog).
  • REVISED! The PostWest blog: Evidence for the classic component of the spike in #Antisemitism -- the only universally acceptable hatred -- as the (traditional) response to the decadence and decline existential crisis of Western (including the US) civilization (for the classic component, see @ThePostWest on Twitter).
------------------------------------------------------------------------------------------------------------------

Conceptual Modeling


The objective of a conceptual model is to (1) identify objects -- entities, entity groups, and a multigroup -- of interest, and (2) formulate business rules (BR) that specify their properties:

  • First (1OPs) and second order properties (2OP) of entities, 2OPs arising from relationships among 1OPs;
  • Third order properties (3OP) of entity groups arising from relationships among all their entity members; and,
  • Fourth order properties (4OP) of the multigroup arising from relationships among its group members. Note that relationships -- among properties, entities, and groups -- are properties: 2OPs, 3OPs, and 4OPs, respectively.

4OPs of a multigroup can arise from:
  • Relationships among entity members of distinct groups:
- one-to-one (1:1);
- many-to-one (M:1);
- many-to-many-to-many-to... (M1:M2:M3:...:Mn); 
  • Group (i.e., aggregate) level relationships.

where 1:1 relationship are a special case of M:1 relationships, which are a special case of M1:M2 relationships between entity members of two groups, which are a special case of M1:M2:M3:...:Mn relationships among members of n groups, where n ≥ 3[7,8].

Note: What is considered an entity, or a property/relationship is a matter of abstraction at the conceptual level that requires consistency on the part of the modeler. Under the conventional Ontological Commitment to Objects (OCO), the distinction between entities and properties seems arbitrary and perhaps confusing: for example, is a relationship among entities with properties a "relationship entity"? Under the Ontological Commitment to Properties (OCP), an entity is just a name for a collection of properties -- eliminating the possible confusion, and one argument for OCP[9].


Logical Database Design (aka Data Modeling)


In the RDM, entity groups are represented formally by database relations and, as formulated by Codd:
  • A 1:1 or M:1 relationship is represented formally by the primary key (PK) of the referenced relation (i.e., on the 1 side) serving as foreign key (FK) in the referencing relation (i.e., on the M side), and a referential constraint on the two relations that guarantees FK-PK consistency[10].
  • A M1:M2:M3:...:Mn relationship among n groups [(n ≥ 2)] is represented formally by an association relation, with the PK of each relation representing a group serving as FK in the association relation, and a corresponding referential constraint on the groups (referenced) relations and the association (referencing) relation. These relationships may have properties of their own.
This has been the common understanding and practice in the industry[11]. 

But if the 1:1, M:1, relationships are special cases of the M1:M2:M3:...:Mn relationship, they should also be representable by association relations -- why are they not? As it turns out, that appears to be due to the historical context in which Codd operated. Although he had significant and enabling insights, even he was likely somewhat blinded by the old ways of thinking (and by being forced to defend RDM against them). First,
“I suspect he felt obligated to show that a relational representation does not require extra operations to traverse a hierarchy, since that was a strength of the then dominant database technology, and many thought that everything could/should be modeled as multi-hierarchies (what we would now consider simple networks can be decomposed into intersecting hierarchies, but models are much more complex today). Even though a FK-PK join would still be required, without the association relation the representation of M:1 relationships appeared *functionally* similar to the pointers of a hierarchy, which is why to this day some talk of FKs as if they were pointers.”
--David McGoveran
In other words, avoiding an association relation for 1:1 and M:1 relationships made relational representation seem more familiar to users of hierarchic systems.

Second, Codd did not appreciate the full extent of data independence idea, and the greatly increased flexibility and efficiency that physical independence (PI) affords (which continues to be missed in the industry to this day[12,13]).
“As long as an inter-group relationship does not itself have properties, the association relation that applications and end users see at the logical level need not have a 1:1 correspondence with its physically implemented "table"- or "array"-like structure -- while it must be presented to users as a relation (in the logical model), it can be implemented physically as, for example, an index (B-tree, hash, etc.). A "multitable" index will do, and is very efficient even if the relationship does have properties. While there have been products that offered such indexes, or can combine two or more different single structure indexes (less efficient) and use them when processing (logical) joins, as far as I know no one has advanced this argument (to be expounded in [14]), and I've never seen this even recommended, let alone implemented  with any discipline as part of database definition. If it were, then it would be totally transparent to users and applications.”
--David McGoveran

Otherwise put, there are neither formal, nor efficiency reasons for not representing the special cases same as the general case -- by association relations, which can be managed transparently to users and applications by a true RDBMS.

Note: The strong/weak entities issue also rose in the historical context of industry practice:


“While Codd's position was that a 1:1 or M:1 relationship can always be represented by a FK, but is mandatory for "weak entities", mine was that (1) only "weak entities" should be modeled with FKs, and then (2) ONLY if both the "parent" and the "child" (1 and M) entities have natural keys (NK), and (3) the NK of the parent is embedded in the NK of the child. If only surrogate keys (SK) (made up names) exist, there is no reason to make the "child" PK composite. For all other relationships, use an association relation. When I formulated this principle years ago, I had a couple of rare examples of the industry practice of "embedded keys", and this was a "safe" way of stating the principle that covered such cases. Without such practice, there is no need to represent 1:1 and M:1 relationships differently than M1:M2:M3:...:Mn relationships -- association relations can be uniformly used, without regard to "strong" and "weak" entities.”
--David McGoveran
Note carefully, however, that our NK definition differs from industry practice[15]. For example,
artificially created, internally assigned "private names", known and identifying entities in a limited enterprise-specific context (e.g., order and line numbers) are sometimes considered NKs because they are "used in the real world", but in our book they are SKs.

Had Codd exploited all the insights that he touched on, a lot of other problems would have been avoided. For example:
“He clearly identified a relation type with a predicate, but suggested no use of that predicate. Because a relation type is a set and every set is defined by a predicate, this led me independently to the idea of using those predicates to establish independence of base relations -- which became the Principle of Orthogonal Design (POOD)[16]. Codd's perspective on view updatability would have been very different, and led the industry to early logical independence in implementations like IBM's SQL/DS, which would have changed the definition of SQL!”
--David McGoveran
We will use this knowledge to answer the design question in -- and we recommend you try your hand before you proceed to -- Part 2.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER.

[2] Pascal, F., Industry Practice Is No Substitute for Foundation Knowledge.

[3] Pascal, F., Understanding Conceptual vs. Data Modeling Parts 1-4.

[4] Pascal, F., Database Design: What It Is and Isn't.

[5] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models.

[6] Pascal, F., Conceptual Modeling Is Not Data Modeling.

[7] Pascal, F., Relationships and the RDM Part 1: Kinds of Relationships.

[8] Pascal, F., Data and Meaning Part 2: Types of Business Rules.

[9] Pascal, F., Understanding Conceptual vs. Data Modeling Part 4: Property-Entity Modeling.

[10] Codd, E.F., Normalized Data Base Structure: A Brief Tutorial, IBM Research Report, San Jose, California RJ935 (1971).

[11] Pascal, F., Foreign Keys Parts 1-3.

[12] Pascal, F., Physical Independence Parts 1-2.

[13] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[14] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (chapter drafts), forthcoming.

[15] Pascal, F., The Key to Relational Keys: A New Perspective.

[16] Pascal, F., The Principle of Orthogonal Database Design Parts 1-3.




No comments:

Post a Comment

View My Stats