Saturday, March 9, 2019

Fourth Order Properties Part 2: Association Relations in Database Design - An Example

Part 1 outlined fundamentals of fourth order properties (4OP) of a multigroup arising from relationships among its group members due to 1:1, M:1, M1:M2, and, generally, M1:M2:M3:...:Mn relationships among the groups' entity members. Fundamentals are commonly missing from database practice, as reflected in the exchange:

“We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”
“Interesting -- I’d never heard this term before. I’ve heard it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”
It's not really introduced because it's way more towards academic than functional.”
Knowledge of the fundamentals would have obviated the question, the ad-hoc terminology, and the answers. Note in particular how -- notwithstanding the conceptual and logical nature of the question -- the first answer typically delves directly into implementation[1]. The second answer does not merit attention, except as indicator of the sad state of the industry.

Given a conceptual model, we shall now compare the database design proposed in the question with (1) conventional industry practice that includes "embedded foreign keys", and (2) assuming a true RDBMS, the unified representation using association relations of not just M1:M2, or, generally, M1:M2:M3:...:Mn 4OP relationships, but also the 1:1 and M:1 special cases[2].



Up to 2018, DBDebunk maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 the website was has been discontinued. The content is not available anywhere else, and if you deem it useful, particularly if you are a regular reader, please ensure its continuation and free availability by supporting it with as much as you can afford via purchases of publications, or donations. Thank you.


  • Updated the LINKS page: online logic courses.



  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work.


I deleted my Facebook account. You can follow me:

  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
  • REVISED! @ThePostWest on Twitter: Evidence for #AntiZionism as component of the spike in #Antisemitism -- the only universally acceptable hatred -- as the (traditional) response to the existential crisis of decadence and decline of Western (including the US) civilization (for the classic component, see ThePostWest blog).
  • REVISED! @The PostWest blog: Evidence for the classic component of the spike in #Antisemitism -- the only universally acceptable hatred -- as the (traditional) response to the existential crisis of decadence and decline of Western (including the US) civilization (for the classic component, see @ThePostWest on Twitter).

According to the link in the question, the conceptual model identifies three types of entities -- a multi-floor building, rooms, and beds:
  • Multiple rooms per floor (M:1);
  • One or more beds per room (M:1), with "single" rooms as a 1:1 special case.

for which the following database design is proposed:

This follows the conventional industry practice of "embedding" the PK of a ("parent") relation -- representing a group of strong entities -- as a FK into the ("child") relations representing the related groups of weak entities. However,
  • With a single building, the BUILDING relation makes no sense, let alone embedding the BLDID SK in ROOMS or BEDS -- how to represent building properties in this case requires a more complex conceptual model and possibly multiple relations, which is beyond the scope of this discussion (see the multi-building case below);
  • BEDS does not have a PK: it looks like it is assumed that if there are multiple beds in a room, they have distinct sizes and, thus, (ROOM#,SIZE) can serve as composite PK. But SIZE represents a property, not a name, and cannot serve as a PK component[3].

So, the proper conventional design is:

where BEDID is a SK, and ROOM# is an embedded FK.

Now, to recall from Part 1, with the elimination of the embedding practice, the old principle of “(1) only "weak entities" should be modeled with FKs, and then (2) ONLY if both the "parent" and the "child" (1 and M) entities have natural keys (NK), and (3) the NK of the parent is embedded in the NK of the child” -- has been superseded by the principle of uniform representation by association relations -- provided a true RDBMS supports the proper physical implementation and manages them transparently to users and applications! Under the new principle, the design is:

For a hotel with  multiple buildings and a hotel-wide room numbering scheme it would be:
where BLDID is SK, and BLDG-ROOMS is an association relation.

Note very carefully that existence dependencies of weak on strong entities are enforced by the two referential constraints, one on BUILDINGS and BLDG-ROOMS, and one on ROOMS and ROOM-BEDS -- that is precisely their function: together with all other relation and database constraints, they guarantee semantic consistency (i.e., that the data is consistent with their meaning -- dependencies included -- as assigned by the modeler)[4].

The general M1:M2:M3:...:Mn case is left as an exercise for the reader.

There are still FKs, but they are included in association relations, not embedded in "child/weak" group relations. Why does this make more sense? (hint: think what an inter-group relationship due to (M1:M2:M3:...:Mn) member relationships corresponds to in a relational representation?)

Before you rush to object "too many relations/joins", remember: the principle is conditional on a true RDBMS capable of proper physical implementation and transparent management. And consider two major advantages:

  • Design robustness to changes:
- if rooms have a "bed capacity" (either fixed for fixed bed size, or computed according bed size), and the relationship between beds and rooms changes from 1:1 to M:1, or even M1:M2 (e.g., as in rooms being combined into suites via adjoining rooms with doors between them) (only the constraint may change, if necessary);
- if beds get moved between rooms;
- if room numbering changes (e.g., to create a "members only" or "non-smoking" building", while keeping the room pricing and descriptions the same);
- if some rooms have beds and others do not (e.g., meeting rooms);
  • Elimination of the anomaly (and atrocity) of self-reference of the embedded FK design (the common example is the EMPLOYEES relation that includes the EMP# number of the manager -- there must be at least one tuple for which MGR# is inapplicable).
“In short, logical designs using association relations are more robust as 4OP relationships evolve, make their representation more symmetrical (the parent/child distinction is purely semantic, so we can enter the relationship from either "side"), and easier to alter to add properties (not possible with embedded FKs), avoid inapplicable nulls, and the corresponding physical design can be better optimized (more implementation options).
--David McGoveran
Now, try to achieve this with SQL DBMSs!

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


[1] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.

[2] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming.

[3] Pascal, F., The Key to Relational Keys: A New Perspective.

[4] Pascal, F., Data and Meaning Parts 1-3.

No comments:

Post a Comment

View My Stats