Meaning Criteria and Entity Supertype-Subtypes Relationships

Follow @DBDebunk Follow @ThePostWest

Note: This is a re-write of a previous post.

"I have a database for a school ... [with] numerous tables obviously, but consider these:

CONTACT - all contacts (students, faculty): has fields such as LAST, FIRST, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT."
"Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical? What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows? Or do you have just one of each and have a change log. Which is best?"

How would somebody who "does not know past, or new requirements, modeling, and database design" and messes with a working database just because "he heard something about (insert your favorite fad here)" figure out correct from bad answers? Particularly if the answers suffer from the same lack of foundation knowledge as the question?
------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you.

NEW

08/09/19: Following my series of posts on data sublanguage (Parts 1-4), I have revised for consistency the corresponding section of paper #2 in the Understanding the Real RDM series, Logical Access, Data Sublanguage, Kinds of Relations, and Database Redundancy and Consistency, which is available for ordering from the PAPERS page.

LATEST PUBLICATIONS

The Key to Relational Keys: A New Perspective.
THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book, is available to order here (Reviews: Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).
Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING OF THE REAL RDM series, is available for ordering here.
Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING OF THE REAL RDM series, is available for ordering here.

HOUSEKEEPING

To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them.

Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work.

SOCIAL MEDIA

I deleted my Facebook account. You can follow me:

@DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
@The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
@ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

“One table named PERSON. Anything else is a denormalized nightmare.”

It is hardly surprising that somebody who dismisses logical database design altogether has things upside down and backwards -- even the asker seems to know that the one relation (not table!), not separate relations -- is the "denormalized nightmare". A relation is in 5NF (i.e., fully normalized) by definition, which means informally that it represents a group of entities of a single type -- whatever a data structure that "bundles" multiple entity groups together is, it is not a relation[1,2]. So without a well-defined conceptual model (i.e., complete identification of distinct entity groups of interest), relational database design is not possible[3].

Note: Formally 5NF means all non-key attribute are functionally dependent on the primary key, and this holds only if the relation represents entities of a single type (why?).

“What if Professor of mathematics also takes courses in digital photography? Are you going to store her phone number in 2 rows, both in FACULTY and STUDENT tables? What if a postdoc student also has a part time job teaching? Depending on the platform, you might find materialized query tables aka materialized views and/or index covering to be good alternatives to denormalization. I'm speaking about DB2 and Oracle.”

The first part correctly alerts to conceptual incompleteness: the types of entities of interest have not been properly defined. Unfortunately, the second part exhibits the logical-physical confusion (LPC)[4] underlying the common "denormalization for performance" misconception[5] and the premature rush to implementation issues (represented, not stored).

“I don't see how having separate tables for fundamentally different types of contacts could be considered denormalization. In fact, your current scheme of having a bunch of 1:1 relationships seems to reflect entity-oriented thinking rather than relational thinking; it looks like your FACULTY and STUDENT tables and their 1:1 relationships are attempts to 'subclass' your 'entities'.”

Indeed, as we just explained, representing distinct types of entities by separate relations is what full normalization is about. The conceptual-logical conflation (CLC) notwithstanding[4] -- "relational thinking" is nothing but formalization at the logical level of entity group (not entity!) thinking at the conceptual level[6] -- "subclassing" alludes to the real issue here: students and faculty may not be "fundamentally different" entity types and, thus, sheer further normalization (not normalization!)[3] (i.e., straight separate relations) does not apply.

If all contacts (persons) share some common properties, but students and faculty have some unique to either, there is an entity supertype-subtypes relationship (ESS): students and faculty are entity subtypes of the contacts entity supertype.

While all members of an entity group must share required individual and collective properties[7], but some -- a subgroup -- may also optionally share some property unique to them known as as meaning criterion[8].

“Some properties are formed as the disjunction of two or more properties called meaning criteria. Each meaning criterion (individual disjunct) induces a partitioning of a group into two subgroups of entities -- those that meet the criterion and those that do not -- and serves to differentiate each subgroup from the others. Some of the subgroups will be disjoint, while others will not. Each of the subgroups is defined by (“inherits”) (1) the defining properties of the group conjoined with (2) at least one meaning criterion (that meaning criterion, or those meaning criteria, thus becoming the defining property, or properties, respectively, specific to the proper subgroup).”

--David McGoveran

In other words, because they share some, but not all properties, students and faculty are not fundamentally different groups, but subgroups of the persons supergroup (if faculty can be students or vice-versa, they are overlapping subgroups), which requires a database representation distinct from straight separate relations. The defining properties of the students and faculty subgroups are the shared properties inherited from the persons supergroup conjoined with the corresponding MCs.

I will leave it to the reader to judge whether a single table bundling all contacts is a "denormalized nightmare" or not (consider the complexity of constraint and query formulation and the consequent error-proneness). Be this as it may, such a table is not a relation and all bets are off.

In the past I advocated the following design:

CONTACTS (ID,LAST,FIRST,ADDR,CITY,STATE,ZIP,EMAIL,...)
FACULTY (ID,HIREINFO,LOGIN,...)
STUDENTS (ID,MEDICAL,GRADE,...)

But first, while CONTACTS represents the supergroup, what exactly do the other two relations represent? Not the subgroups, because faculty and students also share the properties common to all contacts, and they are absent from FACULTY and STUDENTS.

Second, McGoveran's work has established that relational database design must adhere to the Principle of Orthogonal Design (POOD)[9]: non-overlapping subgroups must be represented by disjoint relations. So if faculty and students are not overlapping, the design is:

FACULTY (ID,LAST,FIRST,...,HIREINFO,...)
STUDENTS (ID,LAST,FIRST,...,MEDICAL,GRADE,...)

and includes a disjunctive (i.e., mutual exclusivity) constraint. A UNION view of two projections on the FACULTY and STUDENTS relations would represent the supergroup.

A true RDBMS (full support of relation predicates and a theory of types) would support ESS relationships. Users would specify the supergroup and subgroups and their properties and the DBMS would figure things out for itself (i.e., create the corresponding relations, the disjunctive constraint, the CONTACTS view), and would take the ESS relationship into consideration in queries.

SQL DBMSs do not. They are unaware of the relationship -- it exists only in the mind of users. Are disjunctive constraints expressible in SQL? And recall that multi-relation views are not updatable.

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

References

[1] Pascal, F., What Relations Really Are and Why They Are Important.

[2] Pascal, F., Normalization and Further Normalization Parts 1-3.

[3] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models.

[4] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[5] Pascal, F., "Denormalization for Performance": Don't Blame the Relational Model.

[6] Pascal, F., Reader Mail Sets vs. Graphs, Education vs. Training.

[7] Pascal, F., Relationships and the Relational Model Parts 1-3.

[8] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming.

[9] Pascal, F., The Principle of Orthogonal Database Design, Parts 1-3.

POSTS

Sunday, August 25, 2019

Meaning Criteria and Entity Supertype-Subtypes Relationships

No comments:

Post a Comment