Sunday, February 5, 2017

Meaning Criteria and Entity Supertype-Subtypes

Note: This is a 11/26/17 re-write of a 2012 post to bring it in line with the McGoveran formalization and interpretation [1] of Codd's RDM. For historical reasons we prefer object (though not in the OO sense) to entity, but ESS is too entrenched.

Minor revisions 12/2/17.

Here is what's wrong with last week's picture, namely:

"I have a database for a school ... [with] are numerous tables obviously but consider these:
CONTACT - all contacts (students, faculty) has fields such as LAST, FIRST, MI, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password for electronic timesheet login, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT.
Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical?What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows?  Or do you have just one of each and have a change log. Which is best?" --comp.databases.theory

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so please take advantage of his generosity. Thanks.

Another example of database design without (1) a proper conceptual model and (2) foundation knowledge [2]. With "numerous unspecified relations" there isn't sufficient information to provide meaningful advice, so just a few comments.

First, correct terminology:
  • Relations, not tables (R-tables are just visualization of relations on physical media);
  • Attributes (logical), not fields (physical);
Second, if users do not have any new requirements, why would a practitioner who does not know much about existing requirements, modeling, or database design, mess with the database just because "he heard something about XML" (insert your favorite fad here)?

Normalization Is Not Be All of Database Design

But let's consider the replies.

"What if Professor of mathematics also takes courses in digital photography? Are you going to store her phone number in 2 rows, both in FACULTY and STUDENT tables? What if a postdoc student also has a part time job teaching? Depending on the platform, you might find materialized query tables aka materialized views and/or index covering to be good alternatives to denormalization. I'm speaking about DB2 and Oracle."

"Why do you need to denormalize? Are there heavy reports being run on this data that don't execute fast enough? What kind of reports? What other info is in faculty and student tables besides foreign keys?"

These correctly point to an underspecified conceptual model, but focus on normalization, which is not an issue.
"One table name PERSON. Anything else is a denormalized nightmare."
This one does not merit comment. Incidentally, it is upside down and backwards: it is the one-relation design (with NULLs representing missing data) -- that would be "a denormalized nightmare". In fact, with NULLs the table would not be a R-table (i.e., it would not visualize a relation).
"I don't see how having separate tables for fundamentally different types of contacts could be considered denormalization. In fact, your current scheme of having a bunch of 1:1 relationships seems to reflect object-oriented thinking rather than relational thinking; it looks like your FACULTY and STUDENT tables and their 1:1 relationships are attempts to 'subclass' your 'entities'. IMHO, it would make more sense to have separate tables for each type of 'entity', each containing all and only the attributes that make sense for a faculty member, student, etc. For one thing, it would make referential integrity checking easier. As I see it, the only virtue of your current scheme is that if a faculty member happens to have a kid at the school, their contact information is stored in only one place. That's likely to represent only a few cases, and it's subject to change (What if the teacher gets divorced and the other parent gets custody of the kid? What if somebody wants to receive personal mail at one location and work-related mail at another?) Otherwise, it looks like an attempt to micro-optimize space usage."
Correct about denormalization, but the answer reflects confusion of levels representattion, by lumping together conceptual, logical and physical aspects. But this is the only answer that at least engages the core issue.

ESS and Meaning Criteria

From the little information offered, it looks like there are some properties that are shared by both faculty and student objects, while some are unique to the former and some unique to the latter (i.e., there is an entity supertype-subtypes (ESS) relationship). Supertype and subtype objects are not "fundamentally different" types of entities, as they share common properties. Rather:

"Some set defining properties are formed as the disjunction of two or more properties (a kind of relationship between two common properties). These disjuncts, taken together, are meaning criteria. Each meaning criterion (an individual disjunct) induces a partitioning of a set into two subsets, those that meet the criterion and those that do not. Alternatively, we can say that each meaning criterion serves to differentiate a possible subset of a set from other subsets of the set (some of the possible subsets will be disjoint, while others not). Each of the possible subsets of the set is then defined by (“inherits”): The defining properties of the set conjoined with at least one meaning criterion (that or those becoming the defining property, or properties, respectively, specific to the proper subset)." --David McGoveran
In other words, "subtyping" is precisely how ESS reality should be modeled here, but this does not imply the recommended design [3].


[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models.

[3] Pascal, F., The Principle of Orthogonal Design Part I, II, III.

Note: I will not publish or respond to anonymous comments. If you want to say something, stand behind it. Otherwise don't bother, it'll be ignored.

No comments:

Post a Comment