Sunday, February 5, 2017

Meaning Criteria and Entity Supertype-Subtypes

(This is a rewrite of a 2012 post)

Here is what's wrong with last week's picture, namely:

"I have a database for a school ... [with] are numerous tables obviously but consider these:
  • CONTACT - all contacts (students, faculty) has fields such as LAST, FIRST, MI, ADDR, CITY, STATE, ZIP, EMAIL;
  • FACULTY - hire info, login/password for electronic timesheet login, foreign key to CONTACT;
  • STUDENT - medical comments, current grade, foreign key to CONTACT.
Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical?

What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows?  Or do you have just one of each and have a change log. Which is best?" --comp.databases.theory

This is another example of database design without (1) a proper conceptual model and (2) foundation knowledge. With "numerous unspecified relations" there isn't sufficient information to provide meaningful advice.

First, the correct terminology:
  • Relations, not tables (R-tables are just visual shorthands);
  • Attributes (logical), not fields (physical).
Second, if users do not have any new requirements, why would a practitioner who does not seem to know much about the existing requirements, modeling, or database design mess with them just because "he heard something about XML" (insert your favorite fad here)?

Let's consider the replies the question got.
A1: "What if Professor of mathematics also takes courses in digital photography? Are you going to store her phone number in 2 rows, both in FACULTY and STUDENT tables? What if a postdoc student also has a part time job teaching? Depending on the platform, you might find materialized query tables aka materialized views and/or index covering to be good alternatives to denormalization. I'm speaking about DB2 and Oracle."
A2: "Why do you need to denormalize? Are there heavy reports being run on this data that don't execute fast enough? What kind of reports? What other info is in faculty and student tables besides foreign keys?"
They correctly point out to an underspecified conceptual model , but focus on normalization.
A3: "One table name PERSON. Anything else is a denormalized nightmare."
This one does not merit comment. Incidentally, it is upside down and backwards: it is the one-table design that would be denormalized and a nightmare, not "anything else" (including NULLs for both inapplicable and missing data in SQL).
A4: "Do you need to track all and every change to the tables? Then I would go with logs. If no, are your needs restricted to tracking address changes?"
A5: "I don't see how having separate tables for fundamentally different types of contacts could be considered denormalization. In fact, your current scheme of having a bunch of 1:1 relationships seems to reflect object-oriented thinking rather than relational thinking; it looks like your FACULTY and STUDENT tables and their 1:1 relationships are attempts to 'subclass' your 'entities'.

IMHO, it would make more sense to have separate tables for each type of 'entity', each containing all and only the attributes that make sense for a faculty member, student, etc.  For one thing, it would make referential integrity checking easier.

As I see it, the only virtue of your current scheme is that if a faculty member happens to have a kid at the school, their contact information is stored in only one place. That's likely to represent only a few cases, and it's subject to change (What if the teacher gets divorced and the other parent gets custody of the kid? What if somebody wants to receive personal mail at one location and work-related mail at another?). Otherwise, it looks like an attempt to micro-optimize space usage."

Logical-physical confusion (LPC): schema is logical and independent of how data are stored--physical independence (PI) is a core practical benefit of the RDM.

Practically none of the replies engage the main issue. From the little information offered, it looks like there are some properties that are shared by all entities, while some properties are unique to only some of them i.e., there is an entity supertype-subtype (ESS) relationship. Supertype and subtype entities are not "fundamentally different types of entities", as they share common properties. Rather:

"Some set defining properties are formed as the disjunction of two or more properties (a kind of relationship between two common properties). These disjuncts, taken together, are meaning criteria. Each meaning criterion (an individual disjunct) induces a partitioning of a set into two subsets, those that meet the criterion and those that do not. Alternatively, we can say that each meaning criterion serves to differentiate a possible subset of a set from other subsets of the set (some of the possible subsets will be disjoint, while others are not). Each of the possible subsets of the set is then defined by (“inherits”):
  • the defining properties of the set
  • conjoined with at least one meaning criterion (that or those becoming the defining property, or properties, respectively, specific to the proper subset)." --David McGoveran
In other words, entity super- and sub-typing is precisely how reality should be modeled here, but this does not mean, as the reply recommends, a design with relations representing entity subtypes that contain only the unique attributes. 

For a more detailed discussion and the correct ESS database design see [1,2,3,4,5].



[2] Conceptual Business Modeling for Database Design: Entity Supertype-Subtypes

[3-5] The Principle of Orthogonal Database Design Part I, II, III

Note: I will not publish or respond to anonymous comments. If you want to say something, stand behind it. Otherwise don't bother, it'll be ignored.

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment