Q1: "I have a database for a school that has been in use now for a couple of years and it is working well. There are numerous tables obviously but consider these:Some quibbles:
Considering that the CONTACT table has fields such as: LAST, FIRST, MI, ADDR, CITY, STATE, ZIP, EMAIL ... Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you de-normalize for the sake of being more practical? My first reaction is: if it works well, why not leave it alone? But OK." --comp.databases.theory
- CONTACT - all contacts, students, faculty, or any other type of contact (probably should have called it Entity)
- FACULTY - info about specific faculty member, foreign key to CONTACT
- STUDENT - info about specific student, foreign key to CONTACT
- Attributes, not “fields”;
- Information specific to faculty and students, not “info about specific faculty or student’;
- "Numerous tables" (should be relations) that are unspecified are a concern;
Clearly there are several types of entities that share some attributes, while some attributes are unique or, in other words, entity supertypes and subtypes (ESS). Logical design without a well defined conceptual model is a fool's errand. Yet practically none of the replies focus on this fundamental problem.
Note: I have replaced references to table/column/row with relations/attributes/tuples in square brackets.
A1: "What if Professor of Mathematics also takes courses in Digital Photography? Are you going to store her phone number in 2 [tuples], both in Faculty and Student [relations]? What if a postdoc student also has a part time job teaching? Depending on the platform, you might find materialized query [relations] aka materialized views, and/or index covering to be good alternatives to denormalization. I'm speaking about DB2 and Oracle."+ Notices an underspecified business model;
- Contaminates logical design with implementation considerations--denormalization, materialized relations, specific DBMS's (one must have a fully normalized design to denormalize).
A2: "My question is - why do you need to denormalize this? Are there heavy reports being run on this data that don't execute fast enough? What kind of reports? What other info is in faculty and student [relations] besides foreign keys?"+ Asks about attributes;
- Same focus on denormalization.
Q2:"What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty [relations] then you could archive them easily. Have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact [tuples]? 2 student [tuples]? Or do you have just one of each and have a change log. Which is best?Now he adds some of what's missing, but still not sufficient. An excellent validation of my strong recommendation to refrain from database design advice without sufficient knowledge. Q2 reveals new requirements and objectives that have almost nothing to do with the original Q1. No doubt, more will emerge with further prompts (see below).
The student [relation], besides having a foreign key to the CONTACT would have also the following: medical comments, current grade, foreign key to account which is just another foreign key to contact [relation] for the parent/billing [tuple]. For the faculty [relation], additional [attributes] would include: hire info, login/password for electronic timesheet login."
A3: "One table name PERSON. Anything else is a denormalized nightmare."- This does not merit comment. Incidentally, it would be the one-table design that would be denormalized and a nightmare, not "anything else" (complete with NULLs for both inapplicable and missing data in SQL).
A4: "Do you need to track all and every change to the tables? Then I would go with logs. If no - are your needs restricted to tracking address changes?"- Ditto.
Q3: "There are no real requirements for this database. They are simply happy to have whatever is provided. It would be easy to keep a log of changes; however, I am unsure of a more fundamental issue. I know this next statement is riddled with problems but, just for fun, consider this: You track changes yes, but at the end of each day or week or whatever you compare the current record with the last snapshot of the record which would simple be an xml based representation of the record. If the current record differs from the snapshot then you create a new snapshot. You could have a table like:
should hold the xml snapshot of the record. I know this leaves my initial question concerning normalization somewhat, but is still related. So we have users who don't specify requirements, all the more reason to leave the database alone if it works. And a database practitioners who does not know much about the requirements, or design principles, but wants to shove XML into the bargain (this gives insight into the fad mechanism at work)."What did I tell you?
A5: "I don't see how having separate [relations] for fundamentally different types of contacts could be considered denormalization. In fact, your current scheme of having a bunch of 1-1 relationships seems to reflect object-oriented thinking rather than relational thinking; it looks like your "faculty" and "student" [relations] and their 1-1 relationships are attempts to "subclass" your "entities".+ How to represent ESSs in a database has, indeed, nothing to do with denormalization;
IMHO, it would make more sense to have separate [relations] for each type of "entity," each containing all and only the attributes that make sense for a faculty member, student, etc. For one thing, it would make referential integrity checking easier.
As I see it, the only virtue of your current scheme is that if a faculty member happens to have a kid at the school, their contact information is stored in only one place. That's likely to represent only a few cases, and it's subject to change (what if the teacher gets divorced and the other parent gets custody of the kid? What if somebody wants to receive personal mail at one location and work- related mail at another?). Otherwise, it looks like an attempt to micro-optimize space usage."
- However, they do not involve "fundamentally different types of entities"--they share some (though not all) of the attributes;
- Which is why entity subtyping is precisely the correct modeling here--it is part of basic conceptual modeling that has nothing to do with OO per se, or the RDM.
And here is the crucial point: logical databases are formal representations of conceptual models--relations represent entity classes (including ESS's), which are defined by the properties they share. What properties are shared by what entities will determine the entity types/classes that map to relations. Which is what the questions do not fully specify and on which the responders fail to insist. Without well-specified classes of property-sharing entities--the neat three classes of Contacts, Students and Faculty established a-priori may not be the ones determined by the sharing of properties--what exactly will relations represent and on what grounds are they designed?
Do you like this post? Please link back to this article by copying one of the codes below.URL: HTML link code: BB (forum) link code: