- people, a supertype class of entities
- live people, a subtype class of people
- dead people, a subtype class of live people
WB proposed the following database design, consisting of two SQL base tables:
PEOPLE {PERSON_ID, NAME, BIRTH}
DEATHS {PERSON_ID, DEATH}
Note, first, that PEOPLE does not represent the entity supertype class as defined in the business model--it includes the birth date; and second, DEATHS does not represent either of the two entity subtype classes (WB did a little naming trick to identify deaths as entities, which is not always possible; these are usually "orphan" attributes that distinguish between the two subtypes).
It is, however, possible to derive from the base tables R-tables representing the subtype classes as views:
LIVE_PEOPLE {PERSON_ID, NAME, BIRTH}WB provided the SQL statement for the first view definition, the second is left as an exercise for the reader.
DEAD_PEOPLE {PERSON_ID, NAME, BIRTH, DEATH}
There is nothing to prevent us from designing the database the other way around i.e. represent the two subtype classes as base tables and derive the supertype R-table from them as a view
PEOPLE {PERSON_ID, NAME}(left as exercise for the reader).
Assuming that there is no missing data, both designs avoid inapplicable NULLs and are fully normalized. A TRDBMS would not discriminate between R-tables whose data is stored and those whose data is derived, so you would be free to choose either. What is more, a well implemented a TRDBMS would also provide better physical independence and more flexible access and storage options for the data. In order to avoid "orphan" attributes, I would probably opt for the second design.
But in PRACTICAL ISSUES IN DATABASE MANAGEMENT I advocate the first, WB's design. Why? Well, we don't live in a world of well implemented TRDBMSs.
With the first design, the relationship between people and deaths is represented by the business rule:
'A death must be of a live person (of record)'.
represented in the database by a referential constraint (foreign key). With the second design, the relationship between live people and dead people is represented by a different, mutual exclusivity business rule:
'No person can be both dead and alive'.
represented by a disjunctive constraint, that disallows a row from being inserted in both tables.
Both integrity constraints would be declarable to and enforced by a TRDBMS supporting constraints of arbitrary complexity. And, in fact, its relational deficiencies notwithstanding, since the 80's the ANSI SQL standard has supported ASSERTIONs, a general syntax for declaring arbitrary constraints. The referential constraint can be expressed in ANSI SQL as
CREATE ASSERTION deaths_persons_fkand the disjunctive constraint as two symmetric components
AS CHECK (NOT EXISTS
(SELECT *
FROM deaths d
WHERE NOT EXISTS
(SELECT *
FROM people p
WHERE p.person_id = d.person_id)));
CREATE ASSERTION no_dead_aliveVery similar syntax--both the referential and disjunctive constraints apply subqueries and a join--if a tad procedural (see below).
AS CHECK (NOT EXISTS
(SELECT *
FROM live_people l
WHERE EXISTS
(SELECT *
FROM dead_people d
WHERE d.person_id = l.person_id)));
CREATE ASSERTION no_alive_dead
AS CHECK (NOT EXISTS
(SELECT *
FROM dead_people d
WHERE EXISTS
(SELECT *
FROM live_people l
WHERE l.person_id = d.person_id)));
But SQL vendors
- have not implemented ASSERTION
- have treated the two types of constraint very differently
- seriously discriminate against views
Stay tuned for Part III.
Do you like this post? Please link back to this article by copying one of the codes below.
URL: HTML link code: BB (forum) link code:
IMO There is an identification bias. IF we identify dead and live persons wit a person_id with the same domain(actually, the same identification strategy, say using their SSN as basis for the person_id), I would like to use that domain in 1 place only and derive/constraint from there using subsets (scenario 1). If Dead people are identified differently from live people I would use scenario 2. (Example: dead people who do not keep their SSN because it might be recycled for newborns).
ReplyDeleteYou're overloading the case.
ReplyDeleteThe purpose of this article is to address some general principles, not deal with identification details. You'll understand what I mean after the 4th part.
The SIRA_PRISE formulation of the constraints is
ReplyDeleteA. PEOPLE/DEATHS design
"Deaths can only be recorded for people who have been alive."
SEMIMINUS(DEATHS,PEOPLE)
"Deaths of a person who has been alive, must have occurred after birth"
RESTRICT(JOIN(DEATHS,PEOPLE),LE(DEATH,BIRTH))
B. LIVE_PEOPLE/DEAD_PEOPLE design
"People are dead or alive, but not both"
JOIN(LIVE_PEOPLE,PROJECT(DEAD_PEOPLE,(PERSON_ID)))
"Deaths of a person who has been alive, must have occurred after birth"
LE(BIRTH,DEATH) /* tuple constraint on DEAD_PEOPLE */
All so much more neat and concise than SQL. Add the view definitions that turn design 1 into design 2, and vice versa, and you've illustrated the concept of database design equivalence in all its glory. (Hope I'm not stealing your subject matter with this :-) )
Erwin,
DeleteI had no doubts that your syntax is more concise. My quibbles:
1. Succintness is one desirable property of a language.
2. Of course, a lot depends on how efficient the implementation is (wait for parts 3,4)
3. I wish everybody would steal my subject matter the way you do.
I understand dealing with inapplicable information (live people do not have a date of death) but what about missing information (live people with no known date of birth?) That case is the major reason people tell me they need a null. (Isn't that why the 'inconsistency robustness' guy is throwing rocks?)
ReplyDeleteYou understand, but many don't and I've provided ample evidence for that. Some responsibility for that belongs to SQL, which provides poor implementations of 3VL to what is essentially 4VL.
DeleteWhat I believe is the potential 2VL solution for missing data is in my paper. The asker, in fact, applied it to inapplicable, further evidence about the confusion.
That solution would sit on top of the design issues I address in the series and the 4th part by David McGoveran will provide some info about it, which in SQL must be tackled by users because the DBMS does not support it.
I would choose WB's design as well, and would consider "live" and "dead" persons as simply questions to be answered about people. Creating a view to me would be purely optional. Great examples of the constraints - I was unaware of this standard. I would have implemented these business rules in a trigger in SQL Server as I know of no other way. Looking forward to parts III and IV!
ReplyDeleteThe important question is WHY choose that?
DeleteGiven the business model, what are the questions?
Implementing the constraints--declaratively vs. triggers -- is discussed in part 3.
Good point. I often work off gut feel - like the grammarian who writes the sentence a certain way because it "sounds right" instead of really understanding the rules behind WHY it sounds right.
DeleteSo as to why, I would initially say because it is the simplest design that still meets the requirements of the business model. The questions as to the business model were who are "live" persons and who are "dead" persons which can easily be answered implicitly by WB's resulting logical design. People inner joined to Deaths yields dead people, and People anti joined to Deaths yields live people.
So I am struggling to see your point and re-reading this post quite a lot. I'm beginning to see now something I missed. WB's design (which would have been mine) does have a hole in that a person can be "dead" many different times. The second design with the MUX rule and resulting assertions doesn't allow that. Hmmmm....
Just realized make the key to deaths just person id and voila no way a person can die over and over. But a 0 to 1 child table always seems fishy to me - like something is not quite right in the design. I have a feeling you will shed light on why I have this gut feeling in parts 3 and 4...
DeleteWhile the relational model has many aspects that are intuitive, the difference between it and the usual software engineering is that it's science and, therefore, exclusive gut feel is no longer justifiable.
DeleteHow do you define "simplest" relative to the other?
Why can't the alternative design answer those questions implicitly, just like WB's?
Multiple deaths is not the problem here, but even if it were, either of the two designs would do.
1. It is difficult to make out your designs without table meanings. Live people are not dead people. If dead people are (as entities) a subtype (subset) of live people then "live" means not alive but alive-or-dead or ever-alive. Otherwise dead people are not a subtype of live people but live and dead are (disjoint) subtypes of all. Your end-of-part-I and start-of-part-II characterizations of subtyping talk about people supertype of live supertype of dead so I guess "live" means live-or-dead. Yet you don't have any tables like that in your examples.
ReplyDelete2. Some entity type is a subtype of another entity type exactly when a foreign key constraint holds on entity-identifying attributes from subtype table to supertype table. Ie the meanings of the tables are such that if an entity is of the source type then it is of the target type. Contrary to your part I and part II characterizations, attributes don't otherwise matter. The mere sharing/subsetting/supersetting of attributes for live and dead does not make dead people a subtype of live people when live people are people that are alive (rather than alive-or-dead or ever-alive), because being dead does not imply being alive, ie no foreign key constraint holds.
I must admit that at 1st I was puzzled by the dead being subtype of alive. But this serves a useful purpose. I suggest you wait for the 4th part by David McGoveran and if you do not deem his justification persuasive we will have to agree to disagree.
ReplyDeleteI look forward to David & part IV. In anticipation...
ReplyDeleteWe can 1. define the "table-type" of a table as the set of all table values with exactly its columns or 2. as the set of all table values with at least its columns, and 3. another table as of a (proper) "table-subtype" by having additional columns 4. in the (very particular) sense that a query that works on a table of a table-type will work on a table of its table-subtypes.
Then 1. a table-subtype of a table-type that can represent alive and/or alive-or-dead people can represent dead people 2. because of having added columns. 3. But that doesn't make dead people a subtype of alive or alive-or-dead people! (4. Although dead people **aren't** a subtype of alive people and **are** a subtype of alive-or-dead people.)