From: MJ
To: Editor
On page 368 of Date's INTRODUCTION TO DATABASE SYSTEMS
7th Ed ., demonstrating BCNF (Boyce-Codd Normal Form), the table
SSP{S#, SNAME, P#, QTY} has candidate keys {S#,P#} and {SNAME, P#) and additional
determinants of S# and SNAME. I can see how this does not conform to the
definition of BCNF, but what about 2NF? The primary key could be {S#,P#}, in
which case SNAME is only partially functionally dependent upon the primary key,
similarly for {SNAME, P#} and S#. If the rules for decomposing that relation so
that it conforms to 2NF are followed, you will not be in a position where BCNF
comes into play.
This question arose because I am trying to see how the
relations ST and TJ (on page 370) are derived. Obviously T, as the
non-candidate key which is a determinant (and its dependent) make the TJ table,
and obviously you could only choose ST (because T->J and J-/->T), but I
can't work out what the general rule is that, when followed, would produce
these tables (based on the candidate keys and determinants that you have).
Do you have any further notes or pointers on this that would
help me? (I'm a graphics researcher, who teaches databases).
Chris Date Responds: MJ has some questions regarding
normalization. (By the way, how nice to be able to deal with a scientifically
respectable subject for a change.) The first question has to do with an example
from my book AN
INTRODUCTION TO DATABASE SYSTEMS, page
368. We're given a relvar SSP as follows:
SSP {S#,SNAME,P#,QTY}
CANDIDATE KEY {S#,P#}
CANDIDATE KEY {SNAME,P#}
satisfying the additional functional dependencies (FDs)
{S#} ® {SNAME}
{SNAME} ® {S#}
In my book, I claim, correctly, that relvar SSP is not in
BCNF. MJ observes, however, that it's not even in 2NF (and hence a fortiori not
in 3NF or BCNF, either), because if we choose--arbitrarily--{S#,P#} as the
primary key, then {SNAME}, though necessarily functionally dependent on that
primary key, isn't irreducibly so.
I fear this misconception on MJ's part is partly my fault. In
the section of my book dealing with 2NF (also with 1NF and 3NF), I deliberately
indulged in some "creative lying"; that is, I deliberately simplified
some of the material for pedagogic reasons. In my own defense, however, let me
note that I did call this fact out quite explicitly in the book! The very first
paragraph of the section in question reads as follows:
Caveat: Throughout this section, we assume for
simplicity that each relvar has exactly one candidate key, which we further
assume is the primary key. These assumptions are reflected in our definitions,
which (we repeat) are not very rigorous. The case of a relvar having more than
one candidate key is discussed in [a later section].
Here then is the "not very rigorous" definition of
2NF from that same section:
Second normal form (definition assuming only one
candidate key, which we assume is the primary key): A relvar is in 2NF if
and only if it is in 1NF and every nonkey attribute is irreducibly dependent on
the primary key.
Note the explicit simplifying qualifications in this
definition!
Here by contrast is the original (precise) definition
of 2NF, essentially as given by Ted Codd back in 1971:
A relvar R is in second normal form if it's in first
normal form and every nonprime attribute is fully dependent on each candidate
key of R.
In order to understand this definition, of course, you need
to know what a "nonprime attribute" is. Well, a prime
attribute is an attribute that participates in at least one candidate key --
not necessarily the primary key, despite the "prime" terminology!--of
the relvar in question. And a nonprime attribute is, of course, one that isn't
prime.
Going back to the SSP example, then, the problem with that
relvar is that attributes S# and SNAME are both prime attributes. Relvar SSP is
thus in 2NF (and also 3NF, in fact) by Codd's original definition, even though
it looks like it's not in 2NF (or 3NF) according to my simplified
definition.
MJ's second question has to do with an example on page 370 of
the same book. We're given a relvar SJT as follows--
SJT {S,J,T}
CANDIDATE KEY {S,J}
satisfying the additional FD
{T} ® {J}
In my book, I state that this relvar can -- not
necessarily should, please observe!--be decomposed into the BCNF relvars
ST {S,T}
CANDIDATE KEY {S,T}
TJ {T,J}
CANDIDATE KEY {T}
MJ writes: "I can't work out what the general rule is
that when followed would produce these [relvars] (based on the candidate keys
and [FDs] that [we've been given])." Actually the rule is straightforward;
it's given in my book on page 384. I'll just state it here without further
explanation, except to say that (a) X and Y must be understood to
be sets of attributes and (b) "dependency" must be understood
to mean an FD specifically:
[The] following algorithm (Steps 0 to 3) is guaranteed to
produce a decomposition D of [any given relvar] R into BCNF
relvars that is nonloss [though] not necessarily dependency-preserving:
1.
Initialize D to contain just R.
2.
For each nonBCNF relvar T in D, execute Steps 2
and 3.
3.
Let X ® Y be an FD for T that violates
the requirements for BCNF.
4.
Replace T in D by two of its projections, viz.
that over X and Y and that over all attributes except those in Y.
Finally, MJ writes: "Do you have any further notes or
pointers ... that would help me?" Well, yes, I do have a couple. First, if
you'd like a more thorough, formal, and rigorous treatment of this whole area
of functional (and other) dependencies, then I suggest you try Chapter 7 in
Jeffrey D. Ullman, PRINCIPLES OF DATABASE AND KNOWLEDGE-BASED SYSTEMS, VOL.
1, Computer Science Press (1988).
Second, if you'd like a more reader-friendly discussion of
all of the normal forms up to and including BCNF, as well as of everything else
Ted Codd invented in his wonderful series of papers from 1969 to 1979, then let
me immodestly call your attention to a brief recent book of my own (152 pages)
entitled THE
DATABASE RELATIONAL MODEL: A RETROSPECTIVE REVIEW AND ANALYSIS. Here's the puff piece I wrote for the back cover of
that book at the time it first appeared:
E. F. Codd's relational model, first presented to the world
in a startlingly novel series of research papers over the period 1969-1979, was
a revolution at the time, albeit one that was desperately needed. Some thirty
or so years later, however, it seems that the database community in general has
come to regard the relational model as somehow passé and no longer
relevant -- even though the entire database industry is in fact founded on that
model. Certainly it is easy to find databases and database applications (not to
mention DBMS products) in which relational precepts have been overlooked or
violated, usually with very negative consequences. Two factors that might help
to explain this state of affairs are as follows: (a) Many of Codd's original
papers have since become hard to find; (b) with all due respect to their
author, some of them are not easy to read! The purpose of the present book is
thus:
·
To revisit Codd's original papers;
·
To explain their contribution and assess their
significance;
·
And, most important, to restate and reinforce their
message for a new generation of database professionals.
Posted
05/09/02
[ABOUT]
[QUOTES]
[LINKS]