ON NORMAL FORMS
with Chris Date

 

 

 




From: MJ
To: Editor

 

On page 368 of Date's INTRODUCTION TO DATABASE SYSTEMS 7th Ed ., demonstrating BCNF (Boyce-Codd Normal Form), the table SSP{S#, SNAME, P#, QTY} has candidate keys {S#,P#} and {SNAME, P#) and additional determinants of S# and SNAME. I can see how this does not conform to the definition of BCNF, but what about 2NF? The primary key could be {S#,P#}, in which case SNAME is only partially functionally dependent upon the primary key, similarly for {SNAME, P#} and S#. If the rules for decomposing that relation so that it conforms to 2NF are followed, you will not be in a position where BCNF comes into play.

 

This question arose because I am trying to see how the relations ST and TJ (on page 370) are derived. Obviously T, as the non-candidate key which is a determinant (and its dependent) make the TJ table, and obviously you could only choose ST (because T->J and J-/->T), but I can't work out what the general rule is that, when followed, would produce these tables (based on the candidate keys and determinants that you have).

 

Do you have any further notes or pointers on this that would help me? (I'm a graphics researcher, who teaches databases).
 
 

Chris Date Responds: MJ has some questions regarding normalization. (By the way, how nice to be able to deal with a scientifically respectable subject for a change.) The first question has to do with an example from my book AN INTRODUCTION TO DATABASE SYSTEMS, page 368. We're given a relvar SSP as follows:

 

SSP {S#,SNAME,P#,QTY}
CANDIDATE KEY {S#,P#}

CANDIDATE KEY {SNAME,P#} 

 

satisfying the additional functional dependencies (FDs)

 

{S#} ® {SNAME}
{SNAME} ® {S#} 

 

In my book, I claim, correctly, that relvar SSP is not in BCNF. MJ observes, however, that it's not even in 2NF (and hence a fortiori not in 3NF or BCNF, either), because if we choose--arbitrarily--{S#,P#} as the primary key, then {SNAME}, though necessarily functionally dependent on that primary key, isn't irreducibly so.

 

I fear this misconception on MJ's part is partly my fault. In the section of my book dealing with 2NF (also with 1NF and 3NF), I deliberately indulged in some "creative lying"; that is, I deliberately simplified some of the material for pedagogic reasons. In my own defense, however, let me note that I did call this fact out quite explicitly in the book! The very first paragraph of the section in question reads as follows:

 

Caveat: Throughout this section, we assume for simplicity that each relvar has exactly one candidate key, which we further assume is the primary key. These assumptions are reflected in our definitions, which (we repeat) are not very rigorous. The case of a relvar having more than one candidate key is discussed in [a later section].

 

Here then is the "not very rigorous" definition of 2NF from that same section:

 

Second normal form (definition assuming only one candidate key, which we assume is the primary key): A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.

 

Note the explicit simplifying qualifications in this definition!

 

Here by contrast is the original (precise) definition of 2NF, essentially as given by Ted Codd back in 1971:

 

A relvar R is in second normal form if it's in first normal form and every nonprime attribute is fully dependent on each candidate key of R.

 

In order to understand this definition, of course, you need to know what a "nonprime attribute" is. Well, a prime attribute is an attribute that participates in at least one candidate key -- not necessarily the primary key, despite the "prime" terminology!--of the relvar in question. And a nonprime attribute is, of course, one that isn't prime.

 

Going back to the SSP example, then, the problem with that relvar is that attributes S# and SNAME are both prime attributes. Relvar SSP is thus in 2NF (and also 3NF, in fact) by Codd's original definition, even though it looks like it's not in 2NF (or 3NF) according to my simplified definition.

 

MJ's second question has to do with an example on page 370 of the same book. We're given a relvar SJT as follows-- 

 

SJT {S,J,T}
CANDIDATE KEY {S,J}

 

satisfying the additional FD 

 

{T} ® {J}

 

In my book, I state that this relvar can -- not necessarily should, please observe!--be decomposed into the BCNF relvars

 

ST {S,T}
CANDIDATE KEY {S,T}

TJ {T,J} 
CANDIDATE KEY {T}

 

MJ writes: "I can't work out what the general rule is that when followed would produce these [relvars] (based on the candidate keys and [FDs] that [we've been given])." Actually the rule is straightforward; it's given in my book on page 384. I'll just state it here without further explanation, except to say that (a) X and Y must be understood to be sets of attributes and (b) "dependency" must be understood to mean an FD specifically:

 

[The] following algorithm (Steps 0 to 3) is guaranteed to produce a decomposition D of [any given relvar] R into BCNF relvars that is nonloss [though] not necessarily dependency-preserving:

 

1.      Initialize D to contain just R

2.      For each nonBCNF relvar T in D, execute Steps 2 and 3. 

3.      Let X ® Y be an FD for T that violates the requirements for BCNF. 

4.      Replace T in D by two of its projections, viz. that over X and Y and that over all attributes except those in Y.

 

Finally, MJ writes: "Do you have any further notes or pointers ... that would help me?" Well, yes, I do have a couple. First, if you'd like a more thorough, formal, and rigorous treatment of this whole area of functional (and other) dependencies, then I suggest you try Chapter 7 in Jeffrey D. Ullman, PRINCIPLES OF DATABASE AND KNOWLEDGE-BASED SYSTEMS, VOL. 1, Computer Science Press (1988).

 

Second, if you'd like a more reader-friendly discussion of all of the normal forms up to and including BCNF, as well as of everything else Ted Codd invented in his wonderful series of papers from 1969 to 1979, then let me immodestly call your attention to a brief recent book of my own (152 pages) entitled THE DATABASE RELATIONAL MODEL: A RETROSPECTIVE REVIEW AND ANALYSIS. Here's the puff piece I wrote for the back cover of that book at the time it first appeared:

 

E. F. Codd's relational model, first presented to the world in a startlingly novel series of research papers over the period 1969-1979, was a revolution at the time, albeit one that was desperately needed. Some thirty or so years later, however, it seems that the database community in general has come to regard the relational model as somehow passé and no longer relevant -- even though the entire database industry is in fact founded on that model. Certainly it is easy to find databases and database applications (not to mention DBMS products) in which relational precepts have been overlooked or violated, usually with very negative consequences. Two factors that might help to explain this state of affairs are as follows: (a) Many of Codd's original papers have since become hard to find; (b) with all due respect to their author, some of them are not easy to read! The purpose of the present book is thus:

·         To revisit Codd's original papers; 

·         To explain their contribution and assess their significance;

·         And, most important, to restate and reinforce their message for a new generation of database professionals.

 

Posted 05/09/02

 

 

 

[ABOUT] [QUOTES] [LINKS]