ON NORMALIZATION VS. NONTRIVIAL TRANSITIVE FUNCTIONAL DEPENDENCIES
with C. J. Date

 

 

 

From: Lajos Nagy

Date: 20 Jul 2005

 

Have you ever heard of Dr. Halassy?  He's a Hungarian and used to argue a lot with E.F. Codd about databases back in the 70s.  I read his books on databases, which are considered the best in the field in Hungary, and I came upon an interesting idea of his concerning relational normal forms. In a nutshell, he argues that the only thing we should look out for in our relations are non-trivial transitive functional dependencies.  These transitive dependencies are exactly the ones that cause all the well-known problems of not-normalized relations.  Basically he generalizes the notions of 2NF, 3NF, BCNF, and replaces them with a single check for transitive dependencies.  In this approach 2NF, 3NF, BCNF are not successive (or higher) normal forms that define a hierarchy but rather different names for the same problem: transitive dependencies.

 

More formally:

 

Let R be a relation schema and A, B, C be subsets of attributes, not necessarily disjoint, of R.  Furthermore, let A be a candidate key of R. There's a transitive functional dependency (TFD) on R if the FDs: A -> B, B -> C, and A -> C all hold on R.

 

Each FD in the TFD can be either trivial or non-trivial FD.  (Just for completeness: the FD A -> B is trivial if B is a subset of A, otherwise it's non-trivial.)  This gives rise to eight possible situations that we have to consider. (There are three FDs and each can be either trivial or non-trivial so that's 2^3 = 8.)  Let's "T" mark trivial and "N" non-trivial.  This way a situation can be described with a three letter sequence: "TNN" for example describes the situation where A -> B is trivial, B -> C is non-trivial, and A -> C is non-trivial.

 

Let's analyze each case in turn:

 

TTT  -  only trivial FDs, no problem.

TTN  -  impossible

TNT  -  not in BCNF

TNN  -  not in 2NF

NTT  -  equivalent to a single FD, A -> (B - C), so no real transitivity

NTN  -  equivalent to a single FD, A -> B, so no real transitivity

NNT  -  not in 3NF

NNN  -  not in 3NF

 

Basically, if we can find a TFD of type TNT, TNN, NNT, or NNN in our schema then it is not normalized.  Using this generalized notion we can succinctly define the "traditional" normal forms:

 

2NF  = no TNN

3NF  = 2NF and no NNT and no NNN

BCNF = 3NF and no TNT

 

One advantage of Dr. Halassy's approach is that it does away with the clumsy 'key', 'non-key' distinction between attributes in the definitions of the normal forms, presents them in a systematic and terse way, and emphasizes the fact that the very existence of non-trivial transitive functional dependencies lies at the heart of the problems with non-normalized relations.

 

So, what do you think?

 

 

C. J. Date Responds: I would like to offer a few observations on the proposed alternative perspective on normalization. 

 

    First an obvious point:  Whatever the advantages and disadvantages of that alternative perspective might be, it appears to be of no help in connection with normal forms higher than BCNF (4NF, 5NF, 6NF), nor with JDs or MVDs that are not FDs. 

 

    Second, I agree that the classical definitions of 2NF and 3NF involve a "clumsy" distinction between key and nonkey attributes, but I don't agree that the same criticism applies to the classical definition of BCNF.  BCNF is the normal form with respect to functional dependencies, and the classical definition is as follows: 

 

Relvar* R is in BCNF if and only if, for every nontrivial FD A -> B satisfied by R, A is a superkey for R

 

(See my recent book DATABASE IN DEPTH: RELATIONAL THEORY FOR PRACTITIONERS for further discussion.)  This definition is simple, straightforward, and elegant, and it makes no mention of key vs. nonkey attributes; in fact, it also makes no mention of those other "clumsy" concepts, reducible FDs and transitive FDs, either.  Thus, while I do support the goal of making life easier for the database designer by reducing the number of concepts he or she has to deal with—which Halassy's idea of "nontrivial transitive FDs" seems to be all about—I think the classical definition of BCNF does the job pretty well, too. 

 

    Third, I have some specific issues with the proposed formalism.  Here's Halassy's definition of transitive FD, quoted verbatim: 

 

Let R be a relation schema and A, B, C be subsets of attributes, not necessarily disjoint, of R.  Furthermore, let A be a candidate key of R.  There's a transitive FD (TFD) on R if the FDs: A -> B, B -> C, and A -> C all hold on R

 

First issue:  If A is a key, the FDs A -> B and A -> C hold necessarily!  Thus, the definition reduces to saying there's a TFD if B -> C holds.  But B and C are arbitrary subsets of the attributes of R, so the definition apparently says that if R satisfies any FD at all, then it satisfies a TFD—which in turn implies that every "relation schema" R satisfies a TFD, since R certainly satisfies the FD H -> H, where H is the set of all of the attributes of R.  (Actually I suppose that's strictly correct.  If A is any subset of the set of attributes of R, we certainly have A -> A and A -> A, and so A -> A is a TFD.  But there must be more to the concept than that.  In other words, my first issue is:  I think the definition of TFD needs to be tightened up.) 

 

Second issue:  The analysis on the second page of Nagy's communication does not seem to be completely correct.  Consider the usual suppliers-and-parts database.  Suppose supplier numbers and supplier names are both "unique," and consider the relvar

 

S { S#, SNAME, CITY }

 

Now take R as S and A, B, C as S#, SNAME, CITY, respectively (note that A is indeed a key, as required).  Then A -> B, B -> C, and A -> C are all nontrivial; yet the relvar is in 3NF (and in fact BCNF), contrary to the claim that "all three FDs nontrivial implies the relvar is not in 3NF." 

 

Third issue:  The analysis on the second page of Nagy's communication seems to be incorrect in another respect also.  Consider the relvar SJT with attributes S (student), J (subject), and T (teacher), with predicate:  Student S is taught subject J by teacher T.  Assume that: 

 

·         For each subject, each student of that subject is taught by only one teacher. 

 

·         Each teacher teaches only one subject (but each subject is taught by several teachers). 

 

This relvar satisfies the FDs {S,J} -> {T} and {T} -> {J}.  Now take R as SJT and B, C, A as T, J, and the combination of S and J, respectively (note that A is indeed a key, as required).  Then A -> B and B -> C are nontrivial while A -> C is trivial; yet the relvar is in 3NF (though not BCNF), contrary to the claim that "A -> B and B -> C nontrivial and A -> C trivial implies the relvar is not in 3NF." 

 

Fourth issue:  It's claimed that "if we can find a TFD of type TNT, TNN, NNT, or NNN in our schema then it [i.e., the schema, not the TFD] is not normalized."  Two points: 

 

·   The earlier "definition" of TFD doesn't actually say what a TFD is!  It says a TFD exists if certain conditions obtain, but it doesn't say what it is that constitutes the TFD in question.  So, strictly speaking, we haven't been told what "a TFD of type TNT" (etc.) actually means.  Nor (with reference to the earlier claim regarding "nontrivial TFDs") do we know, strictly speaking, what it means for a TFD to be trivial or otherwise. 

 

·   Assuming the definition of TFD is tightened up appropriately and the analysis corrected in the light of (at least) the second and third issues identified above, I now observe that the term "normalized" as used in the quoted claim refers specifically to BCNF.  Since there are so many distinct normal forms (at least six of them, not counting 1NF), I think it's advisable to avoid use of the unqualified term "normalized," at least in formal contexts.

 

 

Posted 9/23/05