Sunday, April 28, 2013

Understanding Further Normalization: 2NF

I strive and manage to avoid errors most of the time, but given the amount of writing I do, not 100% of the time. The chance for slipups increases when many examples are involved and I focus on the same material for an extended period of time. Luckily, there is usually some reader who stays awake while reading it and notifies me. This is what happened with the 2NF example in my normalization paper. This is the bad news.

The good news is that this is an opportunity to provide those who purchased the paper with a correction, while at the same time give those who have not, what I consider the right way to think of further normalization advanced in the paper. (The paper has been revised accordingly; revised copies will be delivered upon request).

GP: [First, t]he example you give seems to me to be about 3NF, not 2NF. I think this is the case because  DEPT# is not actually part of a candidate key for EMP_ASSIGNMENTS. The only key to EMP_ASSIGNMENTS is EMP# (because CR6 states that an employee can only belong to one department). If, on the other hand, an employee could belong to many departments then I think  your example would indeed be illustrative of 2nd NF. Please could you let me know if I’ve misunderstood something here.
Alas, no, GP is absolutely right and I apologize for the error. Here is the correct example.

Consider the following table: Is it in 5NF and, if not, what normal form is it in?
EMP      PROJECT       DEPT#
Spenser  Sys Support   E21
Spenser  Comp Svcs     E21
Spenser  Supp Svcs     E21
Pianka   Info Center   D11
Setright Documentation D11
Setright Mfg Systems   D11
Adamson  Info Center   D11
The formal 5NF requirement is that all join dependencies that hold in the R-table are on the key. What this means, informally, is that a 5NF R-table represents a single class (and type) of entities. In a single class, every descriptive attribute is functionally dependent on the identifier and in the 5NF R-table representing it every non-key column is functionally dependent on the key (see the paper for more details).

If we are given a table and asked in what normal form it is, we cannot establish that without consulting the business model to ascertain the class(es) involved and whether the R-table represents only one, in which case it satisfies the 5NF criterion (see Normal Forms: Dependent on Dependencies).

Suppose the business model consists of two entity classes, employees and project assignments, specified by the following business rules: 
CR1: Every employee has an employee number.
CR2: Every employee has a name.
CR2: Every employee is assigned to one department.
DR1: No two distinct employees have the same employee number.

CR4: Every employee is assigned to one or more projects.
DR2: No two distinct assignments are of the same employee to the same project.
The business model specifies two entity classes that, apparently, have been "bundled" into one table, so it is not in 5NF. But as an R-table it is in 1NF (see the paper for the definition).

Given the entity identifiers identified by the business model, the attribute dependencies in the employees class is
{emp. number} --> {emp. name, dept. number}
and the project assignments class has a two-attribute identifier and no descriptive attributes.
{emp. number, project}
So the column dependency that holds in it is
{EMP#} --> {DEPT#}
but the key is {EMP#, PROJECT}. In other words, because of the bundling, the non-key column is dependent not on the key, but on a component of a (composite) key. This is a violation of 2NF, which results in the following undesirable properties (see the paper for details):
  • Redundancy (as highlighted in color)
  • Update anomalies
  • Complexity/proneness to errors
These drawbacks could have been avoided by mapping the two entity classes each to its own 5NF R-table:
100  Spenser  E21
160  Pianka   D11
310  Setright D11
150  Adamson  D11

100  Sys Support 
100  Comp Svcs
100  Supp Svcs
160  Info Center
310  Mfg Systems
310  Comp Svcs
150  Info Center
These are exactly the two projections, EMPLOYEES and PROJ_ASSIGNMENTS, into which EMP_ASSIGNMENTS can be separated by further normalization to 2NF--a repair of the design--to eliminate the drawbacks. The paper explains why such repair is possible, the following join dependency holds:
Here we come to the second reader question:
At the end of the section on 2NF you write “Note that R-tables with simple keys are in 2NF” (and you repeat this assertion in the conclusion). However, Chris Date points out that this is not the case in his 2012 book DATABASE DESIGN AND RELATIONAL THEORY, where he poses the question: “Is it true that if a relvar isn’t in 2NF, then it must have a composite key?” and answers it thus:
Let USA be a binary relvar with attributes COUNTRY and STATE; the predicate is STATE is part of COUNTRY, but COUNTRY is the USA in every tuple. Now, {STATE} is the sole key for this relvar, and the functionl dependency {STATE} -> {COUNTRY} thus certainly holds. However, the FD {} -> {COUNTRY} clearly holds as well…;the FD {STATE}-> {COUNTRY} is thus reducible, and so the relvar isn’t in 2NF, and yet the key {STATE} isn’t composite.
Date’s example stuck in my mind when I read the book, because, like yourself, I had intuitively thought that for a table not to be in 2NF it must have a composite key.
Strictly speaking Date is correct. But would you design such a table and, if not, why not? (Would the alternative design be the same if only US data were represented, or other country data were possible?)

Be that as it may, "An R-table with simple keys is in 2NF in most cases" would have been more accurate in the context of design repairs: you may encounter such designs in practice and must be aware that they are in 2NF, but they are poor designs for other reasons.

Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment