Sunday, April 28, 2013

Tables, Full Normalization and Business Rules

REVISED: 10/16/16
 

Often somebody produces a table and asks if it is fully normalized (in 5NF) and, if not, in what normal form it is. This is an indication of poor grasp of data fundamentals.

Consider ASSIGNMENTS:
 EMP# ENAME    PROJECT       DEPT#
===================================
 100  Spenser  Sys Support   E21
 100  Spenser  Comp Svcs     E21
 100  Spenser  Supp Svcs     E21
 160  Pianka   Info Center   D11
 310  Setright Documentation D11
 310  Setright Mfg Systems   D11
 150  Adamson  Info Center   D11
-----------------------------------
First, a normal form is a property of a relation, not a table (a R-table is only a "visual shorthand" for a relation -- a special kind of table that visualizes a relation on some physical medium (e.g., paper) -- and the two should not be confused.

Second, the normal form of a relation is determined from attribute dependencies. Formally, a relation is fully normalized (in 5NF) if and only if the only dependencies that hold in it are functional dependencies (FD) of the non-key attributes on the key (i.e., there is exactly one value of each non-key attribute for every key value, but not vice-versa). Since a key represents an entity identifier, this condition exists only when, informally, a relation represents entities of a single type (why?) Is this true for the relation pictured by ASSIGNMENTS?

The fact is that whether a relation represents a single type of entity -- and, therefore, is fully normalized -- cannot be ascertained from sheer visual inspection of the table picturing it. It requires knowledge of what the underlying relation means, namely the type(s) of entity specified by the business rules in the corresponding conceptual model that the relation represents.

For example, if the rules:

  • R1: Every employee is identified by an employee number.
  • R2: Every employee has an employee name.
  • R3: Every employee works in a department.
  • R4: Every project assignment is identified by an employee number and a project name.
model two types of entity:
  • Employees: {emp. number} --> {employee name, department number}
  • Project assignments: {employee number, project}
then the relation represents both and, consequently, is not in 5NF. In fact, without a well defined and complete conceptual specification of entity types, you can't even tell whether relations have keys (if they do not, they are not relation) and, if they do, what the key is.

Database design adhering to the Principle of Full Normalization (POFN) do not "bundle" entity types and produce 5NF relations, obviating the need for further normalization. For the advantages of full normalization and the drawbacks of "denormalization for performance" illusion, see paper #2 and the recently published DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

Explicit further normalization is necessary only to "repair" poorly designed non-5NF relations by replacing them 5NF projections. For example, ASSIGNMENTS with EMPLOYEES and PROJ_ASSIGNS pictured by R-tables:

EMP# ENAME    DEPT#
====================
 100  Spenser  E21
 160  Pianka   D11
 310  Setright D11
 150  Adamson  D11
--------------------

 EMP# PROJECT
=================  
 100  Sys Support
 100  Comp Svcs
 100  Supp Svcs
 160  Info Center
 310  Mfg Systems
 310  Comp Svcs
 150  Info Center
------------------
each representing a single entity type. The repair is possible because the following holds:
ASSIGNMENTS{EMP#,ENAME, DEPT#} JOIN ASSIGNMENTS{EMP#,PROJECT} = PROJ_ASSIGNMENTS
where the left-hand side is a join of two projections of ASSIGNMENTS (i.e., no information is lost).




Thursday, April 25, 2013

Site Update

1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE.

BTW: If you live in San Francisco, attend the conference on 5/22 and can give me rides to and/or from Pleasanton, or know somebody who can, it will be greatly appreciated. Please email me at the address on the About page.

2.
A link to my latest All Analytics column was posted on the ONLINE page.

Incidentally, since with the discovery by business of analytics as some sort of "new data science", overnight born-again BI experts proliferate like frogs after heavy rain. It suggest a similar poverty of foundation knowledge and rich debunking targets.
Please submit any pearls you come across that could be targets interesting from a data perspective.

3.
The 'Quote of the Week' was posted on the QUOTES page.

4.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Many years ago I wrote something about what I called the "kitchen sink" approach to data management, but this one takes the cake. All the following are included:
  • Key-value pair programming language
  • Entity Attribute Value database model
  • Relational Database Management System, specifically Postgres 9
  • Objects and object metadata
  • SQL client interface (returns objects of various types)
  • Procedural SQL [FP: Huh?]
  • Schema of "Sprout data model" [FP: Wonder what that is]
  • Objects (tables, views) are accessed with their resource identifier
  • High level syntax-independent [FP: Wow!!!!]
and much more (check out, in particular, the bulleted list of features).

5.
A link to an online exchange I participated in was posted to the FP ONLINE page.

6.
Consider the topics in Jonathan Lewis' Oracle Mechanisms Webinar in the context of my argument that, given so many physical/implementation factors that affect performance, why the instinct to attribute poor performance to (logical) denormalization?  And there are many more than those tackled by Jonathan.

7.
While checking hits to this site, I noticed that one of them was due to the  following Google search: "My data model is a better model of reality than your data model. What would your response be?"

Well?



Sunday, April 21, 2013

Re-write


See 
Un-muddling Modeling: Relations & Relationships Part II.








Thursday, April 18, 2013

Site Update

1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE.

2.
A link to my latest All Analytics column was posted on the ONLINE page.

3.
The Quote of the Week was posted on the QUOTES page.

There was a comment to my recent Un-muddling Modeling, Part 1 that the conceptual and logical models do not require the relationship concept. However, this does not mean we cannot refer to relationships that are implicit in the models and that is usually in response to arguments like this one.

4.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Nokia Entertainment: Why we went Mongo

An excellent example of how products are selected in the absence of foundation knowledge.

Related.

Ideas to integration data sets from structured and unstructured data

Bay Area coding boot camps promise to launch tech careers   
SAN FRANCISCO -- Looking for a career change, Ken Shimizu decided he wanted to be a software developer, but he didn't want to go back to college to study computer science.

5.
A link to an online exchange I participated in was posted to the FP ONLINE page.

How to I create a logical data model for Geospatial Data?

6.
Big Data Is Just For Big Companies - And Other BS

There are two related core cycles in IT: centralization/decentralization/re-centralization and corporatization/democratization/re-corporatization.

7.
I have often referred to the difficulty of conveying informally the formal without losing either the rigor, or the audience. David Portas, one of the few knowledgeable practitioners, demonstrates some of that difficulty in his comments to the following post by Hugo Kornelis: NULL - The database's black hole

8.
Enjoy.

Big Data Dilbert


Wednesday, April 10, 2013

Site Update

1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE page.

2.
The 'Quote of the Week' was posted on the QUOTES page.

3.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

4.
Links to online exchanges I participated in were posted on the FP ONLINE page.

5.
I agree with most of Cary Millsap's take on NoSQL and Oracle, Sex and Marriage, but note that there is no reference whatsoever to the implications of the data models involved.

6.
After First Great Blunder Refuted consider my Type vs. Domain and Class.

7.
Another job description: Analytics- Data Modeler. Any idea why I post such?



Sunday, April 7, 2013

More on Relational Denial

Note: What follows are my comments on a LinkedIn exchange, So What is a 'Large Database'? Minor edits of the online comments for grammatical, clarity, precision and coherence purposes are within square brackets.
PS: No doubt Oracle/SQL Server/etc are designed and optimized to deal with normalized data. That's where the power lies. They're like Sirens though ... those who don't respect them with proper designs are destined to have performance crashes (bear with me on this metaphor will ya? :)