Sunday, April 28, 2013

Tables, Full Normalization and Business Rules

REVISED: 10/16/16
 

Often somebody produces a table and asks if it is fully normalized (in 5NF) and, if not, in what normal form it is. This is an indication of poor grasp of data fundamentals.

Consider ASSIGNMENTS:
 EMP# ENAME    PROJECT       DEPT#
===================================
 100  Spenser  Sys Support   E21
 100  Spenser  Comp Svcs     E21
 100  Spenser  Supp Svcs     E21
 160  Pianka   Info Center   D11
 310  Setright Documentation D11
 310  Setright Mfg Systems   D11
 150  Adamson  Info Center   D11
-----------------------------------
First, a normal form is a property of a relation, not a table (a R-table is only a "visual shorthand" for a relation -- a special kind of table that visualizes a relation on some physical medium (e.g., paper) -- and the two should not be confused.

Second, the normal form of a relation is determined from attribute dependencies. Formally, a relation is fully normalized (in 5NF) if and only if the only dependencies that hold in it are functional dependencies (FD) of the non-key attributes on the key (i.e., there is exactly one value of each non-key attribute for every key value, but not vice-versa). Since a key represents an entity identifier, this condition exists only when, informally, a relation represents entities of a single type (why?) Is this true for the relation pictured by ASSIGNMENTS?

The fact is that whether a relation represents a single type of entity -- and, therefore, is fully normalized -- cannot be ascertained from sheer visual inspection of the table picturing it. It requires knowledge of what the underlying relation means, namely the type(s) of entity specified by the business rules in the corresponding conceptual model that the relation represents.

For example, if the rules:

  • R1: Every employee is identified by an employee number.
  • R2: Every employee has an employee name.
  • R3: Every employee works in a department.
  • R4: Every project assignment is identified by an employee number and a project name.
model two types of entity:
  • Employees: {emp. number} --> {employee name, department number}
  • Project assignments: {employee number, project}
then the relation represents both and, consequently, is not in 5NF. In fact, without a well defined and complete conceptual specification of entity types, you can't even tell whether relations have keys (if they do not, they are not relation) and, if they do, what the key is.

Database design adhering to the Principle of Full Normalization (POFN) do not "bundle" entity types and produce 5NF relations, obviating the need for further normalization. For the advantages of full normalization and the drawbacks of "denormalization for performance" illusion, see paper #2 and the recently published DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

Explicit further normalization is necessary only to "repair" poorly designed non-5NF relations by replacing them 5NF projections. For example, ASSIGNMENTS with EMPLOYEES and PROJ_ASSIGNS pictured by R-tables:

EMP# ENAME    DEPT#
====================
 100  Spenser  E21
 160  Pianka   D11
 310  Setright D11
 150  Adamson  D11
--------------------

 EMP# PROJECT
=================  
 100  Sys Support
 100  Comp Svcs
 100  Supp Svcs
 160  Info Center
 310  Mfg Systems
 310  Comp Svcs
 150  Info Center
------------------
each representing a single entity type. The repair is possible because the following holds:
ASSIGNMENTS{EMP#,ENAME, DEPT#} JOIN ASSIGNMENTS{EMP#,PROJECT} = PROJ_ASSIGNMENTS
where the left-hand side is a join of two projections of ASSIGNMENTS (i.e., no information is lost).




Thursday, April 25, 2013

Site Update

1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE.

BTW: If you live in San Francisco, attend the conference on 5/22 and can give me rides to and/or from Pleasanton, or know somebody who can, it will be greatly appreciated. Please email me at the address on the About page.

2.
A link to my latest All Analytics column was posted on the ONLINE page.

Incidentally, since with the discovery by business of analytics as some sort of "new data science", overnight born-again BI experts proliferate like frogs after heavy rain. It suggest a similar poverty of foundation knowledge and rich debunking targets.
Please submit any pearls you come across that could be targets interesting from a data perspective.

3.
The 'Quote of the Week' was posted on the QUOTES page.

4.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Many years ago I wrote something about what I called the "kitchen sink" approach to data management, but this one takes the cake. All the following are included:
  • Key-value pair programming language
  • Entity Attribute Value database model
  • Relational Database Management System, specifically Postgres 9
  • Objects and object metadata
  • SQL client interface (returns objects of various types)
  • Procedural SQL [FP: Huh?]
  • Schema of "Sprout data model" [FP: Wonder what that is]
  • Objects (tables, views) are accessed with their resource identifier
  • High level syntax-independent [FP: Wow!!!!]
and much more (check out, in particular, the bulleted list of features).

5.
A link to an online exchange I participated in was posted to the FP ONLINE page.

6.
Consider the topics in Jonathan Lewis' Oracle Mechanisms Webinar in the context of my argument that, given so many physical/implementation factors that affect performance, why the instinct to attribute poor performance to (logical) denormalization?  And there are many more than those tackled by Jonathan.

7.
While checking hits to this site, I noticed that one of them was due to the  following Google search: "My data model is a better model of reality than your data model. What would your response be?"

Well?



Sunday, April 21, 2013

Un-muddling Modeling, Part II

This is the second part of my response to comments in the online exchange Data (and other) models and aesthetics.
GE:So, from your perspective, a conceptual model has no rigorous definition? It is like a sketch of a picture yet to be completed? Or like an outline to a paper to be written or fleshed out? And once the model is rigorously defined, the ad hoc, informal model must be precisely consistent with the underlying model in all its semantics. Are you suggesting that a conceptual model is a precursor to a defined logical (relational) model? Then after the relational model is defined, the conceptual model needs to be a consistent abstraction of the formal logical model.

Thursday, April 18, 2013

Site Update

1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE.

2.
A link to my latest All Analytics column was posted on the ONLINE page.

3.
The Quote of the Week was posted on the QUOTES page.

There was a comment to my recent Un-muddling Modeling, Part 1 that the conceptual and logical models do not require the relationship concept. However, this does not mean we cannot refer to relationships that are implicit in the models and that is usually in response to arguments like this one.

4.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Nokia Entertainment: Why we went Mongo

An excellent example of how products are selected in the absence of foundation knowledge.

Related.

Ideas to integration data sets from structured and unstructured data

Bay Area coding boot camps promise to launch tech careers   
SAN FRANCISCO -- Looking for a career change, Ken Shimizu decided he wanted to be a software developer, but he didn't want to go back to college to study computer science.

5.
A link to an online exchange I participated in was posted to the FP ONLINE page.

How to I create a logical data model for Geospatial Data?

6.
Big Data Is Just For Big Companies - And Other BS

There are two related core cycles in IT: centralization/decentralization/re-centralization and corporatization/democratization/re-corporatization.

7.
I have often referred to the difficulty of conveying informally the formal without losing either the rigor, or the audience. David Portas, one of the few knowledgeable practitioners, demonstrates some of that difficulty in his comments to the following post by Hugo Kornelis: NULL - The database's black hole

8.
Enjoy.

Big Data Dilbert


Sunday, April 14, 2013

Un-muddling Modeling, Part I

I've been asked to respond to some comments on an online exchange, Data (and other) models and aesthetics and I can do it more justice here. In what follows I am careful to keep
  • informal conceptual language e.g. property, entity, attribute;
  • formal abstract mathematical language e.g. domain, relation;
  • formal language applied to databases e.g. domain, R-table, row, column;
distinct.

Wednesday, April 10, 2013

Site Update

1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE page.

2.
The 'Quote of the Week' was posted on the QUOTES page.

3.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

4.
Links to online exchanges I participated in were posted on the FP ONLINE page.

5.
I agree with most of Cary Millsap's take on NoSQL and Oracle, Sex and Marriage, but note that there is no reference whatsoever to the implications of the data models involved.

6.
After First Great Blunder Refuted consider my Type vs. Domain and Class.

7.
Another job description: Analytics- Data Modeler. Any idea why I post such?



Sunday, April 7, 2013

More on Relational Denial

Note: What follows are my comments on a LinkedIn exchange, So What is a 'Large Database'? Minor edits of the online comments for grammatical, clarity, precision and coherence purposes are within square brackets.
PS: No doubt Oracle/SQL Server/etc are designed and optimized to deal with normalized data. That's where the power lies. They're like Sirens though ... those who don't respect them with proper designs are destined to have performance crashes (bear with me on this metaphor will ya? :)