ON NORMALIZATION AND THE SCIENTIFIC METHOD
with Fabian Pascal

 

 

 

From: PA

To: Editor

 

I find [your article in DM Review] to contradict your stated devotion to scientific methods and the value of theory. You present a single example of denormalization, then proceed to draw a conclusion about denormalization in general. In addition, the example chosen is not typical of real-world denormalizations.

 

In order to be halfway consistent with your own ideals, you would need to present at a minimum an exhaustive list of the types of denormalizations used in practice, along with an objective list of the pros and cons of each. I would expect that if this were undertaken, you would end up with a more balanced view, and some exceptions to your black-and-white conclusions.

 

Of course, to prove your point scientifically would require far more effort than this, if indeed it were at all possible to prove or disprove your statements. This brings me to my key point: if your contention is not falsifiable, it does not belong in the realm of true science at all; instead, it belongs in the domain of mere opinion and belief.

 

Please tell us how you have proved your propositions, or else refrain from claiming that you are working from a sound scientific foundation and everyone else is somehow misguided.

 

Relational algebra has nothing to say about real-world performance.

 

 

From: Fabian Pascal

To: PA

 

You are confusing formal theory with empirical theory. In the case of normalization, the theory is formal, not empirical. To realize your error, please provide any one example of denormalization to which the arguments in my article do not apply logically.

 

 

From: PA

 

No problem.

 

I want users to be able to quickly retrieve total monthly sales for product A. They do this hundreds of times a month.  I create a table keyed on Year, Month and Product, to hold the total sales.  I then update the total as orders are processed.  In a completely normalized database, the query to get the total would have to read thousands of rows of order lines, and would be orders of magnitude slower.
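The design described above can be sketched as follows. This is only a minimal illustration using Python's standard `sqlite3` module, not the correspondent's actual schema: the table names `order_lines` and `monthly_sales`, the column names, and the helper functions are all assumptions made for the example. It shows the stored total being maintained as each order line is processed, next to the query that derives the same total from the order lines.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical base and summary tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE order_lines (
            line_id INTEGER PRIMARY KEY,
            year    INTEGER NOT NULL,
            month   INTEGER NOT NULL,
            product TEXT    NOT NULL,
            amount  INTEGER NOT NULL          -- amount in cents (assumption)
        );
        CREATE TABLE monthly_sales (          -- stored, derived totals
            year    INTEGER NOT NULL,
            month   INTEGER NOT NULL,
            product TEXT    NOT NULL,
            total   INTEGER NOT NULL,
            PRIMARY KEY (year, month, product)
        );
    """)
    return conn


def record_order_line(conn, year, month, product, amount):
    """Insert an order line and adjust the stored total in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO order_lines (year, month, product, amount) "
            "VALUES (?, ?, ?, ?)",
            (year, month, product, amount),
        )
        cur = conn.execute(
            "UPDATE monthly_sales SET total = total + ? "
            "WHERE year = ? AND month = ? AND product = ?",
            (amount, year, month, product),
        )
        if cur.rowcount == 0:  # first sale of this product in this month
            conn.execute(
                "INSERT INTO monthly_sales (year, month, product, total) "
                "VALUES (?, ?, ?, ?)",
                (year, month, product, amount),
            )


def total_from_summary(conn, year, month, product):
    """Read the stored total: a single-row lookup on the key."""
    row = conn.execute(
        "SELECT total FROM monthly_sales "
        "WHERE year = ? AND month = ? AND product = ?",
        (year, month, product),
    ).fetchone()
    return row[0] if row else 0


def total_from_lines(conn, year, month, product):
    """Derive the total from the order lines at query time."""
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM order_lines "
        "WHERE year = ? AND month = ? AND product = ?",
        (year, month, product),
    ).fetchone()[0]
```

The two totals agree only for as long as every change to `order_lines` goes through `record_order_line`. Nothing in this schema itself enforces that.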

 

Also, I would appreciate it if you could explain how my arguments do not apply to formal theory.

 

 

From: Fabian Pascal

 

First of all, as I guessed, like so many practitioners, you do not understand what normalization is. Your example is one of storing derived data--a form of redundancy different from redundancy due to denormalization. If you read the chapter on redundancy in my book, you will see that I have separate sections for denormalization and for derived data. What is more, your example seems to refer to historical data, which are not updated, and hence redundancy is not an issue.

 

My arguments apply logically to any redundancy of data that is being updated, including your example. The only reason you may get better performance is that you trade integrity for it and ignore the risk of inconsistency. Now, if practitioners knew and understood this, and consciously decided to give that up for performance, I would still worry, but if that's their choice, fine. The problem is that the vast majority are completely unaware of the integrity risk and ignore it when they denormalize, thinking that they get performance for free.
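The integrity risk can be made concrete with a small sketch. It is an illustration only, under assumed names: the tables `order_lines` and `monthly_sales`, their columns, and the sample figures are hypothetical and use Python's standard `sqlite3` module. A correction is applied to the base data without touching the redundantly stored total, and a check query then reports the disagreement.

```python
import sqlite3


def find_inconsistencies(conn):
    """Return (year, month, product, stored_total, derived_total) for every
    summary row whose stored total differs from the sum of its order lines.
    Note: groups missing from the summary entirely are not reported here."""
    return conn.execute("""
        SELECT s.year, s.month, s.product, s.total, COALESCE(d.derived, 0)
        FROM monthly_sales AS s
        LEFT JOIN (
            SELECT year, month, product, SUM(amount) AS derived
            FROM order_lines
            GROUP BY year, month, product
        ) AS d
          ON d.year = s.year AND d.month = s.month AND d.product = s.product
        WHERE s.total <> COALESCE(d.derived, 0)
    """).fetchall()


def demo():
    """Show the stored total drifting away from the data it is derived from."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE order_lines (
            line_id INTEGER PRIMARY KEY,
            year INTEGER, month INTEGER, product TEXT, amount INTEGER
        );
        CREATE TABLE monthly_sales (
            year INTEGER, month INTEGER, product TEXT, total INTEGER,
            PRIMARY KEY (year, month, product)
        );
        INSERT INTO order_lines VALUES (1, 2002, 8, 'A', 1000);
        INSERT INTO order_lines VALUES (2, 2002, 8, 'A', 2500);
        INSERT INTO monthly_sales VALUES (2002, 8, 'A', 3500);
    """)
    before = find_inconsistencies(conn)  # stored and derived totals agree

    # A correction made to the base table only; the stored total is forgotten.
    conn.execute("UPDATE order_lines SET amount = 900 WHERE line_id = 1")

    after = find_inconsistencies(conn)   # stored 3500 vs. derived 3400
    return before, after
```

Keeping the two in agreement requires extra work on every update path (for example, declared constraints or triggers, where the DBMS supports them), and that work is part of the real cost of the stored total.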

 

Regarding your statement that "Relational algebra has nothing to say about real-world performance", my article says exactly that: normalization and denormalization are logical and cannot possibly affect performance, which is physical by definition. What this means is that if you get bad performance, it is not due to your logical design, but to the physical implementation of your database and DBMS, as well as other implementation factors. Your problem is that, like so many, you confuse the logical and physical levels, and this is so entrenched in your mind that even an article which makes every effort to disabuse you of such confusion cannot get through.

 

The distinction between empirical and formal theory goes well beyond databases and computers--it requires an understanding of science, and the difference between the two kinds of theory is not something that can be explained and learned via email. If this is of interest to you, I suggest you educate yourself on the subject, particularly if you want to engage in public discussion of it.

 

 

Posted 08/23/02

 

 

 
