MORE ON NORMALIZATION
with Fabian Pascal

 

 

 

From: MA

To: Editor

Date: 30 Mar 2004

 

Would you apply the rigor of normalization to reporting databases? As it is, the world of BI is evolving to a point where it may someday be possible to query very large, fully normalized databases with performance impunity.

 

Your thoughts?

 

 

From: Fabian Pascal

To: MA

 

What do you mean by "reporting databases"?

 

Normalization is not database-specific, it is based on universal principles that apply to any and all databases; and it has nothing to do with performance. Fail to adhere to them and you get into all sorts of problems. For a detailed explanation see my DATABASE DEBUNKINGS paper #6 The Costly Illusion: Normalization, Integrity and Performance; or organize/attend my seminar by the same name.

 

There exists an implementation technology--the TransRelational(TM) Model--that would achieve what you describe, but unfortunately, business and legal problems prevent it from being deployed. We hope at some point it will be. For a short overview of it attend Date's seminar Advanced Concepts in Database Management.

 

 

From: MA

 

What I mean by "reporting databases" is anything used in data warehousing and BI or any other type of reporting where performance may be affected by join overhead.

 

 

From: Fabian Pascal

 

Performance is not “affected by join overhead", but determined by the physical implementation of the DBMS and database, hardware, network, etc. If join performance is poor, it’s implementation factors that should be questioned, not logical operations. It is only how joins are executed that determines performance.

As I explain in my forthcoming paper:

1. The notion that there are "read-only" databases is somewhat of an illusion. They need to be populated, don't they? And if integrity is compromised then, it is compromised for all ensuing querying, right?

2. Denormalization does not improve performance across the board for all applications, even querying-only ones, and there are integrity/complexity issues even with read-only databases.

 

 

From: MA

 

What I have found in my research is that it is the databases themselves as implemented by the vendors that are the issue, not the design.   I'm not sure if this is what you meant in your recent response.

 

If this is so, then de-normalization in data warehousing environments may still be necessary, unless perhaps you are using TeraData which seems to be a more intelligent architecture.

 

 

From: Fabian Pascal

 

You mean DBMSs, not databases. Yup.

The question is are you aware of the price you pay for denormalization? And that you are biasing some applications against others? If you do and are willing to consciously accept it, fine.  Problem is, most practitioners are oblivious to this. Those who invented warehouses certainly were. They did not understand data fundamentals, part. the difference between databases and application views thereof.

People should stop talking about denormalization for performance and focus on product implementation improvements. Until and unless they do, they will continue to get poor implementations.

I strongly recommend reading my forthcoming paper.

 

 

From: MA

 

I understand your point about DBMS (not database :) implementations. 

 

As I said, I've come across this in my research and readings.  I will certainly add your paper to my reading list.   I do not want to become complacent in my thinking and appreciate new angles that challenge my conceptions.  This is why reading articles and books by people, such as yourself, is so important.

 

I must say that there is a mountain of legacy thinking to overcome in your mission.   These habits die hard. 

 

As it is, BI tools are moving toward these goals in that the labor and time that has been used with reconciliation models (staging models) and movement of data to a de-normalized relational structures or star schemas is becoming unnecessary and outdated.   Soon real-time requirements and advanced tools, will move this, perhaps to the operational database or a fully normalized integrated reporting database and perhaps to the utilization of XML queries instead of SQL.

 

It's exciting.  Keep up the good work.

 

 

From: Fabian Pascal

 

Good for you.

Matters are becoming worse, not better. It's no longer habits, but ignorance. There is increasingly more of it. The problem is that the system does not reward, and actually punishes critical independent thinking and foundation knowledge. So they produce more ignorance over time.

 

Permit me to be skeptical. XML is an excellent example of ignorance and regression by decades. You should read my articles on the subject (they're free). There is also my seminar called: XML-The Exchange Tail and the Management Dog.

 

 

From: MA

 

In my heart I think you are right about XML.  I will add this article to my list.  The reason I have given this thought is that it is very popular with developers who have embraced this in a very large way and espouse it as the Holy Grail.  I always like to give ideas consideration and keep an open mind.   In addition, I like to stay employed and want to find a way to provide some value in the work I do.  This involves accepting certain realities both technical and political.

 

Which brings me to another point.   One of the biggest liabilities to optimal data architecture (I'm a data architect) is application development, which doesn't even think data architecture is necessary or many times even knows what it is!

 

The battle here extends to the DBAs that work in these environments who have developed very bad habits in terms of both DBMS implementation and database design.  This includes so much redundancy and confusion, that it cannot be possible for them to be providing their client base with repeatable, predictable and verifiable results. This problem is very deep-seated, very prevalent and very disheartening at times.    Companies succeed in spite of what they do and many times have very little rigor, structure, and measurability in their approach to building systems and poor methodology.

 

I won't go into too much detail, but the client with whom I am working is mired in legacy thinking (not to mention legacy systems).  They have no concept of data architecture, with most applications still utilizing VSAM files and COBOL programs.   Databases they do use have tables with no referential integrity at all ... hundreds of tables with no relationship to each other.   The relationships are implemented via Metadata in join matrices that create problems for clients who join too many levels deep.   They counteract this via "combined" tables" a.k.a. de-normalized tables that are made up of already de-normalized tables.

 

Their operational applications are being modernized to eliminate the VSAM files using the RUP methodology that doesn't think Logical Models are necessary.

 

As for XML Schema, I think they CAN be designed correctly as any schema can, but since this is in the hands of developers and others who do not understand design, they often are not.

 

I'll stop now ... I'm ranting a bit....

 

I have a lot of reading and thinking to do to facilitate refinement in my evolution as a data architect.   You've definitely provoked a new tangent in my thinking about my work.   

 

Thanks for that!

 

 

From: Fabian Pascal

 

You need to distinguish between merit and industry practices. It’s one thing to conform for employment reasons, and quite another to defend industry fads on merit. Keeping an open mind is not enough without knowledge of fundamentals, which is what the term usually means in the industry. It’s those without knowledge that push these fads that often are just rehashed past failures they're ignorant of. If you read my articles about XML you will see what I mean.

I've written about this problem extensively. It's becoming worse and worse due to the increasing ignorance I am referring to. Everything is done by programmers these days who know nothing but some programming language and want to solve everything with it.

You are preaching to the choir. I've been deploring this for years. But then, what do I know, I am not a programmer.

 

My writings are chockfull of such examples. Lost cause.

So what else is new? As to XML, how can a data exchange delimited format (not even syntax) be used for data management (semantics)?


That's OK, been there, done that. I am probably older than you at this and don't rant anymore.

Excellent. That's the whole point.

 

 

Posted 05/21/04