From: MA
To: Editor
Date: 30 Mar 2004
Would you apply the rigor of normalization to reporting
databases? As it is, the world of BI is evolving to a point where it may
someday be possible to query very large, fully normalized databases with
performance impunity.
Your thoughts?
From: Fabian Pascal
To: MA
What do you mean by "reporting databases"?
Normalization is not database-specific, it is based on
universal principles that apply to any and all databases; and it has
nothing to do with performance. Fail to adhere to them and you get into all
sorts of problems. For a detailed explanation see my DATABASE DEBUNKINGS paper #6 The Costly Illusion:
Normalization, Integrity and Performance; or organize/attend my seminar by the same
name.
There exists an implementation technology--the TransRelational(TM)
Model--that would achieve what you describe, but unfortunately, business
and legal problems prevent it from being deployed. We hope at some point it
will be. For a short overview of it attend Date's seminar Advanced Concepts in
Database Management.
From: MA
What I mean by "reporting databases" is anything
used in data warehousing and BI or any other type of reporting where
performance may be affected by join overhead.
From: Fabian Pascal
Performance is not “affected by join overhead",
but determined by the physical implementation of the DBMS and database,
hardware, network, etc. If join performance is poor, it’s implementation
factors that should be questioned, not logical operations. It is only
how joins are executed that determines performance.
As I explain in my forthcoming paper:
1. The notion that there are "read-only" databases is somewhat of an
illusion. They need to be populated, don't they? And if integrity is
compromised then, it is compromised for all ensuing querying, right?
2. Denormalization does not improve performance across the board for all
applications, even querying-only ones, and there are integrity/complexity
issues even with read-only databases.
From: MA
What I have found in my research is that it is the databases
themselves as implemented by the vendors that are the issue, not the
design. I'm not sure if this is what you meant in your recent
response.
If this is so, then de-normalization in data warehousing
environments may still be necessary, unless perhaps you are using TeraData
which seems to be a more intelligent architecture.
From: Fabian Pascal
You mean DBMSs, not databases. Yup.
The question is are you aware of the price you pay for
denormalization? And that you are biasing some applications against others? If
you do and are willing to consciously accept it, fine. Problem is, most
practitioners are oblivious to this. Those who invented warehouses certainly
were. They did not understand data fundamentals, part. the difference between
databases and application views thereof.
People should stop talking about denormalization for performance and focus on
product implementation improvements. Until and unless they do, they will continue
to get poor implementations.
I strongly recommend reading my forthcoming paper.
From: MA
I understand your point about DBMS (not database
:) implementations.
As I said, I've come across this in my research and
readings. I will certainly add your paper to my reading list.
I do not want to become complacent in my thinking and appreciate new angles
that challenge my conceptions. This is why reading articles and books by
people, such as yourself, is so important.
I must say that there is a mountain of legacy thinking to
overcome in your mission. These habits die hard.
As it is, BI tools are moving toward these goals in that the
labor and time that has been used with reconciliation models (staging
models) and movement of data to a de-normalized relational structures or
star schemas is becoming unnecessary and outdated. Soon
real-time requirements and advanced tools, will move this, perhaps to the
operational database or a fully normalized integrated reporting database and perhaps to
the utilization of XML queries instead of SQL.
It's exciting. Keep up the good work.
From: Fabian Pascal
Good for you.
Matters are becoming worse, not better. It's no longer habits, but ignorance.
There is increasingly more of it. The problem is that the system does not
reward, and actually punishes critical independent thinking and foundation
knowledge. So they produce more ignorance over time.
Permit me to be skeptical. XML is an excellent example of
ignorance and regression by decades. You should read my articles on the subject
(they're free). There is also my seminar called: XML-The Exchange Tail and
the Management Dog.
From: MA
In my heart I think you are right about XML. I will add
this article to my list. The reason I have given this thought is that it
is very popular with developers who have embraced this in a very large way and
espouse it as the Holy Grail. I always like to give ideas consideration
and keep an open mind. In addition, I like to stay employed and
want to find a way to provide some value in the work I do. This involves
accepting certain realities both technical and political.
Which brings me to another point. One of the
biggest liabilities to optimal data architecture (I'm a data architect) is
application development, which doesn't even think data architecture is
necessary or many times even knows what it is!
The battle here extends to the DBAs that work in these
environments who have developed very bad habits in terms of both DBMS
implementation and database design. This includes so much
redundancy and confusion, that it cannot be possible for them to
be providing their client base with repeatable, predictable and verifiable
results. This problem is very deep-seated, very prevalent and very
disheartening at times. Companies succeed in spite of what
they do and many times have very little rigor, structure, and
measurability in their approach to building systems and poor
methodology.
I won't go into too much detail, but the client with whom I
am working is mired in legacy thinking (not to mention legacy systems).
They have no concept of data architecture, with most applications still
utilizing VSAM files and COBOL programs. Databases they do use have
tables with no referential integrity at all ... hundreds of tables with no
relationship to each other. The relationships are implemented via
Metadata in join matrices that create problems for clients who join too many
levels deep. They counteract this via "combined"
tables" a.k.a. de-normalized tables that are made up of already
de-normalized tables.
Their operational applications are being modernized to
eliminate the VSAM files using the RUP methodology that doesn't think Logical
Models are necessary.
As for XML Schema, I think they CAN be designed correctly as
any schema can, but since this is in the hands of developers and others who do
not understand design, they often are not.
I'll stop now ... I'm ranting a bit....
I have a lot of reading and thinking to do to facilitate
refinement in my evolution as a data architect. You've definitely
provoked a new tangent in my thinking about my work.
Thanks for that!
From: Fabian Pascal
You need to distinguish between merit and industry practices.
It’s one thing to conform for employment reasons, and quite another to defend
industry fads on merit. Keeping an open mind is not enough without knowledge
of fundamentals, which is what the term usually means in the industry. It’s
those without knowledge that push these fads that often are just rehashed past
failures they're ignorant of. If you read my articles about XML you will see
what I mean.
I've written about this problem extensively. It's becoming
worse and worse due to the increasing ignorance I am referring to. Everything
is done by programmers these days who know nothing but some programming
language and want to solve everything with it.
You are preaching to the choir. I've been deploring this for
years. But then, what do I know, I am not a programmer.
My writings are chockfull of such examples. Lost cause.
So what else is new? As to XML, how can a data exchange
delimited format (not even syntax) be used for data management (semantics)?
That's OK, been there, done that. I am probably older than you at this and
don't rant anymore.
Excellent. That's the whole point.
Posted
05/21/04