This exchange followed Steve
VanArsdale’s attempt at rebutting my DMReview article The Dangerous
Illusion: Normalization, Performance and Integrity, Parts 1 and 2. Both
VanArsdale’s response and another one by Bob Lambert were published in the
DMReview newsletter and my replies to them are forthcoming there. I will
provide links to them when they are posted.
From: Steve VanArsdale
To: Editor
Let me begin by declaring myself an admirer. Your
views on data base fallacy and shallow marketing are both
entertaining and rewarding. I agree
with much of your commentary; it is illuminating. And like others, I take your observations
seriously, sufficiently so
to wish to enlighten you in a small way.
Attached is an article directed at your recent comments
in DM
Review.
From: Fabian Pascal
To: SV
In your attempt to defend
denormalization I think you missed my point. I don't have much to say that was
not already addressed in the article.
If you intend to publish your
comments, be advised that they are quite weak. If you do, I may be compelled to
rebut and expose the weaknesses. In the end that would be an unproductive use
of our time.
If you feel strongly about it, I
would prefer you call me for clarifications; that's less tedious than having
to write the same arguments all over again.
From: SV
I am quite surprised at your threats, Mr. Pascal. My
comments echoed your point; physical
implementation is the source of the poor performance in the normalized data
model. However, optimization of the ad
hoc query is obviously not the solution, and your comments at the end of your
second article (quoted in the next paragraph) suggest that you agree. My article proposes an
alternate data model,
one that has been a commercial success for many years. I first discovered it
while working for an insurance company twenty-five years ago, and learned that
it represents an elegant solution to the normalization performance issue by
successfully denormalizing the physical model with a near-infinite capacity for
repeating groups under a single key, while providing logical results in any of
the five normal forms.
You can expect to see my article in print, Mr. Pascal,
but there is
precisely nothing for you to rebut. You
wrote that an implementation "truer to the relational concept would
provide a more complete separation between the logical and physical levels. In
fact, technology that facilitates just such implementations has recently been
developed." I reminded you of just
such a technology, and would be interested in learning to what technology you
refer. I would not want you to write
the same arguments over again, nor do I have the time to engage in a telephone
debate. The purpose here is our mutual
education. If you have specific
questions about the mechanics of the multi-value data model, then I will humbly
attempt to explain to your satisfaction.
Or I would expect you to contact one of the vendors, notably the IBM
"U2" Data Management Unit, or Raining Data Inc. for further
information. Otherwise, feel free to
enlighten me as to the weaknesses of my article.
From: Fabian Pascal
And I am surprised that you found my message
"threatening". I
was just telling you what I really thought and made a friendly offer to explain
the weaknesses in order to save both of us time. I usually say what I think and
I don't take criticisms personally. Those who cannot should not involve
themselves in intellectual arguments.
Whether the points support me or not is irrelevant to
whether they are
strong or weak. There are many instances where people seem to agree with me,
but for the wrong reasons; or they do not agree, but think they do.
I do not understand what you mean--solution to
what?
You have replaced the relational data model with
another? Wow. What did
you substitute for predicate logic and set theory? (Hint: you are misusing the
term data model).
Normalization is a purely logical concept, so what does
"normalization at the physical level" mean? Repeating groups are
logical too and a violation of relational principles. The sheer fact that you need to resort to
repeating groups
indicates problems. [Ed. Note: Date
and Darwen now believe that R-table columns can contain R-table values--another
way of saying nested relations (see Relation-Valued Attributes or
Will the Real First Normal Form Please Stand Up? in their RELATIONAL DATABASE
WRITINGS 1989-1991). They are desirable on rare occasions, but note very
carefully that they are not what VanArsdale's multivalue concept is
about.]
Ah, so you decided that. How convenient, but I'm afraid
I'll be the
judge of that. By all means, publish if you so desire. I am willing to believe
that you will even find quite a few people who will like your
"solution", whatever it is. But that's not necessarily because you
are right, only because they don’t know any better.
My reference is to a true technology that does not
replace relational
– it is a formal model that sits between the logical R level and the
physical level, based on theory. Do you have anything like this?
If so, that is not clear from your claims. You are implying a replacement of
the data model in a way that violates it.
The multivalue approach has been around forever and
Codd declared it to
be in violation of relational principles years ago. It is neither new, nor a
replacement of RM, nor a solution to anything. And if the "solution"
is physical, then it has nothing to do with the data model, or repeating
groups. [Ed. Note: DBMS implementers
can do whatever they darn please at the physical level to optimize performance,
as long as they do not expose it to users in applications – that’s one of the
main points of relational technology.]
I don't have time for this either, but I did invite you
to call me
precisely because I wanted to educate you. It's your decision whether to take
advantage of it or not.
From: SV
Thank you for your time, and your
comments. I regret that it is not
likely that we are going to find a suitable plateau to compare our points of
view; mine are obviously mired in practice while yours are clearly more
theoretical. While we might debate my
use of the term data model, I am certainly advocating a replacement, but for
the two-dimensional physical database architecture used in the fashionable,
so-called relational data bases.
I doubt that multi-value requires replacement
of your predicate logic or set theory, just the obsolete physical constraints
of attempting to force multi-dimensional real data into artificial
columns-and-rows, thereby embedding in the physical layer the architecture of
the logical (and as you implied, giving rise to the
normalization/de-normalization debate in the first place, when it should be purely
a physical implementation issue). Codd
didn't reject the multi-value repeating group model; I contend that he simply
had no perception of how to implement it.
If, as you say, a physical mechanism for the
economical and efficient management of naturally-occurring repeating groups
violates the logical rules governing the relational data base, then I share
your opinion that we have no common ground on which to discuss.
From: Fabian Pascal
Mired is the right word for your POV. And the theory is
there for practical
reasons. It is when people deviate from it that they get mired in all sorts of
"solutions" which are hardly practical. Yours is POV, mine is science.
Sorry, there cannot be any debate between them.
Another example of lack of knowledge and
understanding--you're mired in
confusing logical and physical. How can a "physical architecture"
be more than two-dimensional????? And if it's that which you replace,
then what does it have to do with the relational model, [whose tables are]
n-dimensional?
Mired in more confusion. To repeat: if it's a
physical solution,
what does it have to do with the data model?? This is clearly the kind
of exchange I wanted to avoid, where it's impossible to communicate because
your thinking is so corroded by the industry and practice.
There is no such thing as "naturally occurring
repeating
groups"--it's a matter of representation and you can choose
to represent it with repeating groups, or relationally, without. The former was
tried and failed, the latter has not yet been tried properly [Ed. Note: SQL is not it], but even so, it's
much better. However, none of this has anything to do with physical
implementation or performance, it's purely logical.
Despite your attempts to keep the logical and physical
separate, you are
actually confusing them and it is that which led you to your way of thinking
and all the flaws in it.
From: SV
I have to concede; my thinking
is so corroded by the industry and practice.
By the way, this is how a "PHYSICAL
architecture" can be more than 2-dimensional:
name^phone1]phone2
and this
name^phone1]phone2^areacode1]areacode2
When areacode1 is the key to a dimension
table...
areacode1^zip1
areacode2^zip2]zip3
and zip is the key to another...
zip1^city1^state1
zip2^city2^state2
zip3^city3]city4^state3
...then this is logically n-dimensioned
relational
name areacode1 phone1 city1 state1 zip1
name areacode2 phone2 city2 state2 zip2
name areacode2 phone2 city3 state3 zip3
name areacode2 phone2 city4 state3 zip3
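[Ed. Note: For readers who want to trace how the four rows above follow from the records, here is a minimal sketch in Python. It assumes that `^` separates attributes and `]` separates the values within one attribute, as written in the example, and that phones and area codes are paired by position, which is inferred from the result rows and not stated in the message. The names `parse`, `load` and `flatten` are hypothetical; this is not how any multivalue product is implemented.]

```python
AM, VM = "^", "]"  # attribute and value separators, as written in the example


def parse(record):
    """Split one record into attributes, each attribute a list of values."""
    return [attr.split(VM) for attr in record.split(AM)]


def load(lines):
    """Index records by their first attribute, treated here as the key."""
    table = {}
    for line in lines:
        key, *attrs = parse(line)
        table[key[0]] = attrs
    return table


# The two lookup files from the example.
area_file = load(["areacode1^zip1", "areacode2^zip2]zip3"])
zip_file = load(["zip1^city1^state1",
                 "zip2^city2^state2",
                 "zip3^city3]city4^state3"])


def flatten(person_record):
    """Expand one person record into flat (name, areacode, phone, city, state, zip) rows."""
    (name,), phones, codes = parse(person_record)
    rows = []
    # Assumption: the i-th phone goes with the i-th area code.
    for phone, code in zip(phones, codes):
        (zips,) = area_file[code]
        for z in zips:
            cities, (state,) = zip_file[z]
            for city in cities:
                rows.append((name, code, phone, city, state, z))
    return rows


rows = flatten("name^phone1]phone2^areacode1]areacode2")
for row in rows:
    print(" ".join(row))
```

Under those assumptions the loop yields one row per combination of phone/area code pair, zip and city, in the same order as the listing above.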
I suggest that eventually it is time to take
a dose of industry practice, Mr. Pascal, if you're going to retreat behind the
baffles of the ivory tower.
From: Fabian Pascal
Yes, but you don't realize how much and how that
muddles your thinking,
so you don't see the problems in your arguments.
I am not sure what you mean by physical anymore, but
obviously the
examples you give are logical. Again, you're utterly confused about that.
Tables are n-dimensional and dimensions have nothing to
do with
physical.
Well, I have quite a lot of
practice and an education and clear thinking, which permits me to
figure out where the practice is leading the wrong way. You, OTOH, have only
practice and, therefore, can't see the problems. So my suggestion is that you
need to get what I have and you don't.
From: SV
Just one last question, Mr. Pascal.
As I read back through the messages that
you've sent today, I am compelled to ask: will all your points involve your
perception of my weaknesses and your superiority? If this is your preferred
style then we will not continue to waste your time, as you undoubtedly have
more important things to do.
If in the future you ever choose to focus on
the issues I've raised rather than your superiority to them, you may eventually
understand that the examples that I sent you were physical record layouts, made
multi-dimensional by how they are retrieved and handled by the data base
implementation software code. It is
only the limitation of current disk cylinder/track/sector addressing that gives these
n-dimensional physical layer records their left-to-right appearance.
It is the fact that they are irrevocably
separate and isolated from the logical layer that is the fundamental common
point between your theoretical observations and the practical solution that I
offered you. Look again and you see
that the only logical layout in those gifts of understanding that I gave
you was the last one.
From: Fabian Pascal
So you are dismissing my arguments as "just
theory" and
"no practical experience" and you have the nerve to talk to me of
superiority claims? I am sorry, but that is your perception, not mine.
Sort of validates my claim that those who take substantive criticism personally
do not have sufficient self-assurance.
You are trying to persuade me to "educate"
myself on your
"solution" with arguments that do not hold water. Since this is
rather common--I get this almost continuously--it is not a productive use of my
time to continue to communicate with you when you obviously do not have the
necessary knowledge and understanding to comprehend. So when I say so it is a
statement of fact, not a declaration of superiority. To the extent that there
is any superiority here, it is not mine, but that of the scientific approach
over confusion, misuse of terms and lack of knowledge. Those I have proven to
you, but you refuse to accept it. You are essentially responding emotionally to
criticism.
That may be so, but you have not addressed the main
issue that I have
repeated several times: if this is all physical, why do you need to have
another data model--which is logical--and what do repeating groups and
normalization, which are also logical, have to do with that?
A data model, including RM, has absolutely
nothing to say about
the physical level. A DBMS maker can do whatever he damn pleases at the
physical level to maximize performance, as long as he does not expose it to
users and applications. As long as you do not comprehend this--which is very
explicitly stated in my article--there is no point in any exchange between us.
Posted
09/20/02