ON “MULTIVALUE” TECHNOLOGY
with Fabian Pascal

 

 

 

This exchange followed Steve VanArsdale’s attempt at rebutting my DMReview article The Dangerous Illusion: Normalization, Performance and Integrity, Parts 1 and 2. Both VanArsdale’s response and another one by Bob Lambert were published in the DMReview newsletter and my replies to them are forthcoming there. I will provide links to them when they are posted.

 

 

From: Steve VanArsdale

To: Editor

 

Let me begin by declaring myself an admirer.  Your views on data base fallacy and shallow marketing are both entertaining and rewarding.  I agree with much of your commentary; it is illuminating.  And like others, I take your observations seriously, sufficiently so to wish to enlighten you in a small way.

 

Attached is an article directed at your recent comments in DM Review. 

 

 

From: Fabian Pascal

To: SV

 

In your attempt to defend denormalization I think you missed my point. I don't have much to say that was not already addressed in the article.

 

If you intend to publish your comments, be advised that they are quite weak. If you do, I may be compelled to rebut and expose the weaknesses. In the end that would be an unproductive use of our time.

 

If you feel strongly about it, I would prefer you call me for clarifications; that's less tedious than having to write the same arguments all over again.

 

 

From: SV

 

I am quite surprised at your threats, Mr. Pascal.  My comments echoed your point; physical implementation is the source of the poor performance in the normalized data model.  However, optimization of the ad hoc query is obviously not the solution, and your comments at the end of your second article (quoted in the next paragraph) suggest that you agree.  My article proposes an alternate data model, one that has been a commercial success for many years. I first discovered it while working for an insurance company twenty-five years ago, and learned that it represents an elegant solution to the normalization performance issue by successfully denormalizing the physical model with a near-infinite capacity for repeating groups under a single key, while providing logical results in any of the five normal forms.

 

You can expect to see my article in print, Mr. Pascal, but there is precisely nothing for you to rebut.  You wrote that an implementation "truer to the relational concept would provide a more complete separation between the logical and physical levels. In fact, technology that facilitates just such implementations has recently been developed."  I reminded you of just such a technology, and would be interested in learning to what technology you refer.  I would not want you to write the same arguments over again, nor do I have the time to engage in a telephone debate.  The purpose here is our mutual education.  If you have specific questions about the mechanics of the multi-value data model, then I will humbly attempt to explain to your satisfaction.  Or I would expect you to contact one of the vendors, notably the IBM "U2" Data Management Unit, or Raining Data Inc. for further information.   Otherwise, feel free to enlighten me as to the weaknesses of my article. 

 

 

From: Fabian Pascal

 

And I am surprised that you found my message "threatening". I was just telling you what I really thought and made a friendly offer to explain the weaknesses in order to save both of us time. I usually say what I think and I don't take criticisms personally. Those who cannot should not involve themselves in intellectual arguments.

 

Whether the points support me or not is irrelevant to whether they are strong or weak. There are many instances where people seem to agree with me, but for the wrong reasons; or they do not agree, but think they do.

 

I do not understand what you mean--solution to what?

 

You have replaced the relational data model with another? Wow. What did you substitute for predicate logic and set theory? (Hint: you are misusing the term data model).

 

Normalization is a purely logical concept, so what does "normalization at the physical level" mean? Repeating groups are logical too and a violation of relational principles.  The sheer fact that you need to resort to repeating groups indicates problems. [Ed. Note: Date and Darwen now believe that R-table columns can contain R-table values--another way of saying nested relations (see Relation-Valued Attributes or Will the Real First Normal Form Please Stand Up? in their RELATIONAL DATABASE WRITINGS 1989-1991). They are desirable on rare occasions, but note very carefully that they are not what VanArsdale's multivalue concept is about.]

 

Ah, so you decided that. How convenient, but I'm afraid I'll be the judge of that. By all means, publish if you so desire. I am willing to believe that you will even find quite a few people who will like your "solution", whatever it is. But that's not necessarily because you are right, only because they don’t know any better.

 

My reference is to a true technology that does not replace relational – it is a formal model that sits between the logical R level and the physical level, based on theory. Do you have anything like this? If so, that is not clear from your claims. You are implying a replacement of the data model in a way that violates it.

 

The multivalue approach has been around forever and Codd declared it to be in violation of relational principles years ago. It is neither new, nor a replacement of RM, nor a solution to anything. And if the "solution" is physical, then it has nothing to do with the data model, or repeating groups. [Ed. Note: DBMS implementers can do whatever they darn please at the physical level to optimize performance, as long as they do not expose it to users in applications – that’s one of the main points of relational technology.]

 

I don't have time for this either, but I did offer that you call me precisely because I wanted to educate you. It's your decision whether to take advantage of it or not.

 

 

From: SV

 

Thank you for your time, and your comments.  I regret that it is not likely that we are going to find a suitable plateau to compare our points of view; mine are obviously mired in practice while yours are clearly more theoretical.  While we might debate my use of the term data model, I am certainly advocating a replacement, but for the two-dimensional physical database architecture used in the fashionable, so-called relational data bases.

 

I doubt that multi-value requires replacement of your predicate logic or set theory, just the obsolete physical constraints of attempting to force multi-dimensional real data into artificial columns-and-rows, thereby embedding in the physical layer the architecture of the logical (and as you implied, giving rise to the normalization/de-normalization debate in the first place, when it should be purely a physical implementation issue).  Codd didn't reject the multi-value repeating group model; I contend that he simply had no perception of how to implement it.

 

If, as you say, a physical mechanism for the economical and efficient management of naturally-occurring repeating groups violates the logical rules governing the relational data base, then I share your opinion that we have no common ground on which to discuss.

 

 

From: Fabian Pascal

 

Mired is the right word for your POV. And the theory is there for practical reasons. It is when people deviate from it that they get mired in all sorts of "solutions" which are hardly practical. Yours is POV, mine is science. Sorry, there cannot be any debate between them.

 

Another example of lack of knowledge and understanding--you're mired in confusing logical and physical. How can a "physical architecture" be more than two-dimensional????? And if it's that which you replace, then what does it have to do with the relational model, [whose tables are] n-dimensional?

 

Mired in more confusion. To repeat: if it's a physical solution, what does it have to do with the data model?? This is clearly the kind of exchange I wanted to avoid, where it's impossible to communicate because your thinking is so corroded by the industry and practice.

 

There is no such thing as "naturally occurring repeating groups"--it's a matter of representation and you can choose to represent it with repeating groups, or relationally, without. The former was tried and failed, the latter has not yet been tried properly [Ed. Note: SQL is not it], but even so, it's much better. However, none of this has anything to do with physical implementation or performance, it's purely logical.
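
The choice of representation described above can be illustrated with a minimal sketch. The name, the phone numbers, and the function `to_relation` are hypothetical, chosen only for illustration; the point is simply that the same facts can be written either as a repeating group or as rows with one value per column.

```python
# Hypothetical data: one person with two phone numbers.

# Representation 1: a repeating group, several phone values packed into one field.
person_with_group = {"name": "Smith", "phones": ["555-0100", "555-0199"]}

# Representation 2: relational, one row per (name, phone) fact, every value atomic.
person_phone = {("Smith", "555-0100"), ("Smith", "555-0199")}


def to_relation(record):
    """Unpack a repeating group into a set of (name, phone) rows."""
    return {(record["name"], phone) for phone in record["phones"]}


# Both representations carry the same facts; only the logical structure differs.
assert to_relation(person_with_group) == person_phone
```

Nothing in this sketch says how either form would be stored on disk; both are descriptions at the logical level.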

 

Despite your attempts to keep the logical and physical separate, you are actually confusing them and it is that which led you to your way of thinking and all the flaws in it.

 

 

From: SV

 

I have to concede; my thinking is so corroded by the industry and practice.

 

By the way, this is how a "PHYSICAL architecture" can be more than 2-dimensional:

 

name^phone1]phone2

 

and this

 

name^phone1]phone2^areacode1]areacode2

 

When areacode1 is the key to a dimension table...

 

areacode1^zip1

areacode2^zip2]zip3

 

and zip is the key to another...

 

zip1^city1^state1

zip2^city2^state2

zip3^city3]city4^state3

 

...then this is logically n-dimensioned relational

 

name areacode1 phone1 city1 state1 zip1

name areacode2 phone2 city2 state2 zip2

name areacode2 phone2 city3 state3 zip3

name areacode2 phone2 city4 state3 zip3
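
The derivation of the four rows above from the delimited records can be sketched as follows. This is only an illustration under stated assumptions: `^` is read as the separator between fields and `]` as the separator between values within a field, as the layouts suggest; phones and area codes are assumed to be paired by position; and the helper names `parse`, `load`, and `flatten` are hypothetical, not part of any multivalue product.

```python
def parse(record):
    """Split a record into fields on '^', and each field into values on ']'."""
    return [field.split("]") for field in record.split("^")]


def load(records):
    """Index parsed records by their first field (the key)."""
    table = {}
    for record in records:
        key, *fields = parse(record)
        table[key[0]] = fields
    return table


people = load(["name^phone1]phone2^areacode1]areacode2"])
areacodes = load(["areacode1^zip1", "areacode2^zip2]zip3"])
zips = load(["zip1^city1^state1", "zip2^city2^state2", "zip3^city3]city4^state3"])


def flatten(people, areacodes, zips):
    """Expand the multivalued records into one flat row per combination."""
    rows = []
    for name, (phones, codes) in people.items():
        # Assumption: the i-th phone belongs with the i-th area code.
        for phone, code in zip(phones, codes):
            (zip_codes,) = areacodes[code]
            for zip_code in zip_codes:
                cities, states = zips[zip_code]
                for city in cities:
                    rows.append((name, code, phone, city, states[0], zip_code))
    return rows


rows = flatten(people, areacodes, zips)
for row in rows:
    print(" ".join(row))
```

Run against the records shown, the sketch yields the same four rows, in the same column order, as the listing above.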

 

I suggest that eventually it is time to take a dose of industry practice, Mr. Pascal, if you're going to retreat behind the baffles of the ivory tower.

 

 

From: Fabian Pascal

 

Yes, but you don't realize how much and how that muddles your thinking, so you don't see the problems in your arguments.

 

I am not sure what you mean by physical anymore, but obviously the examples you give are logical. Again, you're utterly confused about that.

 

Tables are n-dimensional and dimensions have nothing to do with physical.

 

Well, I have quite a lot of practice and an education and clear thinking, which permits me to figure out where the practice is leading the wrong way. You, OTOH, have only practice and, therefore, can't see the problems. So my suggestion is that you need to get what I have and you don't.

 

 

From: SV

 

Just one last question, Mr. Pascal.

 

As I read back through the messages that you've sent today, I am compelled to ask: will all your points involve your perception of my weaknesses and your superiority? If this is your preferred style then we will not continue to waste your time, as you undoubtedly have more important things to do.

 

If in the future you ever choose to focus on the issues I've raised rather than your superiority to them, you may eventually understand that the examples that I sent you were physical record layouts, made multi-dimensional by how they are retrieved and handled by the data base implementation software code.  It is only the limitation of current disk cylinders, tracks, and sectors which gives these n-dimensional physical layer records their left-to-right appearance.

 

It is the fact that they are irrevocably separate and isolated from the logical layer that is the fundamental common point between your theoretical observations and the practical solution that I offered you.  Look again and you see that the only logical layout in those gifts of understanding that I gave you was the last one.

 

 

From: Fabian Pascal

 

So you are dismissing my arguments as "just theory" and "no practical experience" and you have the nerve to talk to me of superiority claims? I am sorry, but that is your perception, not mine. Sort of validates my claim that those who take substantive criticism personally do not have sufficient self-assurance.

 

You are trying to persuade me to "educate" myself on your "solution" with arguments that do not hold water. Since this is rather common--I get this almost continuously--it is not a productive use of my time to continue to communicate with you when you obviously do not have the necessary knowledge and understanding to comprehend. So when I say so it is a statement of fact, not a declaration of superiority. To the extent that there is any superiority here, it is not mine, but that of the scientific approach over confusion, misuse of terms and lack of knowledge. Those I have proven to you, but you refuse to accept it. You are essentially responding emotionally to criticism.

 

That may be so, but you have not addressed the main issue that I have repeated several times: if this is all physical, why do you need to have another data model--which is logical--and what do repeating groups and normalization, which are also logical, have to do with that?

 

A data model, including RM, has absolutely nothing to say about the physical level. A DBMS maker can do whatever he damn pleases at the physical level to maximize performance, as long as he does not expose it to users and applications. As long as you do not comprehend this--which is very explicitly stated in my article--there is no point in any exchange between us.
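
The separation stated above can be sketched in a few lines. Everything here is hypothetical and deliberately simplified: `RowStore` and `GroupedStore` stand in for two different physical layouts, and `phones_of` stands in for a query written purely against rows. It is an illustration of the principle, not a description of how any actual DBMS is built.

```python
class RowStore:
    """Hypothetical physical layout 1: one stored tuple per row."""

    def __init__(self):
        self._rows = []

    def insert(self, name, phone):
        self._rows.append((name, phone))

    def scan(self):
        # The only thing exposed to callers is a set of rows.
        return set(self._rows)


class GroupedStore:
    """Hypothetical physical layout 2: phones clustered under each name."""

    def __init__(self):
        self._by_name = {}

    def insert(self, name, phone):
        self._by_name.setdefault(name, []).append(phone)

    def scan(self):
        # Clustering is an internal detail; callers still see plain rows.
        return {(name, phone)
                for name, phones in self._by_name.items()
                for phone in phones}


def phones_of(store, name):
    """A logical query, written once against rows, unaware of the layout."""
    return sorted(phone for n, phone in store.scan() if n == name)


for store in (RowStore(), GroupedStore()):
    store.insert("Smith", "555-0100")
    store.insert("Smith", "555-0199")
    store.insert("Jones", "555-0123")
    print(type(store).__name__, phones_of(store, "Smith"))
```

The query gives the same answer against either store, because the clustering in `GroupedStore` is never visible through `scan`.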

 

 

Posted 09/20/02

 

 

 
