ON VAGUENESS, POOR REASONING, AND POINTS MISSED
with Fabian Pascal

 

 

 

“If liberty means anything at all, it means the right to tell people what they do not want to hear."

--George Orwell

 

From: AC

To: Editor

 

I recently found the website dbdebunk.com, and I'm riveted ... having worked as a DBMS expert for years, I've found some of your insights quite, well, rather insightful.

 

On the other hand, I have to take issue with the article The Dangerous Illusion: Denormalization, Performance and Integrity, Part 1. You seem to have fallen victim to a bugbear of your own devising: the alleged Physical vs. Logical confusion! But wait. Don't jump to conclusions. I'll explain.

 

Yes, there is a body of developing knowledge called "Relational Theory". But equally, there is a body of developing knowledge called "Designing Databases to Meet User Requirements". Sometimes these two fundamentally different fields converge, at other times they do not. How and why is this sorry state of affairs possible? Well, here's my take on it. There is a gaping hole in Relational Theory. Not *intrinsically*, but rather in its application outside of the field of, well, Database Theory. The gaping hole is the notion of efficiency, or more specifically, the idea of 'Cost'... and the notion of Cost as distinct from the details of any idea of physical implementation.

 

Relational Theory purports to be applicable to the design of real-world databases... but it is not. By failing to adequately address Cost, the Theory becomes inapplicable to real world environments where Cost is the single overriding factor! This notion of Cost is not a confusion of Physical vs. Logical: rather, this is a function, directly, of the logical model. Costs need not be rooted in any physical implementation. We might simply allow that a the cost of a particular join is some function of the number of rows of the relevant tables, perhaps, or some function of the number of tables involved, or an arbitrary constant, or anything. The point is, we have a metric, and we can build a usage model for our database. By modeling usage of the database, the possible growth of tables, the various data volumes, etc, we can construct a purely logical construct of a database in use. The set of metrics and functions being used to define Costs *could* be matched to the observed physical properties of any particular implementation of a database, but it might not. In fact, the opposite might be true: the availability of a cost-based logical model could be used to define the development goals of physical implementations!

 

If Relational Theory is intended to be a complete tool for the design of databases (and not just one of several tools), it *must* address the notion of Cost. You don't address this issue... instead, you specifically *avoid* it, instead suggesting that any mention of Cost by denormalization apologists is actually the work of a deficient intellect incorrectly separating logical from physical!

 

Suffice to say that any discussion of the relative merits of denormalizing vs. normalizing has to be quantitative with regards to Cost, in addition to qualitative. Why? Because if Cost is not a factor, then we might as well not normalize at all! We normalize a data set because there are Costs associated with *not* normalizing. But these costs remain vague, implied things. For example, on one hand, you defend the implementation of fully normalized databases, suggesting that it is possible that faulty programmatic logic would allow data integrity issues into a denormalized database--a very low level, physical-level consideration of code quality and the related costs of data integrity issues--while on the other hand, you disallow any possible benefit from denormalization, based on the idea that any consideration of physical implementation with regards to normalization is due to faulty reasoning.

 

Thoughts?

 

 

From: Fabian Pascal

To: AC

 

When you have to take so long to explain something so simple, it is obviously wrong. I am sorry, but that is the case. I think you either worked too long in the industry, or do not have a formal education, or both. And there is little I can do via email to correct that.

 

Let me just say that you do not understand what theory is, and your perspective on the article is off.

 

 

From: AC

 

I'm surprised that you seem so hostile. Certainly your use of argumentum ad hominem might appear a little out of place in a technical discussion?

 

I'll phrase the argument a couple of different ways, short and sweet:

 

  1. Since the only rational reason for normalization is cost management--costs being measured in terms of the risk of data integrity issues or in terms of some sort of theoretical computational operation required to maintain integrity in a denormalized database--a complete theory for the design of a database must address relative cost.

 

  1. A database design theory that fails to address the relative merits of two different database designs--indicating that both are "correct", when in fact, given a particular cost function, one or other is better--is clearly incomplete.

 

"...take so long to explain something so simple, it is obviously wrong...": You're such a comedian... at least my argument doesn't run to two *articles*, my friend. ;)

 

Anyway, keep up the good work!

 

 

From: Fabian Pascal

To: AC

 

1st, I have no idea where you find hostility. My reply that you are wrong was factual, not hostile.

 

2nd, I have heard your kind of argument for as long as I've been in the field and all my writings debunk it, so my patience for it has faded over the years. As Chris Date says, life is short and there's a limit to how many times we can respond to the same error that we explained so many times before, and with utmost clarity.

 

3rd, this is especially true for long messages, which require even longer replies. This is a free endeavor and our time is not unlimited.

 

4th, your argument is not technical, but one of what you call "cost-effectiveness" (see below).

 

[Your new phrasing] may be short, but it is certainly not sweet. Why is a denormalized database more costly than a normalized one? Do you include the cost of corruption in your calculations? If so, how do you calculate it? (Frankly, I don't believe you were even aware of the integrity cost before you read my article; 99% of practitioners are not.)

 

The reason for which people denormalize is not due to cost, but to ignorance of the integrity implications. Neither do they know/understand that the products are flawed and force them to denormalize and, thus, trade-off performance for integrity, which would be unnecessary with a well implemented, truly relational DBMS

 

This is pure BS--it sounds reasonable, but is too vague and lacks any justification to mean anything. Correct means logically correct, not economically effective. Db practitioners work in a field founded on logic, of which they know appallingly little, and can't even get their concepts and terms right, let alone reason. Even the concept of theory you don't understand, they equate it, like you, with lack of practicality. Can you define theory?

 

I reiterate: how do you factor in the cost of corruption, if you do at all (which I don't believe)? My bet is that you were unaware of the integrity implications until you read my article. If so, than you are inconsistent with your own framework of "cost function". If anything is incomplete, it's that.

 

Actually, it is you who try to be a comedian, but fail: do you expect to be taken seriously with such comments?

 

Here's my suggestion: publish your message as an article and make it available to public review as I did, then we'll see if my article is too long, or yours make sense. But before you do, I suggest you check out my rebuttals to two other reactions to my article, and see if you want to expose yourself to that.

 

 

From: AC

 

Thanks for your reply. In understand that you do not suffer fools gladly. I actually find healthy debate very stimulating. ;)

 

(1)  Since normalization, one tool suggested by Relational Theory, does not address every possible flaw in a proposed schema--for example, intra-table dependencies--it can hardly be suggested that a schema that achieves a certain normal form is "correct", nor can it be claimed that normalization is "complete" in any sense.

(2)  If you truly do not care about economic effectiveness, why do you bother talking to people for whom this is the overriding concern, that is, professionals?

 

I'll spend some more time working through more of the material on your excellent site.

 

 

From: Fabian Pascal

 

Well, there is little one can do about fools, so there’s no point investing time and effort. What I don't suffer is laziness: there are dozens of my articles, a web site and 3 books which already addressed repeatedly the issues people keep raising for me to answer over and over again. And if they think that the issues have not been addressed, well, then they don't understand and there is little I can do about that too.

 

Key in your statement are debate and healthy: there must be a debate and it must be healthy. In most cases neither is true, and that includes your arguments.

 

1st, normalization is not a tool, but a set of design principles/guidelines.

 

2nd, let me get this straight: are you saying that because normalization does not address every possible design flaw, then it should not address any of those that it can address? And if relational design addresses intra-table dependencies separately from normalization, then normalization should not be used to address those dependencies that it can address?

 

You've got the common problem of poor reasoning. How people who cannot think logically can do good work in a field founded on logic escapes me. It's like a quantum theorist who does not know mathematics.

 

3rd, you avoid answering my questions by asking another. To repeat:

 

·   Why is denormalization less costly than normalization?

·   Is corrupt data a cost of denormalization and does it enter in your cost-effectiveness calculations at all? If so, how?

 

Answering this question would make your question (2) moot - more evidence of poor reasoning.

 

It says something about the quality of discourse (and intellectual abilities) in the IT world that all reactions to my article completely ignore the core of its message: the integrity cost of denormalization. And you call this "healthy debate that stimulates"?

 

Sounds like you need much more than [reading my site].

 

 

From: AC

 

I'll be frank: I think you're dragging this debate all over the place. My original question was simple... where does cost come into the database design picture. In your article you position normalization as the be-all-and-end-all of database design, when in fact, I posit, *efficiency* is the be-all-and-end-all. Normalization is simply a tool for achieving efficiency in a database that is prone to random access update. Yes, it is very satisfying in it's application; yes, I'm sure a lot of your correspondents are morons, but no, and no again, normalization does not address cost, and so cannot be considered a complete methodology for database design.

 

I will not be drawn into defending denormalization, I don't advocate it, and I am not its apologist. I'm merely suggesting that there is more work in this area that still needs to be done to address cost, in all its forms, at the logical level.

 

Let me briefly change the subject. I greatly admire your work and your apparent goals, but you, after all, for all your comedic vitriol, are just as human as those poor sods learning database design from "J2EE for Dummies", and treating these people as less than you is not helping your cause. You see, normalization does not have any problems addressing "intra-table dependencies". In fact, there is no such term. I invented it. No, nothing to do with an improvement suggested by Ling, Tompa, and Kameda. Gotcha. ;)

 

 

From: Fabian Pascal

 

It is now obvious that your problem is more serious than just [lack of] technical knowledge and understanding--you don't seem to have understood anything in my replies and ignored all my questions and challenges. You keep repeating the same vague, meaningless statements and I get the distinct impression that you simply cannot address the problems in your arguments that I raised. No wonder you want to change the subject from technical issues (which you claimed this is all about) to style issues, which are irrelevant to the arguments at hand.

 

As to your gotcha, you have actually put your foot in your mouth. It so happens that there is such a thing as intra-table dependencies--this is in fact what normalization is all about!! The notion that you "invented" the term makes it very clear that you don't understand the subject you purport to argue about. What is more, neither did you catch my own "gotcha" (although I did not intend it to be that): normalization does deal with those dependencies. So you see, you're out of your league.

 

I find that this exchange is--as I suspected from the beginning--neither a debate, nor stimulating, but a waste of time. Should have listened to my gut and never allowed myself yet again to be baited into this kind of nonsense--my instincts are almost always right.

 

I will post the exchange on the site and let the readers decide if there is anything behind your "cost" and "efficiency" mantra, who got whom, and what is being dragged where.

 

 

Ed. Comments:  That a “DBMS expert” with “years of experience” knows so little about the foundation of his field and exhibits such inability to see the logical implications of his arguments is excellent evidence for my take on the sad state of database management. For consider:

 

Ø       Relational theory is not just “a body of developing knowledge” or “purports to be applicable only to the design of real-world databases”. It is the scientific foundation of database management - logic as it applies to databases. Does AC mean that the correctness of databases and the answers they produce - of which logic is the guarantor--is not part of user requirements? How can logic be “fundamentally different” than those requirements? And if it is, what does AC propose as an equivalent/better substitute? AC got it backwards here: the sad story is precisely that database practice diverges from, instead of converging with logic, as it should. That does not reduce costs, it increases them (By correctness I mean, of course, the only kind a DBMS is capable of ensuring, namely consistency, see PRACTICAL ISSUES IN DATABASE MANAGEMENT).

 

Ø       Says AC “Costs need not be rooted in any physical implementation. We might simply allow that a the cost of a particular join is some function of the number of rows of the relevant tables, perhaps, or some function of the number of tables involved, or an arbitrary constant, or anything”. But unless the join is performed - that is, executed physically - what on earth is its so-called “logical cost” and how does the number of tables or rows enter into it? And how can any theory be applicable to anything arbitrary? (AC seems to subscribe to the notion, rather common in the industry, that theory is “just theory and, therefore, not practical”, which is, of course, bunk. Would he advocate building bridges that violate physical theories, to produce “less costly” results? Or medical treatments that violate biochemical theories, for “higher efficiency”? What happens when this is attempted is pretty well known.)

 

Ø       Any “metric” can be devised, the question is what does it measure that is meaningful and valuable? I have no idea what AC’s metric is (at least based on his “explanation” of it, I don’t), but whatever it is, it has little to do with normalization. Once a business (or conceptual) model already exists - which means that the business rules have been fixed - more than one logical model may be possible (a choice of which may or may not have some costs attached to it), but whichever one is chosen, it must be normalized, if data integrity is to be maximized at minimal cost. The point of my article that AC misses is that there is no cost-function that justifies undernormalization of a given logical model.

 

Ø       AC provides no concrete example of a situation where a denormalized design would be less costly, in his sense, than its normalized version. He misses most of the points of my article, among them:

 

·   Whether AC’s vague “logical costs” exist or not, normalization in the “real-world” is performed because of flaws - imaginary or real - in current, mainly SQL products (which are not really relational), that force users to trade integrity for performance, without them being aware of it;

·   The number of logical tables--and, thus, of rows too--per se says nothing about efficiency, which is a property of processing; depending on storage and access, more rows can be processed more efficiently than less rows;

·   In a rebuttal of another critique I give the example of a DBMS physically storing table joins, rather than the “base tables” stored by SQL products, in which case no join needs to be processed. Better still, what about the new implementation technology I have alluded to in several writings, which does away with physical rows altogether?

·   And I do not even mention efficiency trade-offs between queries and updates, or even between queries (again, see PRACTICAL ISSUES IN DATABASE MANAGEMENT).

 

Ø       What a vast majority of practitioners, including AC, fail to understand is that relational technology cuts several major costs that tend to overwhelm all others. Among them, keeping the logical level the users see insulated from the physical implementation level allows DBMS vendors and DBAs complete freedom to do whatever they damn please to maximize (physical) efficiency, without affecting what users and applications see at the logical level, or its integrity! Unlike AC’s vague cost, this is “big time”, clear, measurable cost cutting. AC, in fact, has it backwards: he criticizes relational theory for failure to address cost, when in reality, this is exactly what it achieves by being silent on efficiency!!! Is it the theory’s fault that vendors flout it and, thus, products fail to deliver its benefits, or impose unnecessary costs? What is more, they get away with it precisely because practitioners think like AC does.

 

Ø       Relational theory is not a “tool” for database design, let alone a complete one (see above). In fact, it is more accurate to say that normalization is an adjunct to the relational model, rather than part of it. And my book and articles cover many other design principles (e.g. normalization is only one section of the chapter on redundancy). However, normalization principles are general to all logical designs, regardless of any other principles that are pertinent to them.

 

AC uses terms such as “quantitative vs. qualitative cost” for effect, which obscures the fuzziness of his argument. This is that it lacks any meaning. How about one concrete, proper application of his “cost function” to a real-world design from his vast practical experience, that favored an undernormalized model over the normalized version on purely logical grounds?

 

I will conclude by noting that this is hardly the first time that my response is deemed “hostile”. But saying that an argument is bunk, or vague, or uninformed, in conjunction with evidence to that effect, is a statements of fact. Thinking practitioners should focus on whether it is true or false, rather than allow their emotions to assign hostility to it. If they cannot afford the latter, they should either make sure that they argue from informed, well-reasoned positions, or refrain from public arguments altogether.

 

This is a thankless job, but somebody’s gotta do it.

 

 

Posted 11/23/02