ON “WHAT’S WRONG WITH THIS QUOTE?”
with Fabian Pascal

 

 

 

“A traditional normalized structure cannot and will not outperform a denormalized star schema, from a DSS perspective. These schemas are designed for speed, not your typical record style transaction. I have seen and been involved in too many large RDBM implementations that can absolutely not support the stress that a denormalized structure can handle. Supporting users increasing needs of accessing and analyzing information from a normalized schema perspective does not make sense. Normalized models cannot support ad-hoc users needs to extract large volumes of records, aggregating the facts and create hierarchies on the fly. These type of requests puts way to much stress on a model that was originally developed to support the handling of single record transactions.”

--Practitioner "with 20 years experience"

 

 

From: PV

To: Editor

 

You don't seem to be getting much response to Quotes of the Week, so I thought I'd have a go at the prize

 

Maybe the practitioner knows the truth, but he/she just missed out a few contextual details [which I have added]. But, I guess it's more likely that even after 20 years, he/she can't see the wood for the trees.

 

A traditional normalized structure cannot and will not outperform a denormalized star schema, from a DSS perspective [when using any of the current leading RDMBS systems, that unfortunately do little to distinguish between logical views of data and physical implementations].

 

These schemas are designed for speed [of particular types of access], not your typical record style transaction [which will, in general, be slower in a physically denormalized environment]. I have seen and been involved in too many large RDBM implementations that can absolutely not support the stress that a denormalized structure can handle. [However, recent versions of the leading DBMS systems are have potential to begin change this, as they can support better logical/physical separations than earlier versions].

 

Supporting users increasing needs of accessing and analyzing information from a normalized schema perspective does not make sense [when the current RDBMS systems force the logical and physical designs to be identical and so the pragmatic need to make the system perform defaults the logical design to be the same as the optimally performing physical design]. Normalized models cannot support ad-hoc users needs to extract large volumes of records, aggregating the facts and create hierarchies on the fly [because the current generation of query tools presuppose an identical logical & physical 'star schema' implementation, and also decide to do much aggregation work themselves rather than leaving all processing to the DBMS. This latter design being a compensation for lack of analytic capabilities of earlier(ish) RDBMS implementations].

 

This type of requests puts way to much stress on a [physical, implementation] model that was originally developed [with good reason] to support the handling of single record transactions. [rather than a mixed work load of transactions and decision support queries]"

 

 

From: Fabian Pascal

To: PV

 

No, the practitioner does not know the truth, which is clear from his arguments (as well as from the exchange I had with him). In fact, the quote is part of his critique of my article on normalization in the Against the Grain series, which he failed to understand.

 

The current leading DBMSs are SQL DBMSs, which have too little to do with the relational model to be called RDBMSs. That is, in part, why they don't separate (distinguish is not the right term) well between the logical and physical levels (even though they do a better job of it than preceding products). To the extent that performance suffers with fully normalized databases vs. undernormalized ones--which has not been satisfactorily demonstrated--it's always implementation factors that are responsible. Logical design--which is what normalization is--cannot possibly affect performance, which is determined entirely at the physical level.

 

Therefore, the notion that schemas--which are logical--whether normalized or not, can be designed for speed--which is determined physically--is meaningless. Normalized schemas are logically correct designs and any implementation that fails to perform in support of them has only itself to blame. The practitioner admits he has seen implementations that were not able to "handle the stress", but he does not make the correct inference from that, because, like so many others, he confuses the logical and physical levels. What is more, the so-called "star" and "snowflake" schema and the like are ad-hoc and lack the scientific design guidelines that govern normalization. Commercial DBMSs are based on SQL will never do a satisfactory performance job and all indicators are that the trend is towards further bastardization of SQL, not improvements (and article on this is forthcoming, so stay tuned).

 

It is not accurate to say that the logical and physical levels in current products are 'identical'. More appropriate is to say that the insulation of the former from the latter is not good enough. While this surely contributes to performance difficulties, there are many other relational deficiencies and implementation flaws that add to the problem. They have all been amply documented (see our Books page).

 

Decision support data requests may put too much stress on the physical implementation of the products, but not on the relational data model, of which normalization is a component; this is, indeed, the main point. The notion that the "model was developed to support single record transactions" is simply wrong: the model is set-oriented and was intended to replace record-at-a-time processing. Besides, records are physical entities, and the relational model is, intentionally, silent on physical implementation, precisely to allow DBMS designers to choose whatever physical means they deem necessary to maximize performance (see my forthcoming critique of an academic paper on "denormalization for performance").

 

In fact, a new implementation technology that radically parts with current ones has become recently available, that offers performance several magnitudes greater than current SQL DBMSs and many other data systems, including decision support. No more excuses. A book on the new technology by Chris Date is in the process of being published and will be available via this site.

 

 

Posted 05/10/02

 

 

 

[ABOUT] [QUOTES] [LINKS]