“A
traditional normalized structure cannot and will not outperform a denormalized
star schema, from a DSS perspective. These schemas are designed for speed, not
your typical record style transaction. I have seen and been involved in too
many large RDBM implementations that can absolutely not support the stress that
a denormalized structure can handle. Supporting users increasing needs of
accessing and analyzing information from a normalized schema perspective does
not make sense. Normalized models cannot support ad-hoc users needs to extract
large volumes of records, aggregating the facts and create hierarchies on the
fly. These type of requests puts way to much stress on a model that was
originally developed to support the handling of single record transactions.”
--Practitioner "with 20 years
experience"
From: PV
To: Editor
You don't seem to be getting much response to Quotes of the Week,
so I thought I'd have a go at the prize
Maybe the practitioner knows the truth, but he/she just
missed out a few contextual details [which I have added]. But, I guess it's
more likely that even after 20 years, he/she can't see the wood for the trees.
A traditional
normalized structure cannot and will not outperform a denormalized star schema,
from a DSS perspective [when using any of the current leading RDMBS
systems, that unfortunately do little to distinguish between logical views of
data and physical implementations].
These schemas are
designed for speed [of particular types of access], not your typical record style transaction
[which
will, in general, be slower in a physically denormalized environment]. I have seen and been
involved in too many large
RDBM implementations that can absolutely not support the stress that a
denormalized structure can handle. [However, recent versions of the
leading DBMS systems are have potential to begin change this, as they can
support better logical/physical separations than earlier versions].
Supporting users
increasing needs of accessing and analyzing information from a normalized
schema perspective does not make sense [when the current RDBMS
systems force the logical and physical designs to be identical and so the
pragmatic need to make the system perform defaults the logical design to be the
same as the optimally performing physical design]. Normalized models cannot support ad-hoc users
needs to extract large
volumes of records, aggregating the facts and create hierarchies on the fly
[because the current generation of query tools presuppose an identical logical
& physical 'star schema' implementation, and also decide to do much
aggregation work themselves rather than leaving all processing to the DBMS.
This latter design being a compensation for lack of analytic capabilities of
earlier(ish) RDBMS implementations].
This type of
requests puts way to much stress on a [physical, implementation] model that was originally
developed [with
good reason] to support the handling
of single record transactions. [rather than a mixed work load of
transactions and decision support queries]"
From: Fabian Pascal
To: PV
No, the practitioner does not know the truth, which is clear
from his arguments (as well as from the exchange I had with him). In fact, the
quote is part of his critique of my article on normalization in the Against
the Grain series, which he failed to understand.
The current leading DBMSs are SQL DBMSs, which have too
little to do with the relational model to be called RDBMSs. That is, in
part, why they don't separate (distinguish is not the right term) well
between the logical and physical levels (even though they do a better job of it
than preceding products). To the extent that performance suffers with fully
normalized databases vs. undernormalized ones--which has not been
satisfactorily demonstrated--it's always implementation factors that are
responsible. Logical design--which is what normalization is--cannot
possibly affect performance, which is determined entirely at the physical level.
Therefore, the notion that schemas--which are logical--whether
normalized or not, can be designed for speed--which is determined physically--is
meaningless. Normalized schemas are logically correct designs and any
implementation that fails to perform in support of them has only itself to
blame. The practitioner admits he has seen implementations that were not
able to "handle the stress", but he does not make the correct
inference from that, because, like so many others, he confuses the logical and
physical levels. What is more, the so-called "star" and
"snowflake" schema and the like are ad-hoc and lack the scientific
design guidelines that govern normalization. Commercial DBMSs are based on SQL
will never do a satisfactory performance job and all indicators are that the
trend is towards further bastardization of SQL, not improvements (and article
on this is forthcoming, so stay tuned).
It is not accurate to say that the logical and physical
levels in current products are 'identical'. More appropriate is to say that the
insulation of the former from the latter is not good enough. While this surely
contributes to performance difficulties, there are many other relational
deficiencies and implementation flaws that add to the problem. They have all
been amply documented (see our Books
page).
Decision support data requests may put too much stress on the
physical implementation of the products, but not on the relational
data model, of which normalization is a component; this is, indeed, the main
point. The notion that the "model was developed to support single record
transactions" is simply wrong: the model is set-oriented and was
intended to replace record-at-a-time processing. Besides, records are
physical entities, and the relational model is, intentionally, silent on
physical implementation, precisely to allow DBMS designers to choose whatever
physical means they deem necessary to maximize performance (see my forthcoming
critique of an academic paper on "denormalization for performance").
In fact, a new implementation technology that
radically parts with current ones has become recently available, that offers
performance several magnitudes greater than current SQL DBMSs and many other
data systems, including decision support. No more excuses. A book on the new
technology by Chris Date is in the process of being published and will be
available via this site.
Posted
05/10/02
[ABOUT]
[QUOTES]
[LINKS]