From: GM
To: Editor
Let me take issue with two points (there are many more that I
could address, but these two will do) in your response to the articles of Steve
VanArsdale and Bob Lambert:
1.
“… denormalization introduces redundancy, which must be
controlled to avoid corruption; control is via additional integrity
constraints, which must be declared to, and checked and enforced by the DBMS;
enforcing those constraints requires the very joins that denormalization is
supposed to avoid for performance reasons in the first place.”
Denormalization into a multi-value structure does not
necessarily introduce redundancy -- why should it? If you start, as
practitioners of MV database design would generally do, with a data model in at
least 3NF, why should the fact that one table is stored within another
introduce redundancy? The advantage of
a MV structure is precisely that is allows you to demoralize [sic] without
introducing redundancy. It does
introduce dependencies upon field structure, but the DBMS is designed to take
care of that. Where it then saves on
joins is that if I want to retrieve information--for example, in a sales
application--on a customer order I can get the order number, customer number
and the line item data (e.g. product codes, quantities, prices charged, etc) in
one read. I don't have to store the
product names and a whole lot of other product-related information against each
line item -- if necessary I can link to them from the product table. [Ed. Note: Do you mean join, by any
chance?]
The valid criticism of the MV structure is that the
flexibility that it provides means that integrity control generally has to be
done at the application level rather than the DBMS level -- however, in my
experience this is easily managed.
2.
“I do not know how long Lambert has been in this business,
maybe not long enough to know about the history of VanArsdale’s [as well as
other past "solutions", see Those
Who Forget or Don’t Know the Past, Are Doomed Repeat It.] Multi-value
technology was discarded in the past precisely because it was less usable than
relational technology (I explain in my books why).”
Discarded? Well,
there is a fair swag of applications still out there in the real world running
happily on Pick/Universe/Unidata/etc., and the products are still sold and
supported. Less usable? Some of us who have worked, as I have in a
22-year career, with both types of product would argue strongly that the
ability to use a multi-value structure where appropriate allows us to provide a
much more efficient solution. You might
also want to look -- as I have had occasion to do -- at the size of box
required to run a Universe application supporting a medium-sized company
compared to the size of box required to run an Oracle-based application to do
the same job -- if relational technology is so wonderfully efficient, why does
it need about ten times as much disc storage and machine power?
Fabian Pascal Responds: What 1NF really means is somewhat tricky and there
is
considerable confusion about it. We are working on a paper on the subject,
which will probably be the first in our DATABASE FOUNDATIONS SERIES,
intended to produce statements and clarification on the state of database
science, and its practical implications. For the purposes of this discussion:
My article is about redundancy in R-tables (formally,
relvars) that (a) are in 1NF (or normalized) by definition, but (b) are not
further normalized to the fullest. The so-called MVDBMSs do not treat their
data as R-tables and, therefore, they are outside the scope of my article.
Furthermore, in my first and second books I explained why a MV structure
complicates data management without providing any benefits and, in fact, you
provide evidence of this very problem, without realizing it: That “integrity
control generally has to be done at the application level rather than the DBMS
level” is a direct consequence of the (demoralizing, indeed) complexity that
the MV structure forces on integrity control and manipulation, which makes is
very difficult to implement this in the DBMS (that is, in fact, true for all
structures with a hierarchic bent, MV and XML included; in this sense, such
products are not really DBMSs.)
MVDBMS proponents are oblivious to the complications because
almost always they think and talk about structure, and ignore its raison
d’etre, integrity and manipulation, which is where the complications occur.
Technologies and products, particularly complex ones that
require a great deal of investment never die completely. Why, we still have
IMS, IDMS and Focus databases/DBMSs out there, would you consider those “not
discarded”, or up to date? There are always practitioners and institutions too
vested in them and unable/unwilling to learn/appreciate/digest the fundamentals
of the field and new developments. Your arguments are an excellent example. But
that does not negate technological obsolescence. By “discarded” I did not mean
to say they are not used at all; rather, I meant that it is generally known and
understood why they are not the best way to do data management, and are just
leftovers from the past.
The whole point of my article to which VanArsdale and Lambert
responded, was to help practitioners avoid falling into the
logical/physical--model/implementation confusion trap, rampant in the industry.
But because MVDBMS technology is itself predicated on this confusion, it
instills it in its proponents to the degree that they do not overcome it even
when it is made obvious to them.
If a DBMS is inefficient, or consumes large amounts of disk
space, that has nothing to do with the data model employed for logical data
representation, and everything to do with the product’s physical
implementation. But I will go even further and argue that even if the
relational model did require more storage space and computing power, it would
still be preferable to nonrelational products, because it optimizes the human
resource, which is much costlier than hardware.
Posted
02/28/03
[ABOUT]
[QUOTES]
[LINKS]