MORE ON DENORMALIZATION, REDUNDANCY AND “MULTIVALUE” DBMS
with Fabian Pascal

 



 

 

From: GM

To: Editor

 

Let me take issue with two points (there are many more that I could address, but these two will do) in your response to the articles of Steve VanArsdale and Bob Lambert:

 

1.      “… denormalization introduces redundancy, which must be controlled to avoid corruption; control is via additional integrity constraints, which must be declared to, and checked and enforced by the DBMS; enforcing those constraints requires the very joins that denormalization is supposed to avoid for performance reasons in the first place.”

 

Denormalization into a multi-value structure does not necessarily introduce redundancy -- why should it? If you start, as practitioners of MV database design would generally do, with a data model in at least 3NF, why should the fact that one table is stored within another introduce redundancy?  The advantage of a MV structure is precisely that is allows you to demoralize [sic] without introducing redundancy.  It does introduce dependencies upon field structure, but the DBMS is designed to take care of that.  Where it then saves on joins is that if I want to retrieve information--for example, in a sales application--on a customer order I can get the order number, customer number and the line item data (e.g. product codes, quantities, prices charged, etc) in one read.  I don't have to store the product names and a whole lot of other product-related information against each line item -- if necessary I can link to them from the product table. [Ed. Note: Do you mean join, by any chance?]

 

The valid criticism of the MV structure is that the flexibility that it provides means that integrity control generally has to be done at the application level rather than the DBMS level -- however, in my experience this is easily managed.

 

2.      “I do not know how long Lambert has been in this business, maybe not long enough to know about the history of VanArsdale’s [as well as other past "solutions", see Those Who Forget or Don’t Know the Past, Are Doomed Repeat It.] Multi-value technology was discarded in the past precisely because it was less usable than relational technology (I explain in my books why).”

 

Discarded?  Well, there is a fair swag of applications still out there in the real world running happily on Pick/Universe/Unidata/etc., and the products are still sold and supported.  Less usable?  Some of us who have worked, as I have in a 22-year career, with both types of product would argue strongly that the ability to use a multi-value structure where appropriate allows us to provide a much more efficient solution.  You might also want to look -- as I have had occasion to do -- at the size of box required to run a Universe application supporting a medium-sized company compared to the size of box required to run an Oracle-based application to do the same job -- if relational technology is so wonderfully efficient, why does it need about ten times as much disc storage and machine power?

 

Fabian Pascal Responds:  What 1NF really means is somewhat tricky and there is considerable confusion about it. We are working on a paper on the subject, which will probably be the first in our DATABASE FOUNDATIONS SERIES, intended to produce statements and clarification on the state of database science, and its practical implications. For the purposes of this discussion:

 

My article is about redundancy in R-tables (formally, relvars) that (a) are in 1NF (or normalized) by definition, but (b) are not further normalized to the fullest. The so-called MVDBMSs do not treat their data as R-tables and, therefore, they are outside the scope of my article. Furthermore, in my first and second books I explained why a MV structure complicates data management without providing any benefits and, in fact, you provide evidence of this very problem, without realizing it: That “integrity control generally has to be done at the application level rather than the DBMS level” is a direct consequence of the (demoralizing, indeed) complexity that the MV structure forces on integrity control and manipulation, which makes is very difficult to implement this in the DBMS (that is, in fact, true for all structures with a hierarchic bent, MV and XML included; in this sense, such products are not really DBMSs.)

 

MVDBMS proponents are oblivious to the complications because almost always they think and talk about structure, and ignore its raison d’etre, integrity and manipulation, which is where the complications occur.

 

Technologies and products, particularly complex ones that require a great deal of investment never die completely. Why, we still have IMS, IDMS and Focus databases/DBMSs out there, would you consider those “not discarded”, or up to date? There are always practitioners and institutions too vested in them and unable/unwilling to learn/appreciate/digest the fundamentals of the field and new developments. Your arguments are an excellent example. But that does not negate technological obsolescence. By “discarded” I did not mean to say they are not used at all; rather, I meant that it is generally known and understood why they are not the best way to do data management, and are just leftovers from the past.

 

The whole point of my article to which VanArsdale and Lambert responded, was to help practitioners avoid falling into the logical/physical--model/implementation confusion trap, rampant in the industry. But because MVDBMS technology is itself predicated on this confusion, it instills it in its proponents to the degree that they do not overcome it even when it is made obvious to them.

 

If a DBMS is inefficient, or consumes large amounts of disk space, that has nothing to do with the data model employed for logical data representation, and everything to do with the product’s physical implementation. But I will go even further and argue that even if the relational model did require more storage space and computing power, it would still be preferable to nonrelational products, because it optimizes the human resource, which is much costlier than hardware.

 

Posted 02/28/03

 

 

 

[ABOUT] [QUOTES] [LINKS]