FOR ONE REASON OR ANOTHER
by Fabian Pascal

 

 

 


Ed. Note: This article was first published in my DBAzine column, then deleted when the column was abruptly terminated (see What Took Them So Long?).


 

 

An argument that judgment matters but knowledge does not is profoundly anti-intellectual. It implies that we do not need ever to learn anything in order to make mature decisions … This sort of thinking is part of what is wrong with this country. We wouldn't call a man in to fix our plumbing who knew nothing about plumbing, but we call pundits to address millions of people on subjects about which they know nothing of substance.

--Juan Cole

 

People who don't know what they are selling--selling to people who are clueless about what they are buying. That is the state of the industry.

--Reader comment

 

 

I’ve written more than once about The Ignorance Mechanism, the mindless, fad-driven way in which the IT industry operates. When I argued a while ago in If You Liked SQL, You’ll Love XQuery that the SQL authors at IBM never understood data fundamentals and the relational model, I had to write no fewer than four columns to debunk the utter nonsense in a long exchange at Slashdot.org, in which I was attacked and dismissed for claiming such an “absurdity”. But it is probably because I am seldom proven wrong, particularly about ignorance in the industry, that I don’t win any popularity contests (which tells me that I must be doing something right).

 

Consider IBM Moves the Database Goalposts by Philip Howard of Bloor Research, which was brought to my attention by a reader (see On Recurring Fads).

 

   At its annual analyst conference last week, IBM announced its next generation database. The big news is that this will not be a relational database. Or, to be more accurate, it will not just be a relational database. IBM has concluded, rightly in my view, that using a relational approach is not adequate for processing XML. Either you store it in relational format, in which case you get a major performance hit because you have to convert it to and from tabular format whenever you store or retrieve it, or you have to store it as a binary large object, in which case you can’t do any processing with it.

   So, using relational storage is inadequate for one reason or another, and IBM has concluded that another approach is necessary. The company’s next generation database will therefore have two storage engines: one relational store and one native XML store. And let me be quite clear about this: these engines will be completely separate, with separate tablespaces, separate indexes (Btrees and so forth on the one hand, and hierarchical on the other), and so on.

 

It may be instructive to form an opinion on the Bloor outfit and their qualification to assess technologies and products by reading On Respected Technical Analysts, On Intellectual Laziness, On the So-called “Associative Model of Data”, and Comment on Butler’s Codd Obituary (Butler was once a partner of Bloor’s). Suffice it to say here that at various points in time they jumped on every bandwagon that the industry came up with, e.g. object databases, the so-called “associative model of data”, and now XML. They never met a fad they did not like, so I guess they qualify as “pundits”.

 

As far as we know, XML is for data exchange, not data management. The main authors of XML explicitly say it is syntactic, not semantic, in character (although it would be accurate to say that they were somewhat confused in that respect: they also claimed that XML’s tags are supposed to address HTML’s lack of semantics):

 

1.      The only normative definitions of XML, and of Namespaces, operate almost completely at a syntactic level.

2.      I've been in software for 20 years and I've seen lots of interoperable cross-platform syntax and very rarely an interoperable cross-platform data structure or API. Obviously, once you're dealing with some XML inside of a program, you think in terms of the structure. But XML's interoperability is strongly linked to the fact that its definition is syntactic.

--Tim Bray

 

And unlike data exchange, data management is all about semantics (see Tags Do Not A Language Make). Does Howard know what that means?

 

Conversion from XML to relational representation and vice versa is necessitated only by the industry’s choice of XML as a data exchange format. XML is both bad and insufficient for data management (see The Myth of Self-Describing XML), and inefficient for data exchange (see I’ve Glimpsed the Future and It’s XML). And why should The Data Exchange Tail Wag the Data Management Dog anyway?

 

The relational model is not a “storage format”, has nothing to do with storage, and intentionally so (can Howard spell data independence?). Tablespaces, indexes and B-Trees are physical implementation details, while the data model that underlies a database and DBMS, be it relational, SQL (not the same thing!), or hierarchic (XML), is purely logical (as an aside, both SQL and XML have direct-image physical implementations, that is, poor support of data independence). Aside from complexity, one of the major deficiencies of the older DBMSs based on the hierarchic and network data models was that their implementations violated data independence, by unnecessarily exposing physical details to users and applications, causing prohibitive development, maintenance, and re-optimization burdens.
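
The point about data independence can be illustrated with a minimal sketch. It assumes Python's bundled sqlite3 module as a convenient stand-in; SQLite is an SQL DBMS, not a truly relational one, and the emp table, its rows, and the index name are hypothetical. The sketch only shows that adding or dropping an index (a physical detail) leaves the logical request and its result unchanged.

```python
import sqlite3

def names_in_dept_10(conn):
    # The logical request never mentions indexes, B-trees, or tablespaces.
    rows = conn.execute(
        "SELECT name FROM emp WHERE dept = 10 ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only.
conn.execute(
    "CREATE TABLE emp ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)],
)

before = names_in_dept_10(conn)

# Physical change only: add an index on dept.
conn.execute("CREATE INDEX emp_dept_idx ON emp (dept)")
with_index = names_in_dept_10(conn)

# Another physical change: remove the index again.
conn.execute("DROP INDEX emp_dept_idx")
without_index = names_in_dept_10(conn)

print(before, with_index, without_index)
```

The query text is identical in all three calls; only the physical access path available to the DBMS differs.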

 

It was precisely to avoid the complexity of, and physical contamination in, those products that Codd invented the relational model and formulated his core Information Principle (emphasis added):

 


All information in the database must be cast explicitly in terms of values in relations, and in no other way.


 

Regardless of how IBM’s new DBMS physically stores data, the logical structures (the SQL table and the XML tag hierarchy) are different and, therefore, require different integrity constraints and manipulative operations. In fact, both violate the Information Principle. SQL does so by permitting NULLs, which are not values (see Nulls Nullified), and duplicates. XML does so by supporting a nonrelational structure, which causes the very same practical problems that the relational model was invented to eliminate: a more complex data language to learn and use, a lack of scientific database design guidelines (including when to use one structure vs. the other), and costly application development and maintenance burdens.
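
The two SQL departures named here, NULLs and duplicates, can be observed directly. The following is a small sketch, again assuming Python's sqlite3 module as the SQL implementation; the table t and its contents are made up for the illustration, and other SQL products may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A value always equals itself. NULL does not: the comparison
# yields NULL ("unknown"), which Python receives as None.
one_eq_one = conn.execute("SELECT 1 = 1").fetchone()[0]
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# A hypothetical SQL table declared without a key accepts
# duplicate rows and NULLs, so it is not a relation.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (None,)])

row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT x FROM t)"
).fetchone()[0]

# The row holding NULL fails even the trivial test x = x.
self_equal_count = conn.execute(
    "SELECT COUNT(*) FROM t WHERE x = x"
).fetchone()[0]

print(one_eq_one, null_eq_null, row_count, distinct_count, self_equal_count)
```

Three rows are stored, but they do not all satisfy x = x, and the duplicates carry no information beyond what a single row would.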

 

   On the other hand, all the database management stuff, autonomics, the optimiser and so forth, will all be held in common and sit above the two engines. So, there is a database management layer and two database storage engines. This raises the question as to whether you might have more than two storage engines, to which the answer, in principal, is yes. [emphasis added]

 

It is both clear evidence for my claim that the relational model was never understood at IBM, and proof of the sad state of the database field, that (a) the primary author of SQL, still at IBM, is now pushing XML and is the author of the XQuery language proposal to the W3C, and (b) IBM, where Codd invented the relational model, is not only violating his Information Principle (by allowing representations of data other than values in relations), but is doing so by reintroducing the very hierarchic model that IBM discarded decades back, and which Codd made obsolete. Those Who Forget (or Don’t Know) the Past Are Condemned to Repeat It.

 

Be that as it may, this would not happen if trade media and pundits assessed technology and products based on knowledge, for technical and functional merit, and for soundness, and alerted readers to any regressive or unproductive implications. Yet here we have Howard, a representative specimen, in whose view IBM is “right” because “relational storage” is “inadequate for one reason or another”. Not only does he confuse levels of representation—“relational storage” is an oxymoron—he is also clueless about why relational technology is “inadequate”; instead of evaluating IBM’s technology, he ass-u-mes that it is right just because IBM does it. So much for the “research” in Bloor Research.

 

Ed. Note: Yet it is this article that’s “opinion”! Go figure.

 

If the Howards in the industry relied on a proper foundation framework within which to evaluate technologies, products, and practices, and if they applied it to relational and XML data management, they would know that:

 

·   There is no purpose for which a truly relational (not SQL!) database and DBMS (Howard confuses the two) is inadequate. There is nothing that can be represented (not stored!) hierarchically that cannot be represented relationally (see Tree-Structured Data: A Relational Perspective, and Climbing Trees in SQL in PRACTICAL ISSUES IN DATABASE MANAGEMENT), and relational integrity enforcement and manipulation are much superior to their hierarchic XML counterparts being reinvented now. Even DB2, SQL notwithstanding, is hugely superior to IMS, although it may well end up as bad by the time IBM is finished with it.

 

·   XML “DBMSs” cannot “process” text-, graphic-, and multimedia-structured data any better than RDBMSs can (see Unstructured Thinking, Un-muddling Modeling, and the chapter on data types in PRACTICAL ISSUES IN DATABASE MANAGEMENT).
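
The first point above, that a hierarchy can be represented relationally, can be sketched as follows. This is an illustration under stated assumptions, not the treatment given in the works cited: it uses Python's sqlite3 module (an SQL DBMS, not a truly relational one), and the node and edge tables, the sample document tree, and the descendants helper are all hypothetical. The tree is held as two keyed tables without NULLs, and a recursive query walks it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Every node appears once; the root is simply a node that is nobody's child.
conn.execute("CREATE TABLE node (name TEXT PRIMARY KEY)")
# child is the key, so each node has at most one parent (a tree, not a graph).
conn.execute(
    "CREATE TABLE edge ("
    " child TEXT PRIMARY KEY REFERENCES node(name),"
    " parent TEXT NOT NULL REFERENCES node(name))"
)

names = ["book", "chapter1", "chapter2", "section1.1", "section1.2"]
conn.executemany("INSERT INTO node VALUES (?)", [(n,) for n in names])
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [
        ("chapter1", "book"),
        ("chapter2", "book"),
        ("section1.1", "chapter1"),
        ("section1.2", "chapter1"),
    ],
)

def descendants(conn, root):
    """Return (name, depth) for root and everything below it."""
    return conn.execute(
        """
        WITH RECURSIVE sub(name, depth) AS (
            SELECT ?, 0
            UNION ALL
            SELECT e.child, s.depth + 1
            FROM edge AS e JOIN sub AS s ON e.parent = s.name
        )
        SELECT name, depth FROM sub ORDER BY depth, name
        """,
        (root,),
    ).fetchall()

for name, depth in descendants(conn, "book"):
    print("  " * depth + name)
```

The hierarchy is recoverable on demand, while the same tables remain open to ordinary non-hierarchic queries, for example listing every leaf regardless of where it sits in the tree.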

 

Neither IBM nor Howard has learned anything from experience.

 

   So much for the hard facts; now for some opinion. First, I think this leaves Oracle and Sybase (as the two vendors with the best current handle on XML) well behind the curve, with Microsoft and the others more or less out of sight. What this release will allow you to do is to build applications that handle both XML and relational data much more easily, without losing any of the richness that this implies, and without degrading performance.

 

And here you have it: the fad mechanism at work. Some vendor comes up with something, and instead of assessing it on merit, the media and pundits mindlessly regurgitate vendor press releases without anything to back them up, threatening anybody who does not jump on the bandwagon with “being left behind”. The irony here is that the punditry, in its ignorance, does not even know that Oracle has already bought into the fad even more than IBM has, and the competition is now over who can top the other’s blindness. In such a system, all of us are left behind (see Lenin, Trotsky, and the Freedom from Tyranny of Knowledge and Reason).

 

 

Posted 8/19/05