Ed. Note: This
article was first published in my DBAzine column, then deleted when the column
was abruptly terminated (see What Took Them So Long?)
An argument that judgment matters but knowledge does not is
profoundly anti-intellectual. It implies that we do not need ever to learn
anything in order to make mature decisions … This sort of thinking is part of what
is wrong with this country. We wouldn't call a man in to fix our plumbing who
knew nothing about plumbing, but we call pundits to address millions of people
on subjects about which they know nothing of substance.
--Juan Cole
People who don't know what they are selling--selling to people
who are clueless about what they are buying. That is the state of the industry.
--Reader comment
I’ve written more than once about The Ignorance Mechanism,
the mindless, fad-driven way in which the IT industry operates. When I argued a
while ago in If You Liked SQL, You’ll Love XQuery that the SQL authors at IBM
never understood data fundamentals and the relational model, I had to write no
fewer than four columns to debunk the
utter nonsense in a long exchange at Slashdot.org, in which I was attacked and
dismissed for claiming such an “absurdity”. But it is probably because I am
seldom proven wrong, particularly about ignorance in the industry, that I don’t
win any popularity contests (which tells me that I must be doing something
right).
Consider IBM Moves the Database Goalposts by
Philip Howard of Bloor Research, which was brought to my attention by a reader
(see On Recurring
Fads).
At its annual analyst
conference last week, IBM announced its next generation database. The big news
is that this will not be a relational database. Or, to be more accurate, it
will not just be a relational database. IBM has concluded, rightly in my view,
that using a relational approach is not adequate for processing XML. Either you
store it in relational format, in which case you get a major performance hit
because you have to convert it to and from tabular format whenever you store or
retrieve it, or you have to store it as a binary large object, in which case
you can’t do any processing with it.
So, using relational
storage is inadequate for one reason or another, and IBM has concluded that
another approach is necessary. The company’s next generation database will
therefore have two storage engines: one relational store and one native XML
store. And let me be quite clear about this: these engines will be completely
separate, with separate tablespaces, separate indexes (Btrees and so forth on
the one hand, and hierarchical on the other), and so on.
It may be instructive to form an opinion on the Bloor outfit
and their qualification to assess technologies and products by reading On Respected Technical
Analysts, On
Intellectual Laziness, On the So-called
“Associative Model of Data”, and Comment on Butler’s Codd
Obituary (Butler was once a partner of Bloor’s). Suffice it to say here
that at various points in time they jumped on every bandwagon that the industry
came up with, e.g. object databases, the so-called “associative model of data”,
and now XML. They never met a fad they did not like, so I guess they qualify as
“pundits”.
As far as we know, XML is for data exchange, not data
management. The main authors of XML explicitly say it is syntactic, not
semantic in character (although it would be accurate to say that they were
somewhat confused in that respect: they also claimed that XML’s tags are
supposed to address HTML’s lack of semantics):
1. The only normative definitions of XML, and of Namespaces,
operate almost completely at a syntactic level.
2. I've been in software for 20 years and I've seen lots of
interoperable cross-platform syntax and very rarely an interoperable
cross-platform data structure or API. Obviously, once you're dealing with some
XML inside of a program, you think in terms of the structure. But XML's
interoperability is strongly linked to the fact that its definition is
syntactic.
--Tim Bray
And unlike data exchange, data management is all about
semantics (see Tags Do
Not A Language Make). Does Howard know what that means?
The need for any conversion from XML to relational
representation and vice versa arises only from the industry’s choice of
XML as a data exchange format. XML is both bad and insufficient for
data management (see The Myth of
Self-Describing XML) and inefficient for data exchange (see I’ve Glimpsed the Future
and It’s XML). And why should The Data Exchange Tail Wag
the Data Management Dog anyway?
The relational model is not a “storage format”, has nothing
to do with storage, and intentionally so (can Howard spell data
independence?). Tablespaces, indexes and B-Trees are physical
implementation details, while the data model that underlies a database
and DBMS, be it relational, SQL (not the same thing!), or hierarchic (XML), is
purely logical (as an aside, both SQL and XML have direct-image physical
implementations, that is, poor support of data independence). Aside from
complexity, one of the major deficiencies of the older DBMSs based on the
hierarchic and network data models was that their implementations violated data
independence, by unnecessarily exposing physical details to users and
applications, causing prohibitive development, maintenance, and re-optimization
burdens.
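A minimal sketch of what data independence means in practice, using Python's bundled sqlite3 module purely as a convenient stand-in. SQLite is an SQL product, not a truly relational DBMS, and the part table, its columns, and the index name below are hypothetical. The query is written once against the logical table; adding an index changes only the physical access path, so the application code and its result stay the same.

```python
import sqlite3

# In-memory database; the table and its contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no INTEGER PRIMARY KEY, colour TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [(1, "red"), (2, "blue"), (3, "red")],
)

# The application states WHAT it wants, in logical terms only.
query = "SELECT part_no FROM part WHERE colour = 'red' ORDER BY part_no"

before = conn.execute(query).fetchall()

# A purely physical change: the DBMS may now use a different access path.
conn.execute("CREATE INDEX part_colour_idx ON part (colour)")

after = conn.execute(query).fetchall()

# Same query text, same result; only the implementation underneath changed.
print(before == after)
```

Had the application navigated pointers or named the index itself, as programs written against the older hierarchic and network products had to, the physical change would have forced a rewrite.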
It was precisely to avoid the complexity of, and physical
contamination in, those products that Codd invented the relational model, and
formulated his core Information Principle (emphasis added):
All information in the database must be cast
explicitly in terms of values in relations, and in no other way.
Regardless of how IBM’s new DBMS physically
stores data, the logical structures—the SQL table and the XML tag
hierarchy—are different and, therefore, require different integrity
constraints and manipulative operations. In fact, both violate the
Information Principle: SQL by permitting NULLs, which are not values
(see Nulls Nullified), and duplicates; XML by supporting a nonrelational
structure. The result is the very same practical problems that the relational
model was invented to eliminate: a more complex data language to learn and use,
a lack of scientific database design guidelines (including when to use one
structure vs. the other), and costly application development and maintenance
burdens.
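The two SQL violations are easy to observe. The sketch below again uses Python's sqlite3 module as a stand-in for SQL products in general; the emp table and its rows are hypothetical. It shows that an SQL table without a declared key accepts duplicate rows, and that a NULL does not behave like a value: it is not even equal to itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key is declared, so nothing prevents duplicate rows.
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", "R&D"), ("Ann", "R&D"), ("Bob", None)],  # None is stored as NULL
)

# Duplicates: the table holds three rows but only two distinct ones,
# so it is a bag of rows, not a relation (a set of tuples).
total_rows = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, dept FROM emp)"
).fetchone()[0]

# NULLs: comparing NULL with NULL yields NULL (unknown), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Consequently a row fails even the test "dept = dept" when dept is NULL.
self_equal_rows = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE dept = dept"
).fetchone()[0]

print(total_rows, distinct_rows, null_equals_null, self_equal_rows)
```

A value is always equal to itself; whatever sits in Bob's dept column is therefore not a value, which is exactly what the Information Principle rules out.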
On the other hand, all the database management stuff,
autonomics, the optimiser and so forth, will all be held in common and sit above
the two engines. So, there is a database management layer and two
database storage engines. This raises the question as to whether you might have
more than two storage engines, to which the answer, in principle, is yes. [emphasis added]
It is both clear evidence for my claim that the relational
model was never understood at IBM, and proof of the sad state of the database field
that (a) the primary author of SQL, still at IBM, is now pushing XML, and is
the author of the XQuery language proposal to W3C, and (b) not only is IBM, where
Codd invented the relational model, violating his Information Principle (by
allowing representations of data other than values in relations), but
it is doing so by reintroducing the very hierarchic model that IBM discarded decades
back, and which Codd made obsolete. Those Who Forget (or Don’t Know) the Past Are Condemned to Repeat It.
Be that as it may, this would not happen if trade media and
pundits assessed technology and products based on knowledge, for technical and
functional merit, and for soundness, and alerted readers to any regressive or
unproductive implications. Yet here we have Howard, a representative specimen,
in whose view IBM is “right” because “relational storage” is “inadequate for
one reason or another”. Not only does he confuse levels of
representation—“relational storage” is an oxymoron—he is also clueless about why
relational technology is “inadequate”; instead of evaluating IBM’s
technology, he ass-u-mes that it is right just because IBM does it. So
much for the “research” in Bloor Research.
Ed. Note: Yet it is
this article that’s “opinion”! Go figure.
If the Howards in the industry relied on a proper foundation framework
within which to evaluate technologies, products, and practices, and if they applied it to relational
and XML data management, they would know that:
· There is no purpose for which a truly relational
(not SQL!) database and DBMS (Howard confuses the two) is inadequate. There is
nothing that can be represented (not stored!) hierarchically that cannot
be represented relationally (see Tree-Structured Data: A
Relational Perspective, and Climbing Trees in SQL in PRACTICAL ISSUES IN DATABASE MANAGEMENT),
and relational integrity enforcement and manipulation are much superior to
their hierarchic XML counterparts being reinvented now. Even DB2, SQL
notwithstanding, is hugely superior to IMS, although it may well end up as bad
by the time IBM is finished with it.
· XML “DBMSs” cannot “process” text-, graphic-, and
multimedia-structured data any better than RDBMSs can (see Unstructured Thinking,
and chapter on data types in PRACTICAL
ISSUES IN DATABASE MANAGEMENT, and Un-muddling Modeling).
Neither IBM nor Howard has learned anything from experience.
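To make the first point above concrete, here is a minimal sketch in plain Python (not SQL) of a hierarchy represented as a relation. The element names are hypothetical and, for brevity, unique; a real document with repeated element names would need node identifiers as an extra attribute. The nesting that an XML document expresses through tag containment becomes a binary relation CONTAINS(parent, child), and "all descendants of a node" is obtained with ordinary relational operations (join, project, union) applied until no new tuples appear.

```python
# Each tuple is one row of the relation CONTAINS(parent, child).
# The hierarchy is hypothetical: an order document as XML would nest it.
contains = {
    ("order", "customer"),
    ("order", "items"),
    ("items", "item"),
    ("item", "sku"),
    ("item", "quantity"),
}


def compose(r, s):
    """Join r and s on r.child = s.parent, then project (r.parent, s.child)."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def transitive_closure(r):
    """Union r with its compositions until a fixed point is reached."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended


# DESCENDANT(ancestor, descendant): itself just another relation.
descendants = transitive_closure(contains)


def descendants_of(node):
    """Restrict DESCENDANT to one ancestor and project the descendant."""
    return sorted(child for (parent, child) in descendants if parent == node)


print(descendants_of("items"))  # ['item', 'quantity', 'sku']
```

Nothing about the hierarchy is lost, and the result of the traversal is again a set of tuples that can be restricted, joined, and constrained like any other relation.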
So much for the hard facts; now for some opinion. First, I think
this leaves Oracle and Sybase (as the two vendors with the best current handle
on XML) well behind the curve, with Microsoft and the others more or less out
of sight. What this release will allow you to do is to build applications that
handle both XML and relational data much more easily, without losing any of the
richness that this implies, and without degrading performance.
And here you have it: the fad mechanism at work. Some vendor
comes up with something, and instead of assessing it on merit, the
media/pundits mindlessly regurgitate vendor press releases without anything to
back them up, threatening anybody who does not jump on the bandwagon with
“being left behind”. The irony here is that the punditry, in its ignorance,
does not even know that Oracle has already bought even more than IBM into the
fad, and that the two are now competing to outdo each other’s blindness. In such a
system, all of us are left behind (see Lenin, Trotsky, and the
Freedom from Tyranny of Knowledge and Reason).
Posted 8/19/05