In “No Database Champion”
I took to task Mike Champion of Software AG for his misinterpretation, in an
exchange at XML:DB.ORG, of my writings, and for his poor understanding of the
relational model and of the problematics of XML as a data management
technology. The response he received from his interlocutors in the exchange was
even worse, so I am debunking it here. Yet another example of what happens when
opinions are expressed without proper knowledge of fundamentals. Here is the
first reply, by Tom Bradford (I distinguish his arguments by a different font).
“The old guard, meaning the people who have poured their heart
and soul into relational databases, are worried, and justifiably so.
Many of the alternatives to relational
databases, most notably hierarchical and object databases, never gained enough
momentum to present a serious threat to relational databases.
In the past, interchange always had to be
done by translating data to and from relational models [sic], because like hierarchical and object
databases, the data itself cannot usually stand on its own outside of the data
store, especially when the model is highly normalized.”
“XML databases are very different.
Beyond simply managing the XML data, XML databases allow the data
itself to be useful for interchange outside of the datastore without needing to
be transformed or processed beyond the textual representation that most systems
would expect. This is a capability that
few database technologies actually possess (save for search engines, if you
want to consider those databases).”
Note, first, that Bradford does not give any thought to why
“hierarchic and object databases never gained enough momentum to present a
serious threat to SQL (not relational!) databases”. Had he bothered to check,
he would have found that decades ago hierarchic databases almost ground to a
halt due to their complexity, inflexibility, and ad-hoc nature, and that object
databases were going to bring all that back. Proponents of the hierarchic model
almost always focus on structure (trees) and ignore the raison d’être for
the structure, namely integrity and manipulation, and for good reason:
it is in the latter that the complications crop up. SQL DBMSs managed to
displace hierarchic databases and were not displaced by object databases (as we
were assured would happen) because despite their flaws, they are much simpler
and easier to use. And SQL DBMSs do not even come close to a truly relational
DBMS (TRDBMS), which would win “with its hands tied behind its back”, so to
speak.
I am willing to bet that Bradford subscribes to the efficient
markets ideology and would defend it with fervor. But, as is so often the case
with believers, they do so only when it is convenient. The SQL win over
hierarchic databases is hardly convenient, because if you believe in market
efficiency, you must conclude that the latter were proven inferior and,
therefore, a reintroduction in the market, albeit relabeled as XML, would not
be defensible. So you ignore the reasons for the win.
The statement that “the data itself cannot usually stand on
its own outside of the data store” reveals a major problem in the industry:
inability to grasp levels of abstraction. One manifestation is the all too
common logical-physical confusion: practitioners fail to distinguish models
from implementation. In that vein, Bradford does not understand data
fundamentals. The term ‘data’ implies that there is some meaning to it, that
is, it is organized logically (structured) in some way; data that is not
structured is random and, thus, does not carry any informational value, so it’s
not really data. There are different ways of organizing data: tables,
spreadsheets, lists, arrays, graphics, audio, binary, and yes, text. Text is
not “unstructured data” because it is certainly not random (and not
“semi-structured” either, whatever that means). It is data organized as
letters, words, sentences, paragraphs, sections, etc. And there are operations
(manipulation) that can be applied to textual organization: key searches,
concatenation, etc.
The principle of data independence says that how data
is represented logically should not be dependent on how it is stored
physically. And for good reason: we want the storage to be efficient and we
want to be able to change it at will for performance purposes, without
affecting the user’s views of the data (indeed, it is when systems fail to
separate the logical from the physical well, as SQL DBMSs fail to, that they cause
problems). And, in fact, no DBMS stores data in the same way in which it
represents it logically to the user. An RDBMS does not store data as tables,
nor does a text system store data as text. It is, therefore, incorrect
to say that there is “transformation from/to relational” going on during data
exchange. To the extent that there is transformation, it is between different
physical formats, which has nothing to do with the data model employed.
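The data independence point can be illustrated with a minimal sketch. Everything in it is hypothetical (the employee data, the two store classes, the `names_in_dept` function); it assumes nothing about how any real DBMS is built. The same logical rows sit in two different physical layouts, and the one logical question gives the same answer against either.

```python
from typing import Iterator, List, Set, Tuple

# Logical view: a set of (emp_id, name, dept_id) rows. Hypothetical data.
Row = Tuple[int, str, int]
ROWS: List[Row] = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)]


class RowStore:
    """Physical layout A: each row kept together as one tuple."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)

    def scan(self) -> Iterator[Row]:
        yield from self._rows


class ColumnStore:
    """Physical layout B: each column kept in its own list."""

    def __init__(self, rows: List[Row]) -> None:
        self._emp_id = [r[0] for r in rows]
        self._name = [r[1] for r in rows]
        self._dept_id = [r[2] for r in rows]

    def scan(self) -> Iterator[Row]:
        # Reassemble logical rows from the separately stored columns.
        yield from zip(self._emp_id, self._name, self._dept_id)


def names_in_dept(store, dept_id: int) -> Set[str]:
    """Logical question, written with no knowledge of the layout underneath."""
    return {name for (_, name, dept) in store.scan() if dept == dept_id}


if __name__ == "__main__":
    print(names_in_dept(RowStore(ROWS), 10))
    print(names_in_dept(ColumnStore(ROWS), 10))
```

Swapping one store for the other changes only how the data is laid out, not what the user asks or receives, which is all that the principle requires.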
Now, there is nothing to prevent any DBMS, including
relational ones, from storing data in XML format, thereby avoiding
translations. The problem is that, for reasons that should have been obvious,
XML is a very poor format for either storage or exchange, and unnecessarily so
(see The Data Exchange Tail), which is why, predictably, it is increasingly
being violated (see The Horror of XML). So much for yet another “industry standard.”
“As we move further into the global economy, and interchange
becomes all the more important, reducing the barriers to getting data from one
party to another becomes critical.
We're already seeing that a key part of this interchange will be XML for
representation, and so it only makes sense that an XML database will be the
storage system of choice. I'm very sure
that this scares the holy hell out of the relational people.”
Here’s precisely where Bradford follows the industry lead and
makes the big mistake that I call The Exchange Tail and the Management Dog (the
title of my new seminar, in which I discuss this problem): even if XML were a
good format for storage and
exchange (which it is not), it does not automatically follow that it is a good
way to represent data logically for management. That is because insofar as
logical representation is concerned, the structure underlying XML is the very
same hierarchic structure we got rid of decades ago. As I stated elsewhere, Those
Who Forget or Don’t Know the Past, Are Doomed to Repeat It.
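A small sketch may make the point about hierarchic structure concrete. The data, the element names, and the helper functions are all made up for illustration; it uses only Python's standard `xml.etree.ElementTree`. Orders are nested under customers, so a question that follows the nesting and a question that cuts across it need differently shaped navigation code, whereas over a flat set of tuples both are the same operation with different arguments.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: orders nested under the customer that placed them.
DOC = """
<customers>
  <customer name="Acme">
    <order product="bolt" qty="5"/>
    <order product="nut" qty="7"/>
  </customer>
  <customer name="Zenith">
    <order product="bolt" qty="2"/>
  </customer>
</customers>
"""
root = ET.fromstring(DOC)


def products_of(customer: str) -> list:
    """Question that follows the nesting: find the parent, list its children."""
    return sorted(
        order.get("product")
        for cust in root.findall("customer")
        if cust.get("name") == customer
        for order in cust.findall("order")
    )


def customers_of(product: str) -> list:
    """Question against the nesting: visit every parent and look inside it."""
    return sorted(
        cust.get("name")
        for cust in root.findall("customer")
        if any(o.get("product") == product for o in cust.findall("order"))
    )


# The same facts as a flat set of (customer, product, qty) tuples.
ORDERS = {("Acme", "bolt", 5), ("Acme", "nut", 7), ("Zenith", "bolt", 2)}


def select(rel: set, where_col: int, value, out_col: int) -> list:
    """One restrict-and-project operation serves both questions."""
    return sorted({row[out_col] for row in rel if row[where_col] == value})


if __name__ == "__main__":
    print(products_of("Acme"), customers_of("bolt"))
    print(select(ORDERS, 0, "Acme", 1), select(ORDERS, 1, "bolt", 0))
```

The sketch covers retrieval only. Integrity and update rules over the nested form are where the asymmetry is felt more, and they are not attempted here.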
“But here's the thing. XML databases *won't* replace relational
databases because the problems that XML databases attempt to solve, while
overlapping in many ways, are very different than the problems that relational
databases attempt to solve. The two
will always be needed, and the one somebody chooses will depend on the problem
they need to solve. I'm more willing to
bet that the capabilities of XML database will be absorbed by relational
databases, creating a very flexible hybrid of the two.
Relational vendors haven't really figured
out the best way to do XML, considering their historical investment in
relational technology, but they will, and you can rest assured that the big
players will be gobbling up most of the viable XML database vendors within the
next 4 to 5 years.”
Several points here, all having to do with lack of foundation
knowledge:
• Any information can be represented relationally or hierarchically and there is no
information that the former cannot handle. The question is which is more
practical in the integrity and manipulation sense. And Occam’s Razor says the
simpler relational model is preferable. Anything that the hierarchic model can
achieve, the relational model can in a much less complicated, more flexible,
and more optimizable way. This is true even for “natural” hierarchies such as
organizations or bill-of-materials (see Chapter 7 in my PRACTICAL ISSUES IN
DATABASE MANAGEMENT). That was the whole point of
relational technology, and that is why even poor implementations of it—those
based on SQL—displaced hierarchic products.
• A “hybrid” notion is a copout, characteristic of those who do not know and
understand data fundamentals. You do not take two data models and “mix” them
together to create some “hybrid” model because (a) if one is simpler, more
flexible, and more optimizable, “merging” makes it less so, and (b) two models
double complexity without adding any benefit:
“In fact, it is axiomatic that if we have N different ways
to represent information, then we need N different sets of operators. And if
N>1, then we have more operators to implement, document, teach, learn,
remember and use. But those extra operators add complexity, not power! There is
nothing useful that can be done if N>1 that cannot be done if N=1.”
--C. J. Date, AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed.
• It is quite possible, even probable, that SQL (not relational) vendors will
incorporate XML in some way into their products. But this does not mean that
XML is a good idea. It only means that the industry has jumped on yet another
fad bandwagon, for which the price will be paid somewhere down the road. That’s
always the case, but the industry, unlike science, doesn’t seem to learn
anything from its past mistakes.
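To illustrate the first point above, here is a minimal sketch of a “natural” hierarchy, a bill of materials, held as one flat table. It is an assumption-laden illustration, not the treatment given in the book cited: the table name `part_structure`, its columns, and the sample parts are invented, and SQL (through Python's standard `sqlite3` module) is used only because it is runnable, with all the caveats about SQL versus relational made above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One flat table: each row says "this assembly directly contains qty of this component".
conn.execute(
    """
    CREATE TABLE part_structure (
        assembly  TEXT    NOT NULL,
        component TEXT    NOT NULL,
        qty       INTEGER NOT NULL CHECK (qty > 0),
        PRIMARY KEY (assembly, component)
    )
    """
)
conn.executemany(
    "INSERT INTO part_structure VALUES (?, ?, ?)",
    [("bike", "wheel", 2), ("bike", "frame", 1),
     ("wheel", "spoke", 32), ("wheel", "rim", 1)],
)

# Parts explosion: walk downward from an assembly, multiplying quantities.
EXPLODE = """
WITH RECURSIVE explosion(component, total_qty) AS (
    SELECT component, qty FROM part_structure WHERE assembly = ?
    UNION ALL
    SELECT ps.component, e.total_qty * ps.qty
    FROM explosion AS e
    JOIN part_structure AS ps ON ps.assembly = e.component
)
SELECT component, SUM(total_qty)
FROM explosion
GROUP BY component
ORDER BY component
"""


def explode(assembly: str) -> list:
    """All direct and indirect components of an assembly, with total quantities."""
    return conn.execute(EXPLODE, (assembly,)).fetchall()


def where_used(component: str) -> list:
    """The opposite direction: which assemblies directly contain this component."""
    rows = conn.execute(
        "SELECT assembly FROM part_structure WHERE component = ? ORDER BY assembly",
        (component,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(explode("bike"))
    print(where_used("spoke"))
```

Both directions of the hierarchy are queried against the same table, and the quantity rule is declared once as a constraint rather than enforced by navigation code.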
Posted 4/6/03
© Fabian Pascal 2006 All Rights Reserved