AN XML:DB.ORG EXCHANGE: REPLY TO BRADFORD
by Fabian Pascal

 

 

 

In No Database Champion I took to task Mike Champion of Software AG for his misinterpretation of my writings in an exchange at XML:DB.ORG, and for his poor understanding of the relational model and of the problems of XML as a data management technology. The responses he received from his interlocutors in the exchange were even worse, so I am debunking them here: yet another example of what happens when opinions are expressed without proper knowledge of fundamentals. Here is the first reply, by Tom Bradford (his arguments appear in quotation marks).

 

“The old guard, meaning the people who have poured their heart and soul into relational databases, are worried, and justifiably so.  Many of the alternatives to relational databases, most notably hierarchical and object databases, never gained enough momentum to present a serious threat to relational databases.  In the past, interchange always had to be done by translating data to and from relational models [sic], because like hierarchical and object databases, the data itself cannot usually stand on its own outside of the data store, especially when the model is highly normalized.”

“XML databases are very different.  Beyond simply managing the XML data, XML databases allow the data itself to be useful for interchange outside of the datastore without needing to be transformed or processed beyond the textual representation that most systems would expect.  This is a capability that few database technologies actually possess (save for search engines, if you want to consider those databases).”

 

Note, first, that Bradford gives no thought to why “hierarchic and object databases never gained enough momentum to present a serious threat to SQL (not relational!) databases”. Had he bothered to check, he would have found that decades ago hierarchic databases nearly ground to a halt because of their complexity, inflexibility, and ad hoc nature, and that object databases threatened to bring all that back. Proponents of the hierarchic model almost always focus on structure (trees) and ignore the raison d’être for the structure, namely integrity and manipulation, and for good reason: it is in the latter that the complications crop up. SQL DBMSs managed to displace hierarchic databases, and were not displaced by object databases (as we were assured would happen), because despite their flaws they are much simpler and easier to use. And SQL DBMSs do not even come close to a truly relational DBMS (TRDBMS), which would win “with its hands tied behind its back”, so to speak.

 

I am willing to bet that Bradford subscribes to the efficient-markets ideology and would defend it with fervor. But, as is so often the case with believers, such defenses are mounted only when convenient. The SQL win over hierarchic databases is hardly convenient: if you believe in market efficiency, you must conclude that hierarchic databases were proven inferior and, therefore, that their reintroduction into the market, albeit relabeled as XML, is not defensible. So you ignore the reasons for the win.

 

The statement that “the data itself cannot usually stand on its own outside of the data store” reveals a major problem in the industry: inability to grasp levels of abstraction. One manifestation is the all too common logical-physical confusion: practitioners fail to distinguish models from implementation. In that vein, Bradford does not understand data fundamentals. The term ‘data’ implies that there is some meaning to it, that is, it is organized logically (structured) in some way; data that is not structured is random and, thus, does not carry any informational value, so it’s not really data. There are different ways of organizing data: tables, spreadsheets, lists, arrays, graphics, audio, binary, and yes, text. Text is not “unstructured data” because it is certainly not random (and not “semi-structured” either, whatever that means). It is data organized as letters, words, sentences, paragraphs, sections, etc. And there are operations (manipulation) that can be applied to textual organization: key searches, concatenation, etc.

The principle of data independence says that how data is represented logically should not be dependent on how it is stored physically. And for good reason: we want the storage to be efficient, and we want to be able to change it at will for performance purposes without affecting the user’s views of the data (indeed, it is systems that fail to separate the logical from the physical well, SQL DBMSs among them, that cause problems). And, in fact, no DBMS stores data in the same way in which it represents it logically to the user. An RDBMS does not store data as tables, nor does a text system store data as text. It is, therefore, incorrect to say that there is “transformation from/to relational” going on during data exchange. To the extent that there is transformation, it is between different physical formats, which has nothing to do with the data model employed.
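The separation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a DBMS: the two storage layouts, the names ROW_STORE and COLUMN_STORE, and the helper functions are all hypothetical. It shows one logical relation decoded from two different physical formats, and a query written once against the logical relation only.

```python
from typing import FrozenSet, Tuple

# The logical view: a relation, i.e. a set of (part, supplier) tuples.
Relation = FrozenSet[Tuple[str, str]]

# Hypothetical physical format 1: row-oriented, one delimited record per line.
ROW_STORE = "P1|S1\nP2|S1\nP3|S2\n"

# Hypothetical physical format 2: column-oriented, one list per attribute.
COLUMN_STORE = {"part": ["P1", "P2", "P3"], "supplier": ["S1", "S1", "S2"]}


def from_row_store(text: str) -> Relation:
    """Decode the row-oriented layout into the logical relation."""
    return frozenset(tuple(line.split("|")) for line in text.splitlines() if line)


def from_column_store(cols: dict) -> Relation:
    """Decode the column-oriented layout into the same logical relation."""
    return frozenset(zip(cols["part"], cols["supplier"]))


def parts_of(rel: Relation, supplier: str) -> set:
    """A logical query: it sees only tuples, never the storage layout."""
    return {p for (p, s) in rel if s == supplier}


# Swapping the physical layout changes neither the relation nor the answer.
assert from_row_store(ROW_STORE) == from_column_store(COLUMN_STORE)
print(sorted(parts_of(from_row_store(ROW_STORE), "S1")))  # ['P1', 'P2']
```

Any translation here happens between the two physical layouts and the single logical relation; the query itself is untouched when the layout changes, which is the point of data independence.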

 

Now, there is nothing to prevent any DBMS, including relational ones, from storing data in XML format, thereby avoiding translations. The problem is that, for reasons that should have been obvious, XML is a very poor format for either storage or exchange, and unnecessarily so (see The Data Exchange Tail), which is why, predictably, it is increasingly being violated (see The Horror of XML). So much for yet another “industry standard.”

 

“As we move further into the global economy, and interchange becomes all the more important, reducing the barriers to getting data from one party to another becomes critical.  We're already seeing that a key part of this interchange will be XML for representation, and so it only makes sense that an XML database will be the storage system of choice.  I'm very sure that this scares the holy hell out of the relational people.”

 

Here’s precisely where Bradford follows the industry’s lead and makes the big mistake that I call The Exchange Tail and the Management Dog (the title of my new seminar in which I discuss this problem): even if XML were a good format for storage and exchange (which it is not), it does not automatically follow that it is a good way to represent data logically for management. That is because, insofar as logical representation is concerned, the structure underlying XML is the very same hierarchic structure we got rid of decades ago. As I stated elsewhere, Those Who Forget or Don’t Know the Past, Are Doomed to Repeat It.

 

“But here's the thing. XML databases *won't* replace relational databases because the problems that XML databases attempt to solve, while overlapping in many ways, are very different than the problems that relational databases attempt to solve.  The two will always be needed, and the one somebody chooses will depend on the problem they need to solve.  I'm more willing to bet that the capabilities of XML database will be absorbed by relational databases, creating a very flexible hybrid of the two.  Relational vendors haven't really figured out the best way to do XML, considering their historical investment in relational technology, but they will, and you can rest assured that the big players will be gobbling up most of the viable XML database vendors within the next 4 to 5 years.”

 

Several points here, all having to do with a lack of foundational knowledge:

 

• Any information can be represented relationally or hierarchically, and there is no information that the former cannot handle. The question is which is more practical in the integrity and manipulation sense. And Occam’s Razor says the simpler relational model is preferable. Anything that the hierarchic model can achieve, the relational model can achieve in a much less complicated, more flexible, and more optimizable way. This is true even for “natural” hierarchies such as organizations or bill-of-materials (see Chapter 7 in my PRACTICAL ISSUES IN DATA MANAGEMENT). That was the whole point of relational technology, and that is why even poor implementations of it, those based on SQL, displaced hierarchic products.

 

• The “hybrid” notion is a cop-out, characteristic of those who do not know and understand data fundamentals. You do not take two data models and “mix” them together to create some “hybrid” model, because (a) if one is simpler, more flexible, and more optimizable, “merging” makes it less so, and (b) two models double the complexity without adding any benefit:

 

“In fact, it is axiomatic that if we have N different ways to represent information, then we need N different sets of operators. And if N>1, then we have more operators to implement, document, teach, learn, remember and use. But those extra operators add complexity, not power! There is nothing useful that can be done if N>1 that cannot be done if N=1.”

--C. J. Date, AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed.

 

• It is quite possible, even probable, that SQL (not relational) vendors will incorporate XML in some way into their products. But this does not mean that XML is a good idea. It only means that the industry has jumped on yet another fad bandwagon, for which the price will be paid somewhere down the road. That’s always the case, but the industry, unlike science, doesn’t seem to learn anything from its past mistakes.
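The earlier point that even “natural” hierarchies such as bill-of-materials are handled more simply as relations can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the treatment in PRACTICAL ISSUES IN DATA MANAGEMENT: the data, the part names, and the functions `explode` and `where_used` are hypothetical. The structure is held as a plain relation of (assembly, component, quantity) tuples; “explosion” is computed by repeatedly joining on component = assembly, and the inverse “where used” question is a simple restriction on the same relation.

```python
from collections import defaultdict

# Hypothetical bill of materials held as a relation of
# (assembly, component, quantity) tuples: no pointers, no nesting.
BOM = {
    ("bike", "wheel", 2),
    ("bike", "frame", 1),
    ("wheel", "spoke", 32),
    ("wheel", "rim", 1),
    ("frame", "tube", 3),
}


def explode(bom, root):
    """Total quantity of every direct or indirect component of `root`.

    Repeatedly joins the current frontier with the relation on
    frontier.part = bom.assembly (a transitive closure), multiplying
    quantities along each path. Assumes the hierarchy is acyclic.
    """
    totals = defaultdict(int)
    frontier = [(root, 1)]  # (part, quantity needed per unit of root)
    while frontier:
        next_frontier = []
        for part, qty in frontier:
            for assembly, component, n in bom:
                if assembly == part:
                    totals[component] += qty * n
                    next_frontier.append((component, qty * n))
        frontier = next_frontier
    return dict(totals)


def where_used(bom, component):
    """The inverse question, answered by restricting the same relation."""
    return {assembly for (assembly, c, _) in bom if c == component}


print(sorted(explode(BOM, "bike").items()))
# [('frame', 1), ('rim', 2), ('spoke', 64), ('tube', 3), ('wheel', 2)]
print(where_used(BOM, "spoke"))  # {'wheel'}
```

Because nothing in the relation privileges one direction of traversal, both questions are answered from the same tuples, with no separate structure defined for the upward direction. The sketch assumes the hierarchy has no cycles; a real system would have to enforce that as an integrity constraint.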

 

 

Posted 4/6/03

© Fabian Pascal 2006 All Rights Reserved