ON XML DATABASES
with Fabian Pascal

 

 

 

From: GJ

To: Editor

Date: 31 Mar 2005

 

You may be interested in this article, on O'Reilly's xml.com site: Going Native: Making the Case for XML Databases.

 

How did we manage medical records and instruction manuals before XML?

 

Besides the annoying misuse of the term "use case" throughout, the author seems to think the M in XML stands for "modeling."

 

The article states: "A more theoretically correct way to say this is that XML-enabled databases have their own data model — relational, hierarchical, object-oriented — and map instances of the XML data model to instances of their data model. Native XML databases use the XML data model directly." I'm not clear on what theoretical foundation the author is referring to--is there a school of theory regarding text markup?

 

I think it's self-evident that any document that is marked up (structured) with legal XML can be easily stored in a relational database, and that any legal XML "data model" will map directly to a relational model. Logically there just isn't any problem that XML databases can solve that relational systems can't, and it stands to reason that native XML databases simply can't be faster, more scalable, or more reliable than relational systems. How does this idea get any traction?
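One way to make the mapping claim concrete is a minimal sketch that "shreds" an XML document into two generic relational tables, one row per element and one row per attribute. The sample document, the table names `node` and `attribute`, and the `shred` helper are all assumptions made for illustration; they are not from the article under discussion. The sketch uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules and deliberately ignores mixed content (tail text), comments, and namespaces.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
SAMPLE = (
    '<patient id="17"><name>Ann Lee</name>'
    '<visit date="2005-03-31"><note>Routine checkup</note></visit></patient>'
)


def shred(xml_text, conn):
    """Store each element and attribute of xml_text as a row; return the root row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES node(id),"
        " position INTEGER NOT NULL,"
        " tag TEXT NOT NULL,"
        " text TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attribute ("
        " node_id INTEGER NOT NULL REFERENCES node(id),"
        " name TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (node_id, name))"
    )

    def insert(elem, parent_id, position):
        # Leading text only; tail text (mixed content) is ignored in this sketch.
        text = (elem.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO node (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
            (parent_id, position, elem.tag, text),
        )
        node_id = cur.lastrowid
        for name, value in elem.attrib.items():
            conn.execute(
                "INSERT INTO attribute (node_id, name, value) VALUES (?, ?, ?)",
                (node_id, name, value),
            )
        # Sibling order is kept explicitly in the position column.
        for i, child in enumerate(elem):
            insert(child, node_id, i)
        return node_id

    return insert(ET.fromstring(xml_text), None, 0)


conn = sqlite3.connect(":memory:")
root_id = shred(SAMPLE, conn)

# Ordinary SQL now answers "what are the children of <visit>?"
rows = conn.execute(
    "SELECT n.tag, n.text FROM node n"
    " JOIN node p ON n.parent_id = p.id"
    " WHERE p.tag = 'visit' ORDER BY n.position"
).fetchall()
print(rows)
```

A generic element table like this only shows that the hierarchy can be represented in relations; a real design would more likely model patients and visits as their own tables with proper keys and constraints.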

 

 

From: JG

To: Editor

CC: GJ

 

I finished reading the article over my morning coffee.

 

You know, I write a lot of Microsoft Word documents. Fabian, would you help me find a "Microsoft Word Database"? Then I can store and manage my data using the Word data model.

 

OK. Sorry. I must be in a really cynical mood this morning.

 

As I read the article, I kept coming back to the thought that the author was thinking in terms of storing and retrieving XML documents. People have done document storage for years. That's why we have features (in, say, Oracle) for things such as full-text search. It's why we have companies that make document storage-and-retrieval systems. If someone has a lot of XML documents to manage, I can well understand that they might look for a product to help them.

 

But I cringe at the thought of managing large amounts of data in the form of documents. And I do not like at all assertions such as the author's: "...the data involved does not easily fit the relational data model."

 

The author also brings into the equation such things as: "ease of management, enhanced query performance, concurrent access, transactional safety, security." You get all these things from a good relational database, from a good database period. They are not specific to XML, and have nothing to do with storing XML natively.

 

And what does it mean to store XML "natively"? I can only imagine that it means to store the raw XML text. But XML databases don't do that! Actually, storing XML "natively" doesn't seem to mean "anything". All it seems to imply is a certain amount of ease in storing and manipulating data that you "perceive" as being in XML form, but who knows (and who cares?) what the underlying storage mechanism really is.

 

Chris Date has a fascinating discussion on atomicity in his upcoming book. He has long argued for rich data types. I want to be careful about putting words into his mouth, but I think he would accept the idea of having an XML document column in a table. Why not? If we can store dates, or locations, why not an XML document? But relational database vendors were perhaps not as fast at supporting the easy storage and retrieval of XML documents as they could have been, leading to the development of a market for products specifically targeted at developers who know little more than XML.
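The idea of an XML document as a column value can be sketched as follows. SQLite has no XML type, so a `TEXT` column stands in for one here, and the `manual` table, the sample document, and the `xml_findtext` operator are hypothetical names chosen for illustration. The operator is registered through `sqlite3`'s `create_function` and delegates to `ElementTree.findtext`.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_findtext(doc, path):
    """Hypothetical operator: text of the first element matching path, or None."""
    return ET.fromstring(doc).findtext(path)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument scalar function.
conn.create_function("xml_findtext", 2, xml_findtext)

# TEXT is an assumption standing in for a true XML column type.
conn.execute(
    "CREATE TABLE manual ("
    " manual_id INTEGER PRIMARY KEY,"
    " product TEXT NOT NULL,"
    " body TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO manual (product, body) VALUES (?, ?)",
    ("Widget", "<manual><title>Widget Setup</title><step>Plug it in.</step></manual>"),
)

# The document is one column value; the operator reaches inside it on demand.
titles = conn.execute(
    "SELECT product, xml_findtext(body, 'title') FROM manual"
).fetchall()
print(titles)
```

In this sketch the table stays relational and the document is treated as a single value; anything done with its contents goes through an operator defined for that type.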

 

In the end, I keep thinking that what XML developers want is simply a way to easily store and retrieve their documents. They don't want to think outside the box of XML, either. That latter point is probably what leads to the demand for "XML Databases". But there's a road to hell and damnation here somewhere, I think, if you're not careful, that I can't quite put into words this morning. When one stops thinking in terms of storing atomic XML documents and begins to talk of "storing data in XML form", I think one has crossed a line and found that road!

 

 

From: Fabian Pascal

To: JG

 

How can one not be cynical, when the level of ignorance is so absolute?

 

Then they should talk about document bases not databases. Different ballgame.

 

If they never learn what a data model is, and they are not required to know, and the readers are in the same state, why in the world should anyone expect anything else?

 

Ask him to define a database, or any one of those terms that he's throwing around. People learn jargon, primarily from vendors or the press, and then regurgitate it without a clue as to what it means. This impresses the rest of the ignorami that they know something.

 

Doing anything useful with data requires structure, integrity, and manipulation. The problem is that XML initially had only structure, and even that is a bad one, thrown away decades ago by Codd. Why do you think they had to come up with Schema and XQuery? Because they could not do anything with a bunch of tags, which can be anything anybody wants.

 

Sure. But for complex data types you would still have to come up with operators and constraints that are agreed on. And that's tough. Read the first chapter in PRACTICAL ISSUES IN DATABASE MANAGEMENT.

 

That's what ignorance does. Read my Lenin, Trotsky, and Freedom from Tyranny of Reason and Knowledge.

 

 

Posted 5/27/05