A XML:DB.ORG EXCHANGE: REPLY TO STAKEN
by Fabian Pascal

 

 

 

The ongoing process of becoming more and more an a-mathematical society is more an American specialty than anything else (It is also a tragic accident of history). The idea of a formal design discipline is often rejected on account of vague cultural/philosophical condemnations such as "stifling creativity"; this is more pronounced in the Anglo-Saxon world where a romantic vision of "the humanities" in fact idealizes technical incompetence. Another aspect of that same trait is the cult of iterative design. Industry suffers from the managerial dogma that for the sake of stability and continuity, the company should be independent of the competence of individual employees. Hence industry rejects any methodological proposal that can be viewed as making intellectual demands on its work force. Since in the US the influence of industry is more pervasive than elsewhere, the above dogma hurts American computing science most. The moral of this sad part of the story is that as long as the computing science is not allowed to save the computer industry, we had better see to it that the computer industry does not kill computing science.

--E. Dijkstra, Keynote address, ACM Symposium on Applied Computing, San Antonio, TX, 03/01/1999

 

In A XML:DB.ORG Exchange: Reply to Bradford, I debunked fallacies about, and misinterpretations of, my arguments expressed in that exchange. Then, in No Database Champion, I debunked further misconceptions in an article by Michael Champion, another participant in that exchange, and concluded by quoting myself:

 

"The problem with all these guys is that they do not have any formal education or knowledge. They rely on 'commonsense' and practice and that is simply not enough. That is precisely why the state of IT is so horrendous ... There ought to be a requirement that to be published, one ought to know what he's talking about."

 

 

In this second part I debunk the comments addressed to Champion in that exchange by one Kimbro Staken. I leave it to the reader to judge whether my criticism is on the money.

 

Referring to my deploring the current generation of data management professionals’ lack of fundamental knowledge in their field, Staken states:

 

“I just don't get these kids today with long hair and electric guitars they just don't know what art in music is. If they had any formal education they would know that Mozart achieved purity of musical art and there is no point trying anything new, as the rest is just bohemian garbage. Seams something along these lines was said about Charlie Parker, Miles Davis, the Beatles and even Stravinsky. Talk about trying to squash innovation. Seems this may be more of a problem of too much formal education that results in dismissal of ideas before they have time to fully form.”

 

If this is not an excellent validation of my criticism, I don’t know what is.

 

In his obituary for Ted Codd, who passed away recently, Chris Date reminds us that the inventor of the relational model single-handedly, and very much against the grain, put the database field on a scientific foundation, the absence of which had inhibited productivity and cost-effectiveness for quite a long time. But Staken is oblivious to the distinction between art and science, and this is precisely the kind of educational deficiency I referred to. Staken’s position, even though he does not realize it, is tantamount to criticizing physical theories for stifling creativity in explaining physical reality. But then, what else can be expected from an industry in which not ignorance, but “too much formal education”, is considered a problem?

 

Creativity in science takes the form of either amendments to existing theories or new, better theories; there is no equivalent in art. Critics of relational technology, however, want and offer neither! They have yet to explain (a) why logic and math, which is what the relational model is, require any amendment or replacement and (b) what exactly it is that they want to replace them with.

 

“As you tried to point out there is more to the XML equation then just purity of data storage. What I really don't understand is that I have never seen anyone say that XML is a better fundamental model for storing data then relational. As far as I know XML database vendors aren't out there screaming from rooftops that relational is dead and XML is the new king. I think/hope we all realize XML is a niche data management technology. In the unlikely event that it does displace relational it will take a very long time and it won' t occur because of a superior theoretical model.”

 

I don’t know what “purity of data storage” means but, to say it yet again, a data model (relational included) has nothing to do with storage, and intentionally so. The fact that XML does not separate the logical level from the physical level (neither did its hierarchic predecessors) is one of the reasons practitioners confuse the two, and why XML was created by those very same practitioners in the first place.

On what planet has Staken been living? Is he aware that the exchange he participates in is at a web site called XML:DB? The industry is full of proclamations by the trade press, “experts” and vendors (including the IBMers who implemented the relational model as SQL, who are keen on riding the latest fad too; if you like what they did with SQL, you’ll love what they’ll do with XML) that XML is the new database grail. Ironically, this is claimed even by those eulogizing Ted Codd (see Turning in His Grave?).

 

“There seems to be a lot of this going on recently. People complaining about XML being used for things where other technologies are more appropriate. All of these arguments seem to miss one important point, captivation of people's imagination. XML has it now, SGML/CORBA/ONC RPC/ASN.1/relational do not. People are looking for the next stepping-stone to enhanced applications of technology. XML is providing that right now. Regardless of whether it has everything that is needed in a pure form, it is stimulating developments that were never done before, even though the technologies have existed for years.”

 

Seems like Staken has nothing but industry marketing to captivate his imagination. Note the fuzzy language: “being used for things”, “next stepping-stone”, “enhanced applications of technology”, “is providing that right now”, “developments that were never done before”. There is neither substance nor concreteness in any of this. Like so many, Staken is regurgitating marketing hype and industry terms to impress, without showing any pertinence or validity, let alone usefulness. He is describing (and, obviously, endorsing) mindless industry “fad riding”, which belies, rather than yields, the market efficiency in which he, no doubt, believes.

 

It is precisely because Staken, like so many practitioners, lacks foundation knowledge of the kind he dismisses that his “imagination is captivated” by XML, which is the opposite of innovation: not only “was it done before”, but it was also discarded. Its structure is hierarchic, a regression to the bad old days of hierarchic databases and DBMSs (e.g. IMS and Focus), made obsolete decades ago by SQL products, which are not even really relational. In fact, XML was not intended for data management in the first place (see Tags DO Not a Language Make and The Exchange Tail). It is rather easy to captivate uneducated minds; in that mode vendors can sell anything, and they do.

 

 “The basic simplicity gets people hooked and draws them in, added complexity then becomes an incremental burden instead of an up front study requirement. This is the one reason I feel something like SOAP will succeed where all the CORBAs, DCOMs and binary RPC mechanisms have failed before. By the time you add everything back onto SOAP that you really need it will be just as complex (probably worse) but at that point it doesn't matter. People are already hooked and will pay the price. From an academic perspective this is terrible, unfortunately where network effects are involved it's the only thing that works.”

 

What can I say? Staken admits here what I have been arguing for years: that the industry is moving from fad to fad, forcing practitioners to spend most of their time migrating from one to another, mapping from one to another, trying to integrate them and so on, instead of doing productive work, which Staken considers “academic”. It reminds me of Orwell’s 1984 and DoubleSpeak: terms are used to mean the opposite of their normal meaning.

 

Points arising:

 

- “Everything must be as simple as possible, but not simpler,” said Einstein. Well, it turns out that types and relations are necessary and sufficient to represent data (see Darwen’s Predicates and Propositions: What a Database Really Is), which makes for the simplest manipulation and integrity. Anything else complicates matters without adding any functionality or power. But for Staken things “work” only if one does not need to think much up front (that’s too hard), regardless of the unnecessary complexity that piles up later.

 

- The reason XML proponents consider their contraption “simple” is that they normally think only of structure and ignore the main purpose of databases: manipulation and integrity.

 

- One of the main reasons hierarchic DBMSs were replaced by SQL DBMSs was the unbearable complexity of the former and the relative simplicity of the latter. The fact that even SQL, which is far from a true, complete and maximally simple relational implementation, was still preferred to hierarchic DBMSs says it all.

 

“The technology world is a different place now then when hierarchical died and relational emerged. The glass walled room is dead and network effects rule the day. Technologies that embrace this and keep people from shooting them self in the foot will succeed those that don't will wither away.”

 

More psychobabble belonging, perhaps, in the rubber-walled room.

 

The technology world may be different now, but not because information needs have fundamentally changed so often as to justify the various fads proliferated by the industry, of which XML is but one. Client-server, ODBMS, “universal DBMS”, “multi-dimensional DBMS”, “post-relational” DBMS and “object-relational DBMS” were all promoted as the “new improved” way to do data management. If they were, how come we still need a new technology every few years?

 

That’s because these fads never address fundamental data management needs, which are quite stable. Uneducated practitioners like Staken buy into fads even though they can’t even explain what those are and do.

 

“My personal opinion on this is that we need to avoid the comparison. I know it's difficult, often impossible but getting drawn into it is a recipe for failure. XML just doesn't have the mathematical model that relational theory is based on. It's about practical application which people like Mr. Pascal just don't care about. I'm hoping this changes over the next few years as more research is applied but right now I don't think our argument is particularly strong.”

 

Can Staken specify what exactly is not practical about database management being based on logic, which guarantees correctness, and how ignoring logic is practical? Or how set theory, which makes DBMS performance system- rather than user-optimizable, is not practical? Can he explain why reinventing the whole hierarchic database wheel, discarded decades ago, is more practical than a true implementation of the real solution, which the industry never delivered? Until he does (and he can’t) he is grinding water.

It’s Staken’s educational gap that causes him to distinguish between theory and the practical. "The gap between theory and practice is not as wide in theory as it is in practice" said somebody wise. The theory is there for practical reasons: it guarantees correctness and maximum simplicity. Would Staken say that bridge builders should ignore physical theories for practical reasons?

 

“I see native XML databases as enablers for new application types. I want to see them used in places where databases typically weren't used before. Going head to head against relational technology is a pretty foolish thing for any vendor to do. There are all kinds of applications around where the capabilities of an XML database are very desirable but I don't think enterprise data management is one of them (at least not yet). We want to take things that were never considered data because of the complexity of their structure and turn them into data. Put the database onto the network, enable new peer-to-peer style applications, enable mobile applications. What exactly the applications are isn't for me to determine but with all the XML flying around the need is there.”

 

Sigh. More mindless babble. It is difficult to say anything meaningful about relational and XML technologies when you have no idea what either of them is and does.

 

“And relational databases make you care about the table that the data is stored in. If you change your table structure your queries are just as likely to break. Adding elements shouldn't cause a problem, changing nesting will cause a problem but I don't consider that any different then if you were to rename or further normalize a table. Of course if your XPath looks something like /path/data[2] then you do have a problem. You also have facilities in XPath that do allow queries regardless of the structure it's all in how you use the tools. Maybe that is the problem, XPath will give you the power to shoot your self in the foot if you want...”

 

What on earth is this about? How does one respond to such? One doesn’t.

 

“I think a schema versioning facility might be an interesting feature too. Kind of like views, but at a lower level. This is also where the increasing use of tools like XSL-T or its fraternal twin XQuery will be more important then direct DOM manipulation.”

 

A schema versioning facility is “kind of like views”? At a “lower level”? If I were not an atheist, “Help us, Lord” would be warranted.

 

“Obviously in the XML world there is a strong tendency toward linking to solve these kinds of problems. But if you combine the denormalization in XML with minimal joins and careful use of linking I think you get the best of all worlds and an extremely powerful toolset to optimize performance to best suit your needs. If you normalize your data like you do in the relational world though there is no point to using XML. It's only when you skip 1nf and 2nf and go straight to careful 3nf that you get an XML data model that will take advantage of these benefits. Cross collection joins or linking facilities are essential here.”

 

Say, what? Never mind. Can Staken explain why every record being exchanged should include tags that tell the system what the data is, when that must be agreed on prior to any exchange anyway? And how 90% (unnecessary) tags and 10% data yield “performance to best suit your needs”?
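To make the tag-overhead point concrete, here is a minimal sketch that measures how much of a single tagged record is markup rather than data. The record, its element names and the helper markup_share are all invented for illustration; the ratio depends entirely on tag names and value lengths, so this neither confirms nor refutes any particular percentage.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, made up for this illustration only.
RECORD = (
    "<employee>"
    "<id>1001</id>"
    "<name>Smith</name>"
    "<dept>Sales</dept>"
    "</employee>"
)


def markup_share(xml_text: str) -> float:
    """Return the fraction of characters in xml_text that are not element text.

    Only text between tags counts as data here; tag names, angle brackets
    and any attributes are all counted as markup.
    """
    root = ET.fromstring(xml_text)
    data_chars = sum(len(text.strip()) for text in root.itertext())
    return 1 - data_chars / len(xml_text)


share = markup_share(RECORD)
print(f"{share:.0%} of the characters in this record are markup")
```

For this particular record the script reports 80% markup (56 of 70 characters). Shorter tag names or longer values would lower the figure, and deeper nesting would raise it.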

 

“You only need referential integrity when you actually have references. When modeling XML the only time you need references is to avoid duplication of data. Repeating data and semi-structured data can be handled directly. Now the question is, is the reason we don't have adequate referential integrity because of a flaw in XML or simply because the XML database technology is immature and we haven't addressed this yet? Seems the latter is more likely since joins are not widely implemented yet.”

 

I give up. And rest my case.

 

(Thanks to Paul Vernon for bringing the Dijkstra quote to my attention.)

 

 

Posted 7/6/03

© Fabian Pascal 2006 All Rights Reserved