UPSIDE DOWN AND BACKWARDS 1
by Fabian Pascal

 

 

 

An ITWeek “data exchange analysis” (note my emphasis) by Timothy Dyck, Databases Develop for XML, foretells that "Soon most firms will need systems that can verify, store and retrieve XML data" and asks "Will this require a new type of database, or can relational databases evolve to meet the requirement?" Dyck argues that "As ... XML becomes an increasingly important data interchange format, it makes sense to look at new ways of storing information directly in XML and using XML-based tools for querying and manipulating data", even though "[g]enerally, XML databases are not [yet!] technically strong enough to compete with relational databases". He concludes that:

 

(a) "[XML] tools for this purpose are still developing". Specifically:

·   "XML databases lack numerous administrative, interoperability, programmability and manageability benefits provided by the established relational databases."

·   "Lack of clear standards is also a problem with XML databases ... the XML databases currently available all use proprietary query languages and programming interfaces."

·   "the XPath query syntax has no support for grouping, sorting or summarising data"

·   "the much richer XQuery query language ... still in draft form [when] formulated ... it is unlikely to support updates, inserts or deletions."

 

(b) "[R]elational database vendors are taking advantage of the past work ... and are combining these strengths with their extensive research efforts into XML parsers and query languages". Specifically:

·   "added object database and Java language features"

·   "added a variety of extensibility features to store geospatial, text, image, HTML and time-series data"

 

The use of the term ‘database’ to mean ‘DBMS’, and of ‘relational databases’ to mean ‘SQL DBMSs’ (both widespread mistakes in the industry) is not very promising, but let’s consider his arguments.

 

The Data Exchange Tail and the Database Dog

 

Dyck states that “The growing importance of XML as a data interchange format will drive demand for systems that can verify, store and retrieve XML data.” He may be right, but is such driving right?

 

When I criticize XML as a poor rehash of hierarchic database management, I am taken to task by XML proponents. One of them, claiming to have contributed to XML, found my position "pretty wild and miss[ing] the point" because “XML is just a nice, little low-level technique which has some nice properties at the current state of technology for transmitting data ... [it] is just for small, ephemeral, instantaneous data transfers between processes …” (see Setting Some Matters Straight, Parts 1, 2, and 3 in my Against the Grain series). But even he admits that “... clearly some people do want XML for more than just for transmitting data. They do want XML Schemas to be the basic model for database systems. That particular sub-use of XML-related systems is fair target for concerns such as Mr. Pascal's.”

 

Well, my point is that it was to be expected – which is why I predicted it early on – that introducing this “data format” into an industry lacking knowledge of and interest in fundamentals would inexorably lead to its extension to database management (a purpose for which, purportedly, it was not intended), because that’s what the industry has normally done (e.g. with spreadsheets, object orientation and, recently, browsers). That is why the industry is now scrambling to come up with XML schema and query languages: there cannot be database management without them, and XML did not have any.

 

Note: It is not entirely accurate to say that XML was intended as just a pure data exchange format: the tags are about meaning, a logical aspect, which is not required for physical data exchange. XML was invented by text publishers rather than database people, and text publishers do not understand this issue (in itself an argument for being circumspect about extending XML to data management).

 

 

The “Still in Infancy” Argument

 

The first of Dyck’s two positions is also common in the industry each time a new fad attracts the attention of the media. That is, in fact, the very position taken when object oriented programming was extended to database management, for which it was not intended either. But it is as untenable an argument for XML as it was for object DBMSs.

 

Database management requires a data model, which conveys the meaning of the data to the DBMS via data types, structure, integrity and manipulation. Without such meaning no DBMS can manage data, and the management must then be done by users in applications. Desirable properties of a data model are soundness (that is, a formal scientific foundation), generality (ability to represent as many kinds of information as possible), and simplicity (it should be as simple as possible, but not simpler).
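The four components named above can be made concrete with a minimal sketch. It uses Python's built-in sqlite3 module purely because it is readily runnable; SQL is, as this article itself stresses, only an approximation of the relational model, and SQLite's typing in particular is loose. The department and employee tables, their columns and the rows are hypothetical, invented for illustration. The point is only that once types, structure and integrity constraints are declared to the DBMS, it is the DBMS, not application code, that rejects bad data and answers queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Structure (tables), data types (column declarations) and integrity
# (NOT NULL, CHECK, REFERENCES) are all declared to the DBMS itself.
# The schema is hypothetical, for illustration only.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL CHECK (salary > 0),
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'Ann', 5000, 1)")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Integrity: both statements violate declared constraints.
bad_salary = rejected("INSERT INTO employee VALUES (11, 'Bob', -1, 1)")
bad_dept = rejected("INSERT INTO employee VALUES (12, 'Cy', 4000, 99)")

# Manipulation: a declarative query, including grouping and summarising.
rows = conn.execute("""
    SELECT d.name, COUNT(*), SUM(e.salary)
    FROM employee AS e JOIN department AS d ON d.dept_id = e.dept_id
    GROUP BY d.name
""").fetchall()

print(bad_salary, bad_dept, rows)
```

Without the declarations, each application touching the data would have to re-implement the salary and department checks itself, which is the situation the paragraph above describes.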

 

XML, as initially invented, provided only structure, but no data types, integrity or manipulation, hence the scramble to define them post hoc. The problem, of course, is that the structure underlying XML is hierarchic. And even if the scientific foundation for the hierarchic data model, graph theory, were adhered to (which, admittedly, it is not), it would still be the case that the model is not general enough (it forces hierarchy on representations of any reality that is not inherently hierarchic) and is horribly complex, which is, in fact, precisely why graph theory is not adhered to. This should be well known by now, as hierarchic DBMSs (e.g. IBM’s IMS) were used more than 40 years ago and discarded for these very reasons. But because of the industry’s disregard for fundamentals, old stuff is constantly being brought back under new labels, without anyone even realizing it.
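What "forcing hierarchy" on a non-hierarchic reality looks like can be sketched with a many-to-many relationship. The suppliers-and-parts data below is hypothetical, and the sketch again relies only on Python's standard library (xml.etree.ElementTree and sqlite3), with SQL standing in imperfectly for the relational model. Nesting parts under suppliers privileges one direction of the relationship: facts about a part are repeated under every supplier of it, and the question asked "against" the hierarchy needs a procedural traversal, whereas the relational representation stores each fact once and treats both questions symmetrically.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hierarchic representation: parts are nested under suppliers.
# Part P1 (and its color) must be repeated under each supplier of it.
doc = ET.fromstring("""
<suppliers>
  <supplier name="S1">
    <part id="P1" color="red"/>
    <part id="P2" color="green"/>
  </supplier>
  <supplier name="S2">
    <part id="P1" color="red"/>
  </supplier>
</suppliers>
""")

# Along the hierarchy: one path expression.
xml_parts_of_s1 = [p.get("id") for p in doc.findall("./supplier[@name='S1']/part")]

# Against the hierarchy: visit every supplier and probe its children.
xml_suppliers_of_p1 = [
    s.get("name")
    for s in doc.findall("supplier")
    if s.find("part[@id='P1']") is not None
]

# Redundancy: the single fact "P1 is red" is stored this many times.
xml_copies_of_p1 = len(doc.findall(".//part[@id='P1']"))

# Relational representation: each fact stored once, no privileged direction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (sname TEXT PRIMARY KEY);
CREATE TABLE part     (pid TEXT PRIMARY KEY, color TEXT NOT NULL);
CREATE TABLE supplies (
    sname TEXT NOT NULL REFERENCES supplier(sname),
    pid   TEXT NOT NULL REFERENCES part(pid),
    PRIMARY KEY (sname, pid)
);
INSERT INTO supplier VALUES ('S1'), ('S2');
INSERT INTO part     VALUES ('P1', 'red'), ('P2', 'green');
INSERT INTO supplies VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1');
""")

# Both questions have exactly the same form.
rel_parts_of_s1 = [r[0] for r in conn.execute(
    "SELECT pid FROM supplies WHERE sname = 'S1' ORDER BY pid")]
rel_suppliers_of_p1 = [r[0] for r in conn.execute(
    "SELECT sname FROM supplies WHERE pid = 'P1' ORDER BY sname")]
rel_copies_of_p1 = conn.execute(
    "SELECT COUNT(*) FROM part WHERE pid = 'P1'").fetchone()[0]

print(xml_parts_of_s1, xml_suppliers_of_p1, xml_copies_of_p1)
print(rel_parts_of_s1, rel_suppliers_of_p1, rel_copies_of_p1)
```

Nesting suppliers under parts instead would merely swap which question is awkward and which data is duplicated; the asymmetry comes from the hierarchy, not from the particular choice of parent.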

 

Like object orientation before it – which does not provide a data model in the above sense at all and was invented in programming no later than the relational model – XML is nothing new that justifies the claim of “infancy”. As users will discover when it is probably too late, it will be very difficult and time consuming to come up with XML specifications functionally equivalent to what the relational model provides, and what does become available will prove as problematic and complex as it did 40 years ago. As I put it in the title of an Against the Grain article, Those Who Don’t Know the Past Are Doomed to Relive It.

 

 

The “If You Can’t Beat Them, Join Them” Argument

 

Dyck’s second argument is as common in the industry as the first: RDBMS vendors will incorporate XML into their systems, thus avoiding the problem of users migrating to “pure” XML DBMSs. This is, of course, an argument similar to that made for hierarchic and network (CODASYL) DBMS vendors when faced with the relational threat (I do still recall vendors adding a /R to the name of their products). Ultimately, both vendors and users realized that teaching an old nonrelational dog to do relational tricks is a lost cause, but not before they paid a heavy price for it.

 

Except that in the case of XML the reverse is true: in the case of hierarchic DBMSs, the attempt was to give a bad technology better features (which failed); in the case of XML, the attempt is to return worse features to a better technology. The relational model was invented to eliminate the very problems caused by hierarchic DBMSs and Codd made it very clear that if you don’t want those problems, you should not defeat the relational foundation by subverting it and allowing, among other things, hierarchic DBMS features (hence his once famous 12 rules of relational fidelity). Attempts at “incorporating XML capabilities” into current DBMSs are exactly such a subversion, with all the consequences that ensue.

 

Current products are, of course, not truly relational, but SQL DBMSs, which is hardly the same thing. And, unfortunately, one of the many deficiencies of SQL and its commercial implementations is that they lack the relational capabilities for handling hierarchies (see Chapter 7 of my PRACTICAL ISSUES IN DATABASE MANAGEMENT). This makes it easier for vendors unfamiliar with fundamentals to justify such subversions, and for equally uninformed users and trade media to accept them as progress. This is equally true of adding "object database and Java language features" (see Object Orientation for Application Development, Not Database Management and To a Hammer Everything Looks Like Nails, Parts 1 and 2, in the Against the Grain series, and Oh, Oh, not OO Again).

 

But the solution is not to make SQL products worse by piling hierarchic DBMS features on top of the SQL flaws. As I have said so many times in so many places (see my first editorial), technology is actually regressing. Rather, the solution is TRDBMSs, as I have alluded to in other writings.

 

 

Posted 7/5/02

© Fabian Pascal 2006 All Rights Reserved