An ITWeek “data exchange analysis” (note my
emphasis) by Timothy Dyck, Databases Develop for XML, foretells that
"Soon most firms will need systems that can verify, store and retrieve XML
data" and asks "Will this require a new type of database, or can
relational databases evolve to meet the requirement?" It argues that "As
... XML becomes an increasingly important data interchange format, it makes
sense to look at new ways of storing information directly in XML and using
XML-based tools for querying and manipulating data", even though
"[g]enerally, XML databases are not [yet!] technically strong enough to
compete with relational databases". He concludes that:
(a) "[XML] tools for this purpose are still developing".
Specifically:
· "XML databases lack numerous administrative, interoperability, programmability and
manageability benefits provided by the established relational databases."
· "Lack of clear standards is also a problem with XML databases ... the XML databases
currently available all use proprietary query languages and programming
interfaces."
· "the XPath query syntax has no support for grouping, sorting or summarising
data"
· "the much richer XQuery query language ... still in draft form [when] formulated ...
it is unlikely to support updates, inserts or deletions."
(b) "[R]elational database vendors are taking advantage of the
past work ... and are combining these
strengths with their extensive research efforts into XML parsers and
query languages". Specifically:
· "added object database and Java language features"
· "added a variety of extensibility features to store geospatial, text, image, HTML and
time-series data"
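Dyck's remark about path queries lacking grouping and summarising can be made concrete with a minimal sketch. It uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath; the document, element names and attribute names are invented for illustration and come from neither Dyck's article nor any product. The path expression selects nodes, but the per-region totals have to be worked out in application code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; element and attribute names are made up.
DOC = """
<orders>
  <order region="east" amount="10"/>
  <order region="west" amount="5"/>
  <order region="east" amount="7"/>
</orders>
"""

root = ET.fromstring(DOC)

# A path expression can select nodes by a simple predicate...
east = root.findall("./order[@region='east']")

# ...but grouping and totalling are done by hand in the host language.
totals = defaultdict(int)
for order in root.findall("./order"):
    totals[order.get("region")] += int(order.get("amount"))

# Sorting, likewise, is left to the application.
totals = dict(sorted(totals.items()))
print(len(east), totals)
```

Note also that the amounts are plain strings until the application converts them: nothing in the document itself declares a type for them.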
The use of the term ‘database’ to mean ‘DBMS’, and ‘relational
databases’ to mean ‘SQL DBMSs’ – widespread mistakes in the industry
– is not very promising, but let’s consider his arguments.
The Data Exchange Tail and the Database Dog
Dyck states that “The growing importance of XML as a data
interchange format will drive demand for systems that can verify, store and
retrieve XML data.” He may be right, but is such driving right?
When I criticize XML as a poor rehash of hierarchic database
management, I am taken to task by XML proponents. One of them, claiming to have
contributed to XML, found my position "pretty wild and miss[ing] the
point" because “XML is just a nice, little low-level technique which has
some nice properties at the current state of technology for transmitting data
... [it] is just for small, ephemeral, instantaneous data transfers
between processes …” (Setting Some Matters Straight, Parts 1, 2, and
3 in my Against the Grain series).
But even he admits that “... clearly some people do want XML for more
than just for transmitting data. They do want XML Schemas to be the basic model
for database systems. That particular sub-use of XML-related systems is fair
target for concerns such as Mr. Pascal's.”
Well, my point is that it was to be expected – which is why I
predicted it early on – that introducing this “data format” into an industry
lacking knowledge of and interest in fundamentals would inexorably lead to
its extension to database management (a purpose for which, purportedly, it was
not intended), because that’s what the industry has normally done (e.g.
spreadsheets, object orientation and recently browsers). That is why the
industry is scrambling now to come up with XML schema and query languages, because
there cannot be database management without them and XML did not have any.
Note: It is not entirely accurate to say that XML was
intended as just a pure data exchange format: the tags are about meaning
– a logical aspect, which is not required for physical data exchange.
XML was invented by text publishers who, not being database people, do not
understand this issue (in itself an argument for being circumspect about extending XML to
data management).
The “Still in Infancy” Argument
The first of Dyck’s two positions is also common in the
industry each time a new fad attracts the attention of the media. That is, in
fact, the very position taken when object oriented programming was extended to
database management, for which it was not intended either. But it is as
untenable an argument for XML as it was for object DBMSs.
Database management requires a data model,
which conveys the meaning of the data to the DBMS via data types,
structure, integrity and manipulation; without such meaning no DBMS can
manage data, and the work must instead be done by users in applications. Desirable properties
of a data model are soundness (that is, a formal scientific foundation),
generality (ability to represent as many kinds of information as
possible), and simplicity (it should be as simple as possible, but not
simpler).
XML, as initially invented, provided only structure, but no
data types, integrity or manipulation, hence the scramble to define them
post-hoc. The problem, of course, is that the structure underlying XML is
hierarchic.
And even if the scientific foundation for the hierarchic data model, graph
theory, were adhered to – which admittedly it is not – it would still be
the case that the model is not general enough (it forces
hierarchy on representations of any reality that is not inherently hierarchic),
and is horribly complex, which is, in fact, precisely why graph theory
is not adhered to. This should be well known by now, as hierarchic DBMSs
were used more than 40 years ago (e.g. IBM’s IMS) and discarded for these very
reasons. But because of the industry’s disregard for fundamentals, old stuff is
constantly being brought back under new labels, without even realizing it.
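The generality problem can be sketched with a small example. Everything in it is an assumption made for illustration: the supplier and part data are invented, Python's standard xml.etree.ElementTree stands in for an XML store, and plain Python sets of tuples serve only as a loose stand-in for relations, not as a DBMS. A many-to-many relationship has no natural parent, so the hierarchy must pick one; questions that follow the chosen direction are easy, the symmetric question needs different logic, and facts about the child get repeated under each parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplier/part data; all names are invented for illustration.
# The relationship is many-to-many, but the hierarchy must pick one parent.
DOC = """
<suppliers>
  <supplier id="S1">
    <part id="P1" city="London"/>
    <part id="P2" city="Paris"/>
  </supplier>
  <supplier id="S2">
    <part id="P1" city="London"/>
  </supplier>
</suppliers>
"""
root = ET.fromstring(DOC)

# Following the hierarchy: one step down from a known supplier.
parts_of_s1 = [p.get("id") for p in root.findall("./supplier[@id='S1']/part")]

# Against the hierarchy: every supplier must be visited and tested.
suppliers_of_p1 = [s.get("id") for s in root.findall("./supplier")
                   if s.find("./part[@id='P1']") is not None]

# Redundancy: P1's city is recorded once per parent supplier.
xml_p1_copies = len(root.findall(".//part[@id='P1']"))

# The same facts as flat sets of tuples: both questions take the same form,
# and each fact about a part is recorded exactly once.
parts = {("P1", "London"), ("P2", "Paris")}
supplies = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}
rel_parts_of_s1 = sorted(p for s, p in supplies if s == "S1")
rel_suppliers_of_p1 = sorted(s for s, p in supplies if p == "P1")
rel_p1_copies = sum(1 for pid, _city in parts if pid == "P1")

print(parts_of_s1, suppliers_of_p1, xml_p1_copies, rel_p1_copies)
```

Nesting suppliers under parts instead would simply reverse which question is easy and which fact is duplicated; the asymmetry comes from the structure, not from the particular choice of parent.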
Like object orientation before it – which does not provide a
data model in the above sense at all and was invented in programming not later
than the relational model – XML is nothing new that justifies the claim of
“infancy”. As users will discover when it is probably too late, it will be very
difficult and time consuming to come up with XML specifications functionally
equivalent to what the relational model provides, and what will become
available will prove as problematic and complex as it did 40 years ago. As I
put it in the title of an Against the Grain
article, Those Who Don’t Know the Past Are Doomed to Relive It.
The “If You Can’t Beat Them, Join Them” Argument
Dyck’s second argument is as common in the industry as the
first: RDBMS vendors will incorporate XML into their systems, thus avoiding the
problem of users migrating to “pure” XML DBMSs. This is, of course, an argument
similar to that made for hierarchic and network (CODASYL) DBMS vendors when
faced with the relational threat (I do still recall vendors adding a
/R to the name of their products).
Ultimately, both vendors and users realized that teaching an old nonrelational
dog to do relational tricks is a lost cause, but not before they paid a heavy
price for it.
Except that in the case of XML the reverse is true: in the
case of hierarchic DBMSs, the attempt was to give a bad technology better
features (which failed); in the case of XML, the attempt is to return worse
features to a better technology. The relational model was invented to
eliminate the very problems caused by hierarchic DBMSs and Codd made it very
clear that if you don’t want those problems, you should not defeat the
relational foundation by subverting it and allowing, among other things,
hierarchic DBMS features (hence his once famous 12 rules of relational
fidelity). Attempts at “incorporating XML capabilities” into current DBMSs are
exactly such a subversion, with all the consequences that ensue.
Current products are, of course, not truly relational, but SQL
DBMSs, which is hardly the same thing. And, unfortunately, one of the many
deficiencies of SQL and its commercial implementations is that they lack the
relational capabilities for handling hierarchies (see Chapter 7 of my PRACTICAL ISSUES IN DATABASE MANAGEMENT).
This makes it easier for vendors unfamiliar with fundamentals to justify such
subversions, and for equally uninformed users and the trade media to accept
them as progress. This is equally true for adding "object database and Java
language features" (see Object Orientation for Application
Development, Not Database Management and To a Hammer Everything
Looks Like Nails, Parts 1 and 2, in the Against the Grain
series and Oh, Oh, not OO Again).
But the solution is not to make SQL products worse, by piling
hierarchic DBMS features on top of the SQL flaws. As I have said so many times in so
many places (see my first editorial),
technology is actually regressing. Rather, the solution
is truly relational DBMSs (TRDBMSs), as I have alluded to in other writings.
Posted 7/5/02
© Fabian Pascal 2006 All Rights Reserved