The ongoing process of becoming
more and more an a-mathematical society is more an American specialty than
anything else (It is also a tragic accident of history). The idea of a formal
design discipline is often rejected on account of vague cultural/philosophical
condemnations such as "stifling creativity"; this is more pronounced
in the Anglo-Saxon world where a romantic vision of "the humanities"
in fact idealizes technical incompetence. Another aspect of that same trait is
the cult of iterative design. Industry suffers from the managerial dogma that
for the sake of stability and continuity, the company should be independent of
the competence of individual employees. Hence industry rejects any methodological
proposal that can be viewed as making intellectual demands on its work force.
Since in the US the influence of industry is more pervasive than elsewhere, the
above dogma hurts American computing science most. The moral of this sad part
of the story is that as long as the computing science is not allowed to save
the computer industry, we had better see to it that the computer industry does
not kill computing science.
--E. Dijkstra, Keynote
address, ACM Symposium on Applied Computing, San Antonio, TX, 03/01/1999
In A XML:DB.ORG Exchange: Reply to Bradford, I debunked fallacies about, and
misinterpretations of, my arguments expressed in that exchange. Then, in No
Database Champion, I debunked some more misconceptions in an article by Michael
Champion, another participant in that exchange, which I concluded by quoting
myself:
"The problem with all these guys is that they do not
have any formal education or knowledge. They rely on 'commonsense' and practice
and that is simply not enough. That is precisely why the state of IT is so
horrendous ... There ought to be a requirement that to be published, one ought
to know what he's talking about."
In this second part I debunk the
comments addressed to Champion in that exchange by one Kimbro Staken. I leave it
to the reader to judge whether my criticism is on the money or not.
Referring to my deploring
the current generation of data management professionals’ lack of fundamental
knowledge in their field, Staken states:
“I just don't get these kids today with long hair and electric
guitars they just don't know what art in music is. If they had any formal
education they would know that Mozart achieved purity of musical art and there
is no point trying anything new, as the rest is just bohemian garbage. Seams
something along these lines was said about Charlie Parker, Miles Davis, the
Beatles and even Stravinsky. Talk about trying to squash innovation. Seems this
may be more of a problem of too much formal education that results in dismissal
of ideas before they have time to fully form.”
If this is not an excellent validation of my criticism, I
don’t know what is.
In his obituary for Ted Codd, who passed away recently, Chris Date
reminds us that the inventor of the relational model single-handedly and very
much against the grain put the database field on a scientific foundation, the
absence of which had inhibited productivity and cost-effectiveness for quite a
long time. But Staken is oblivious to the distinction between art and science
and this is precisely the kind of educational deficiency I referred to.
Staken’s position, even though he does not realize it, is tantamount to
criticizing physical theories for stifling creativity in explaining physical
reality. But then, what else can be expected from an industry in which not
ignorance, but “too much formal education” is considered a problem?
Creativity in science takes the form of either amendments to existing theories
or new, better theories; there is no equivalent in art. Critics of relational
technology, however, want and offer neither! They have yet to explain (a) why
logic and math, which is what the relational model is, require any
amendment or replacement and (b) what exactly it is that they want to replace
them with.
“As you tried to point out there is more to the XML equation
then just purity of data storage. What I really don't understand is that I have
never seen anyone say that XML is a better fundamental model for storing data
then relational. As far as I know XML database vendors aren't out there
screaming from rooftops that relational is dead and XML is the new king. I
think/hope we all realize XML is a niche data management technology. In the unlikely
event that it does displace relational it will take a very long time and it
won' t occur because of a superior theoretical model.”
I don’t know what “purity of data storage” means, but to say
it yet again, a data model (relational included) has nothing to do with storage,
and intentionally so. And the fact that XML does not separate the logical
level from the physical level (its hierarchic predecessors did not either) is one
of the reasons practitioners confuse the two, and why XML was created by those very
same practitioners in the first place.
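To make the logical/physical distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Employee relation, the two storage classes and the query are hypothetical and are not taken from any DBMS or from the exchange. It shows only that a query written against the logical structure gives the same answer whichever physical layout sits underneath.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """Hypothetical logical tuple type; says nothing about storage."""
    emp_id: int
    name: str
    dept: str


class RowStore:
    """Assumed physical layout 1: tuples kept together, one per row."""

    def __init__(self, rows):
        self._rows = list(rows)

    def scan(self):
        return iter(self._rows)


class ColumnStore:
    """Assumed physical layout 2: one list per attribute."""

    def __init__(self, rows):
        rows = list(rows)
        self._ids = [r.emp_id for r in rows]
        self._names = [r.name for r in rows]
        self._depts = [r.dept for r in rows]

    def scan(self):
        # Reassemble logical tuples from the per-attribute lists.
        columns = zip(self._ids, self._names, self._depts)
        return (Employee(*values) for values in columns)


def names_in_dept(store, dept):
    """Logical query: refers to attributes only, never to the layout."""
    return {e.name for e in store.scan() if e.dept == dept}


DATA = [
    Employee(1, "Ann", "R&D"),
    Employee(2, "Bob", "Sales"),
    Employee(3, "Cy", "R&D"),
]

row_result = names_in_dept(RowStore(DATA), "R&D")
col_result = names_in_dept(ColumnStore(DATA), "R&D")
```

The query function never changes when the storage class does, which is the sense in which a data model is intentionally silent about storage.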
On what planet has Staken been living? Is he aware that the
exchange he participates in is at a web site called XML:DB? The industry
is full of proclamations by the trade press, “experts” and vendors (including the
IBMers who implemented the relational model as SQL, who are keen on riding the
latest fad too; if you like what they did with SQL, you’ll love what they’ll do
with XML) that XML is the new database grail. Ironically, this is claimed even
by those eulogizing Ted Codd (see Turning in His Grave?).
“There seems to be a lot of this going on recently. People
complaining about XML being used for things where other technologies are more
appropriate. All of these arguments seem to miss one important point,
captivation of people's imagination. XML has it now, SGML/CORBA/ONC
RPC/ASN.1/relational do not. People are looking for the next stepping-stone to
enhanced applications of technology. XML is providing that right now.
Regardless of whether it has everything that is needed in a pure form, it is
stimulating developments that were never done before, even though the
technologies have existed for years.”
Seems like Staken has nothing but industry marketing to
captivate his imagination. Note the fuzzy language: “being used for things”,
“next stepping stone”, “enhanced application of technology”, “is providing it
right now”, “developments never done before”. There is neither substance nor
concreteness in any of this. Like so
many, Staken is regurgitating marketing hype and industry terms to impress,
without showing any pertinence or validity, let alone usefulness. He is
describing (and, obviously, endorsing) mindless industry “fad riding”, which
belies, rather than yields, market efficiency, in which he, no doubt, believes.
It is precisely because Staken, like so many practitioners,
lacks foundation knowledge of the kind he dismisses that his “imagination is
captivated” by XML, which is the opposite of innovation: not only “was it done
before”, but it was also discarded. Its structure is hierarchic, a regression
to the bad old days of hierarchic databases and DBMSs (e.g. IMS and Focus),
made obsolete decades ago by SQL products, which are not even really
relational. In fact, XML was not intended for data management in the first
place (see Tags DO Not a Language Make and The Exchange Tail).
It is rather easy to captivate uneducated
minds—in that mode vendors can sell anything, and they do.
“The basic simplicity
gets people hooked and draws them in, added complexity then becomes an
incremental burden instead of an up front study requirement. This is the one
reason I feel something like SOAP will succeed where all the CORBAs, DCOMs and
binary RPC mechanisms have failed before. By the time you add everything back
onto SOAP that you really need it will be just as complex (probably worse) but
at that point it doesn't matter. People are already hooked and will pay the
price. From an academic perspective this is terrible, unfortunately where
network effects are involved it's the only thing that works.”
What can I say? Staken admits here what I have been arguing
for years: that the industry is moving from fad to fad, forcing practitioners
to spend most of their time migrating from one to another, mapping from one to
another, trying to integrate them and so on, instead of doing productive work,
which Staken considers “academic”. It reminds me of Orwell’s 1984 and
DoubleSpeak: terms are used to mean the opposite of their normal meaning.
Points arising:
- “Everything must be as simple as possible, but not simpler,” said Einstein.
  Well, it turns out that types and relations are necessary and sufficient to
  represent data (see Darwen’s Predicates and Propositions: What a Database
  Really Is), which makes for the simplest manipulation and integrity. Anything
  else complicates matters, without adding any functionality or power. But for
  Staken things “work” only if one does not need to think much upfront (that’s
  too hard), regardless of the unnecessary complexity that piles up later.
- The reason XML proponents consider their contraption “simple” is that they
  normally think only of structure and ignore the main purpose of databases:
  manipulation and integrity.
- One of the main reasons hierarchic DBMSs were replaced by SQL DBMSs was the
  unbearable complexity of the former and the relative simplicity of the latter.
  The fact that even SQL, which is far from a true, complete and maximally
  simple relational implementation, was still preferred to hierarchic DBMSs
  says it all.
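What “structure plus manipulation plus integrity” means can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a relational implementation: the Dept and Emp tuple types, the sample data and the helper functions are all hypothetical. Relations are modeled as sets of typed tuples, integrity as checks over those sets, and manipulation as a set expression.

```python
from typing import NamedTuple


class Dept(NamedTuple):
    """Hypothetical tuple type for a department relation."""
    dept_id: int
    dept_name: str


class Emp(NamedTuple):
    """Hypothetical tuple type for an employee relation."""
    emp_id: int
    emp_name: str
    dept_id: int


def check_key(relation, key):
    """Integrity: no two tuples may share the same key value."""
    values = [key(t) for t in relation]
    return len(values) == len(set(values))


def check_reference(referencing, attr, referenced, key):
    """Integrity: every referencing value must match an existing key value."""
    return {attr(t) for t in referencing} <= {key(t) for t in referenced}


def join_emp_dept(emps, depts):
    """Manipulation: join on dept_id, projected on the two name attributes."""
    return {
        (e.emp_name, d.dept_name)
        for e in emps
        for d in depts
        if e.dept_id == d.dept_id
    }


DEPTS = {Dept(10, "R&D"), Dept(20, "Sales")}
EMPS = {Emp(1, "Ann", 10), Emp(2, "Bob", 20), Emp(3, "Cy", 10)}

keys_ok = check_key(DEPTS, lambda d: d.dept_id) and check_key(EMPS, lambda e: e.emp_id)
refs_ok = check_reference(EMPS, lambda e: e.dept_id, DEPTS, lambda d: d.dept_id)
pairs = join_emp_dept(EMPS, DEPTS)
```

Nothing here depends on nesting or on the position of one record inside another; the constraints and the join are stated over values only.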
“The technology world is a different place now then when
hierarchical died and relational emerged. The glass walled room is dead and
network effects rule the day. Technologies that embrace this and keep people
from shooting them self in the foot will succeed those that don't will wither
away.”
More psychobabble belonging, perhaps, in the rubber-walled
room.
The technology world may be different now, but not because
information needs have fundamentally changed so often as to justify the various
fads proliferated by the industry, of which XML is but one: client-server,
ODBMS, “universal DBMS”, “multi-dimensional DBMS”, “post-relational” DBMS,
“object-relational DBMS”, all were promoted as the “new improved” way to do
data management. If so, how come we still need a new technology every few
years?
That’s because these fads never address fundamental data
management needs, which are quite stable. Uneducated practitioners like Staken
buy into fads even though they can’t even explain what those are and do.
“My personal opinion on this is that we need to avoid the
comparison. I know it's difficult, often impossible but getting drawn into it
is a recipe for failure. XML just doesn't have the mathematical model that
relational theory is based on. It's about practical application which people
like Mr. Pascal just don't care about. I'm hoping this changes over the next
few years as more research is applied but right now I don't think our argument
is particularly strong.”
Can Staken specify what exactly is not practical about
database management being based on logic, which guarantees correctness, and how
ignoring logic is practical? Or how set theory, which makes DBMS performance
system- rather than user-optimizable, is not practical? Can he explain why
reinventing the whole hierarchic database wheel—discarded decades ago—is more
practical than a true implementation of the real solution, which the industry
never did? Until he does—and he can’t—he is grinding water.
It’s Staken’s educational gap that causes him to distinguish
between theory and the practical. "The gap between theory and practice is
not as wide in theory as it is in practice" said somebody wise. The theory
is there for practical reasons: it guarantees correctness and maximum
simplicity. Would Staken say that bridge builders should ignore physical
theories for practical reasons?
“I see native XML databases as enablers for new application
types. I want to see them used in places where databases typically weren't used
before. Going head to head against relational technology is a pretty foolish
thing for any vendor to do. There are all kinds of applications around where
the capabilities of an XML database are very desirable but I don't think
enterprise data management is one of them (at least not yet). We want to take
things that were never considered data because of the complexity of their
structure and turn them into data. Put the database onto the network, enable
new peer-to-peer style applications, enable mobile applications. What exactly
the applications are isn't for me to determine but with all the XML flying
around the need is there.”
Sigh. More mindless babble. It is difficult to say anything
meaningful about relational and XML technologies when you have no idea what
either of them is and does.
“And relational databases make you care about the table that the
data is stored in. If you change your table structure your queries are just as
likely to break. Adding elements shouldn't cause a problem, changing nesting
will cause a problem but I don't consider that any different then if you were
to rename or further normalize a table. Of course if your XPath looks something
like /path/data[2] then you do have a problem. You also have facilities in
XPath that do allow queries regardless of the structure it's all in how you use
the tools. Maybe that is the problem, XPath will give you the power to shoot your
self in the foot if you want...”
What on earth is this about? How does one respond to such?
One doesn’t.
“I think a schema versioning facility might be an interesting
feature too. Kind of like views, but at a lower level. This is also where the
increasing use of tools like XSL-T or its fraternal twin XQuery will be more
important then direct DOM manipulation.”
Schema versioning is “kind of like views”? At a “lower level”? If I were not an
atheist, “Help us, Lord” would be warranted.
“Obviously in the XML world there is a strong tendency toward
linking to solve these kinds of problems. But if you combine the
denormalization in XML with minimal joins and careful use of linking I think
you get the best of all worlds and an extremely powerful toolset to optimize
performance to best suit your needs. If you normalize your data like you do in
the relational world though there is no point to using XML. It's only when you
skip 1nf and 2nf and go straight to careful 3nf that you get an XML data model
that will take advantage of these benefits. Cross collection joins or linking
facilities are essential here.”
Say, what? Never mind. Can Staken explain why tags that tell the system what
the data is should be included in every record being exchanged, when that must
be agreed on prior to any exchange anyway? And how 90% (unnecessary) tags and
10% data yield “performance to best suit your needs”?
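The proportion of markup to data in a tagged record is easy to measure. The Python sketch below does so for a single made-up record; the record, its tag names and the comma-delimited alternative are assumptions chosen for illustration, and the resulting share depends entirely on how long the tag names and the values happen to be.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; tag names and values are invented for this example.
RECORD = (
    "<employee>"
    "<employee_id>1</employee_id>"
    "<employee_name>Ann</employee_name>"
    "<department>Sales</department>"
    "</employee>"
)

# The same values with the heading agreed on in advance, outside the record.
AGREED_HEADING = ("employee_id", "employee_name", "department")
DELIMITED = "1,Ann,Sales"


def data_chars(xml_text):
    """Count the characters of actual data (element text) in the record."""
    root = ET.fromstring(xml_text)
    return sum(len(text) for text in root.itertext())


def markup_share(xml_text):
    """Fraction of the record's characters that are markup rather than data."""
    return 1 - data_chars(xml_text) / len(xml_text)


share = markup_share(RECORD)
size_ratio = len(RECORD) / len(DELIMITED)
```

For this particular record most of the characters are tags, and they are repeated in every record exchanged, whereas the agreed heading is stated once.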
“You only need referential integrity when you actually have
references. When modeling XML the only time you need references is to avoid
duplication of data. Repeating data and semi-structured data can be handled
directly. Now the question is, is the reason we don't have adequate referential
integrity because of a flaw in XML or simply because the XML database
technology is immature and we haven't addressed this yet? Seems the latter is
more likely since joins are not widely implemented yet.”
I give up. And rest my case.
(Thanks to Paul Vernon for bringing the Dijkstra quote to my
attention.)
Posted 7/6/03
© Fabian Pascal 2006 All Rights Reserved