From: CS
To: Editor
Date: 9 Dec 2004
In a posting on lambda-the-ultimate.org, Dominic Fox discusses
RDF and, more generally, what the issue is with these folks who've dumped
structured data for unstructured XML and are now busy tacking stuff back on to
XML to make it semi-structured:
To be more precise, the translation of the kinds of stuff human
beings think they know, and the kinds of meanings they like to bandy about,
into machine-processable semantic information generally entails a degree of
(re-)formalization. We have not only to discover [the] structure inherent in
the data, but also to derive a representation of that structure that will fit
into our data model; and this is true even if the data model is claimed to
support semistructured data.
The difficulty is then of the following kind: the process of
formalizing semantic information so that it can be processed by an automaton is
not itself automatable (or at least not by the same process that the machine will
use to process the formalized semantic information; there might be some
higher-order process, but the same problem would then apply at the higher
level). The person entering data still has a job to do (apart from just typing
the stuff in), and it is not necessarily an easier job than the job of the
old-fashioned suit-wearing person who performs domain modeling and creates
relational database schemas.
In fact, I would argue that this is the *same job*.
However, I'm seeing perhaps a bit of hope here: despite RDF's
description of expressing connections between bits of information as graphs,
they're on to the idea that you can structure specific types with URI
representations and then use those as keys to refer to specific things in the
real world. For example, uri:isbn:1558608559 would be a key referring to a
specific book.
At this point, one imagines, you could ignore the whole
"pointer" thing, do a web search for pages with content that use a
standardized schema for book information, and find a record with that key that
has a title available, the moral equivalent of
    (BOOK_INFO_FROM_THE_WEB WHERE ISBN = 'uri:isbn:1558608559') {TITLE}
And be back in the land where, though some of this stuff may
be implemented with pointers or their moral equivalent under the hood, as a
user you don't need to worry about it.
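A minimal sketch of that lookup in plain Ruby, assuming the web search has already handed back a list of records in one agreed schema. The BOOK_INFO_FROM_THE_WEB array, its field names, and the sample rows are invented placeholders, not real search results:

```ruby
# Hypothetical records gathered from pages that share a book-information
# schema. The titles are placeholders, not real catalog data.
BOOK_INFO_FROM_THE_WEB = [
  { isbn: 'uri:isbn:1558608559', title: 'Placeholder Title A' },
  { isbn: 'uri:isbn:0000000000', title: 'Placeholder Title B' }
].freeze

# Restrict to the rows with the wanted key, then project onto the title.
# No pointers are followed anywhere: the key value alone finds the record.
TITLES = BOOK_INFO_FROM_THE_WEB
         .select { |row| row[:isbn] == 'uri:isbn:1558608559' }
         .map { |row| row[:title] }
         .uniq

puts TITLES
```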
Or maybe I'm just dreaming....
Oops. Never mind. My hopes just fell apart here, where you
can find this paragraph:
For one thing, it is important to note that in the conventional
use of reification, the subject of the reification triples is assumed to
identify a particular instance of a triple in a particular RDF document, rather
than some arbitrary triple having the same subject, predicate, and object. This
particular convention is used because reification is intended for expressing
properties such as dates of
composition and source information, as in the examples given already, and these
properties need to be applied to specific instances of triples. There could be
several triples that have the same subject, predicate, and object and, although
a graph is defined as a set of triples, several instances with the same triple
structure might occur in different documents. Thus, to fully support this
convention, there needs to be some means of associating the subject of the
reification triples with an individual triple in some document. However, RDF
provides no way to do this.
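To make the complaint in that paragraph concrete, here is a small Ruby sketch, assuming a graph is modeled as a plain Set of [subject, predicate, object] arrays. The predicate, the object, and the two "documents" are made up for illustration:

```ruby
require 'set'

# One made-up triple, stated identically in two hypothetical documents.
triple = ['uri:isbn:1558608559', 'ex:creator', 'ex:someAuthor']
doc_a = Set[triple]
doc_b = Set[triple.dup]

# Merging the two graphs: a set keeps only one copy of equal triples.
MERGED = doc_a | doc_b
puts MERGED.size # 1

# After the merge, nothing in the triple itself records which document
# it came from, so a statement about "this particular instance" of the
# triple has no specific instance left to point at.
```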
It's apparent that at least some people seriously involved
with RDF are convinced that there is always going to be context information of
some sort that is not in the record/tuple/whatever itself that you need to know
to find things. Not to mention some amazing "and now we back-link to all
of this" stuff that I've seen. ("Great, we've 'navigated' to the
order for this customer. Now, who purchased this order? Uh....")
From: Fabian Pascal
To: CS
I have stated this in a much simpler form many times: there is no
such thing as "unstructured data"; that's a contradiction in terms.
Anything unstructured is random noise and, therefore, carries no
meaning/information and, therefore, is not data. XML is, of course, structured
too, except that it is a regression to the hierarchic structure that we
discarded decades ago because it was not cost-effective (see Those Who Don't Know the Past Are Condemned to Repeat It).
It's only ignorance in the industry, due to lack of education
on fundamentals, that could have come up with XML as a basis for data
management. Those who invented XML were text processing and programming people,
who confused semantics with text formatting (see To a Hammer Everything Looks Like Nails Parts 1, 2).
As evidence that even so-called database experts do not
understand fundamentals, the main author of SQL also bought into the nonsense
of XML databases and authored the XQuery proposal for W3C (see If You Liked SQL, You'll Love
XQuery).
Codd invented the relational model to avoid precisely the
problematics of hierarchic navigation. But in a society/industry where
ignorance of history is considered an asset and knowledge and reason a
liability, there is no learning curve and, therefore, square wheels keep being
reinvented.
From: CS
Funny thing, you know, as I was dealing with a pile of e-mail
address objects in a program the other day, I was doing the usual various
contortions to organize them (hash tables within hash tables and so on) when
it struck me: we're going exactly backwards with this so-called
"object-relational impedance mismatch." In my OO programs I'm mucking
about with essentially hierarchical data access methods, and all of these damn
OO-Relational frameworks are just turning my nice relational databases into
objects with pointers to each other. Why can't I use relational access methods
in my OO programs? Then something silly like:
    all_locations = Array.new
    addresses.each { |address|
      # Only add a location we have not already collected.
      unless all_locations.include?(address.location)
        all_locations << address.location
      end
    }
would turn into (just making up syntax off the top of my head
here)
addresses.restrict(:location).distinct
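A rough sketch of how that might look in plain Ruby, under some loud assumptions: the Relation class, the Address struct, and the sample data are all made up here and are not an existing library. The made-up call above is spelled project in this sketch, since in relational terms picking out an attribute is a projection, while restrict filters rows. Because the toy relation is a set, the "distinct" comes for free:

```ruby
require 'set'

# Hypothetical stand-in for the e-mail address objects.
Address = Struct.new(:email, :location)

# Toy relation: a set of tuples with restrict (filter rows) and
# project (keep named attributes). Duplicates vanish because it is a set.
class Relation
  include Enumerable

  def initialize(tuples)
    @tuples = tuples.to_set
  end

  def each(&block)
    @tuples.each(&block)
  end

  # Keep only the tuples for which the block returns true.
  def restrict(&predicate)
    Relation.new(@tuples.select(&predicate))
  end

  # Keep only the named attributes of each tuple.
  def project(*attrs)
    Relation.new(@tuples.map { |t| attrs.map { |a| t.public_send(a) } })
  end
end

ADDRESSES = Relation.new([
  Address.new('a@example.com', 'Tokyo'),
  Address.new('b@example.com', 'Tokyo'),
  Address.new('c@example.com', 'Osaka')
])

# Each projected tuple is a one-element array, hence the flatten.
ALL_LOCATIONS = ADDRESSES.project(:location).to_a.flatten.sort
p ALL_LOCATIONS
```

This says nothing about efficiency, of course; it only shows that the relational operators read more directly than the loop.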
What do you think?
From: Fabian Pascal
And you just realized this? We have been arguing this for
years.
From: CS
Really? Where have you proposed this publicly? I've read your
PRACTICAL ISSUES IN DATABASE
MANAGEMENT and several books by Date, not to mention everything freely
available on your website, and I don't recall this being proposed anywhere.
To be taken seriously, or even to become known at all, this
really needs first of all a reasonably detailed proposal for some language
that's currently in use, and second a rough implementation to demonstrate not
only that this can be done, but that there are grounds to believe that this can
be implemented with reasonable efficiency. Open source that, get a few people
interested, and you may have something.
From: Fabian Pascal
It was not a "formal proposal" and it was earlier
than the site and last book. I am referring to the "impedance
mismatch" nonsense: we debunked it as being backwards: we need to raise
the level of abstraction of programming, not lower that of DBMSs.
From: CS
Ah, ok. You should keep repeating that, so that the new
generation, as they discover your work, keeps that idea in mind.
As well, there's probably an opportunity to promote this through
RDBMS vendors you may know and work with; you could encourage them to develop
alternative APIs that integrate more tightly with the RDBMS than, say, JDBC, as
this might be a selling point for their RDBMS product, as well as getting the
idea out there.
From: Fabian Pascal
Everything we say we repeat over and over again, to no avail.
This is a systemic problem and we cannot change the system.
Nope. Vendors don't care about this, as long as they can make
money based on ignorance. And they sure don't listen to us.
IBM has just announced that the next generation of their DBMS
will be XML (see When Ignorance is Expertise, forthcoming at www.dbazine.com). And Codd invented the RM
at IBM. 'Nough said.
Posted 2/18/05