MORE ON XML AND RDF

 

 

 

From: CS

To: Editor

Date: 9 Dec 2004

 

In a posting on lambda-the-ultimate.org, Dominic Fox discusses RDF and, more generally, what the issue is with these folks who've dumped structured data for unstructured XML and are now busy tacking stuff back on to XML to make it semi-structured:

 

To be more precise, the translation of the kinds of stuff human beings think they know, and the kinds of meanings they like to bandy about, into machine-processable semantic information generally entails a degree of (re-)formalization. We have not only to discover [the] structure inherent in the data, but also to derive a representation of that structure that will fit into our data model; and this is true even if the data model is claimed to support semistructured data.

 

The difficulty is then of the following kind: the process of formalizing semantic information so that it can be processed by an automaton is not itself automatable (or at least not by the same process that the machine will use to process the formalized semantic information; there might be some higher-order process, but the same problem would then apply at the higher level). The person entering data still has a job to do (apart from just typing the stuff in), and it is not necessarily an easier job than the job of the old-fashioned suit-wearing person who performs domain modeling and creates relational database schemas.

 

In fact, I would argue that this is the *same job*.

 

However, I'm seeing perhaps a bit of hope here: despite RDF's practice of expressing connections between bits of information as graphs, they're on to the idea that you can structure specific types with URI representations and then use those as keys to refer to specific things in the real world. For example, uri:isbn:1558608559 would be a key referring to a specific book.

 

At this point, one imagines, you could ignore the whole "pointer" thing, do a web search for pages with content that use a standardized schema for book information, and find a record with that key that has a title available, the moral equivalent of

 

(BOOK_INFO_FROM_THE_WEB WHERE ISBN = 'uri:isbn:1558608559') { TITLE }

 

And be back in the land where, though some of this stuff may be implemented with pointers or their moral equivalent under the hood, as a user you don't need to worry about it.
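A minimal Ruby sketch of that "moral equivalent", assuming an in-memory relation modeled as a Set of frozen Hashes (an illustrative stand-in, not a web-wide catalog). The relation name mirrors the expression above; its tuples, the placeholder titles, and the `restrict`/`project` helpers are hypothetical. The point is that the book is found by the value of its key, with no pointers in sight. The key string is copied from the text; the registered URN namespace for ISBNs is `urn:isbn` (RFC 3187).

```ruby
require "set"

# Hypothetical sample data: a "relation" modeled as a Set of frozen Hashes,
# each Hash being one tuple. Titles are placeholders, not real bibliographic data.
BOOK_INFO_FROM_THE_WEB = Set[
  { isbn: "uri:isbn:1558608559", title: "Placeholder Title A" }.freeze,
  { isbn: "uri:isbn:0000000000", title: "Placeholder Title B" }.freeze
]

# Restriction (the WHERE clause): keep the tuples that satisfy the predicate.
def restrict(relation, &predicate)
  relation.select(&predicate).to_set
end

# Projection (the { TITLE } part): keep only the named attributes. Because the
# result is a Set, tuples that become identical collapse into one.
def project(relation, *attributes)
  relation.map { |tuple| tuple.slice(*attributes).freeze }.to_set
end

# (BOOK_INFO_FROM_THE_WEB WHERE ISBN = 'uri:isbn:1558608559') { TITLE }
matching = restrict(BOOK_INFO_FROM_THE_WEB) { |t| t[:isbn] == "uri:isbn:1558608559" }
titles = project(matching, :title)
puts titles.map { |t| t[:title] } # prints "Placeholder Title A"
```

Nothing in the query says how the tuples are stored or linked; the user states which value to match and which attribute to return.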

 

Or maybe I'm just dreaming....

 

Oops. Never mind. My hopes just fell apart here, where you can find this paragraph:

 

For one thing, it is important to note that in the conventional use of reification, the subject of the reification triples is assumed to identify a particular instance of a triple in a particular RDF document, rather than some arbitrary triple having the same subject, predicate, and object. This particular convention is used because reification is intended for expressing properties such as dates of composition and source information, as in the examples given already, and these properties need to be applied to specific instances of triples. There could be several triples that have the same subject, predicate, and object and, although a graph is defined as a set of triples, several instances with the same triple structure might occur in different documents. Thus, to fully support this convention, there needs to be some means of associating the subject of the reification triples with an individual triple in some document. However, RDF provides no way to do this.
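The convention the quoted paragraph describes can be sketched in Ruby. Triples are modeled as value Structs and a graph as a Set, so two documents asserting the same statement contribute equal triples; the reification uses the standard RDF vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `EX` namespace, the two documents, the `_:stmt1` node, and the `dateWritten` property are hypothetical. The final check shows the gap the quote points out: the reification triples identify only a subject, predicate, and object, so both documents' triples match equally well.

```ruby
require "set"

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX  = "http://example.org/" # hypothetical namespace for this sketch

# A triple is a plain value: Struct equality and hashing use its members,
# so equal triples collapse inside a Set (an RDF graph is a set of triples).
Triple = Struct.new(:subject, :predicate, :object)

statement = Triple.new("#{EX}book1", "#{EX}title", "Placeholder Title A")

# Two hypothetical documents asserting the same statement.
documents = {
  "doc_a" => Set[statement],
  "doc_b" => Set[Triple.new("#{EX}book1", "#{EX}title", "Placeholder Title A")]
}

# Reify a triple under a statement node using the RDF reification vocabulary.
def reify(node, triple)
  Set[
    Triple.new(node, "#{RDF}type", "#{RDF}Statement"),
    Triple.new(node, "#{RDF}subject", triple.subject),
    Triple.new(node, "#{RDF}predicate", triple.predicate),
    Triple.new(node, "#{RDF}object", triple.object)
  ]
end

# Attach provenance (a hypothetical date of composition) to the statement node.
node = "_:stmt1"
reification = reify(node, statement) |
              Set[Triple.new(node, "#{EX}dateWritten", "2000-01-01")]

# Recover what the reification describes: only a subject, predicate, and object.
def described_triple(graph, node)
  value = ->(name) { graph.find { |t| t.subject == node && t.predicate == "#{RDF}#{name}" }.object }
  Triple.new(value.call("subject"), value.call("predicate"), value.call("object"))
end

described = described_triple(reification, node)

# Which document's instance of the triple does the date apply to? Both match.
candidates = documents.select { |_name, graph| graph.include?(described) }.keys
puts candidates.inspect # prints ["doc_a", "doc_b"]
```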

 

It's apparent that at least some people seriously involved with RDF are convinced that there will always be context information of some sort, not in the record/tuple/whatever itself, that you need to know in order to find things. Not to mention some of the amazing "and now we back-link to all of this" stuff that I've seen. ("Great, we've 'navigated' to the order for this customer. Now, who purchased this order? Uh....")
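The order/customer aside can be made concrete with a small Ruby sketch. In the pointer style, a customer object reaches its orders through references, and going the other way needs either a back-pointer or a scan of every customer; in the relational style, the customer is just an attribute value, so both questions are the same kind of query. The classes, the `ORDERS` relation, and the `lookup` helper are hypothetical.

```ruby
require "set"

# Pointer style (hypothetical classes): a customer references its orders,
# but an order carries no reference to its purchaser.
Customer = Struct.new(:name, :orders)
Order    = Struct.new(:number)

order     = Order.new(42)
customers = [Customer.new("Alice", [order]), Customer.new("Bob", [Order.new(43)])]

# "Who purchased this order?" means scanning every customer object
# (or adding, and keeping consistent, a back-pointer from Order to Customer).
purchaser = customers.find { |c| c.orders.include?(order) }&.name

# Relational style: one relation whose tuples hold values, not pointers.
ORDERS = Set[
  { order_no: 42, customer: "Alice" }.freeze,
  { order_no: 43, customer: "Bob" }.freeze
]

# Both directions are the same operation: restrict on one attribute,
# then project another.
def lookup(relation, where_attr, value, result_attr)
  relation.select { |t| t[where_attr] == value }.map { |t| t[result_attr] }.to_set
end

orders_of_alice = lookup(ORDERS, :customer, "Alice", :order_no)
purchaser_of_42 = lookup(ORDERS, :order_no, 42, :customer)
```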

 

 

From: Fabian Pascal

To: CS

 

I have stated this in a much simpler form many times: there is no such thing as "unstructured data"; that's a contradiction in terms. Anything unstructured is random noise and, therefore, carries no meaning/information and, therefore, is not data. XML is, of course, structured too, except that it is a regression to the hierarchic structure that we discarded decades ago because it was not cost-effective (see Those Who Don't Know the Past Are Condemned to Repeat It).

 

It's only ignorance in the industry, due to lack of education on fundamentals, that could have come up with XML as a basis for data management. Those who invented XML were text processing and programming people, who confused semantics with text formatting (see To a Hammer Everything Looks Like Nails Parts 1, 2).

 

As evidence that even so-called database experts do not understand fundamentals, the main author of SQL also bought into the nonsense of XML databases and authored the XQuery proposal for the W3C (see If You Liked SQL, You'll Love XQuery).

 

Codd invented the relational model precisely to avoid the problems of hierarchic navigation. But in a society/industry where ignorance of history is considered an asset and knowledge and reason a liability, there is no learning curve and, therefore, square wheels keep being reinvented.

 

 

From: CS

 

Funny thing, you know: as I was dealing with a pile of e-mail address objects in a program the other day, doing the usual contortions to organize them (hash tables within hash tables and so on), it struck me that we're going exactly backwards with this so-called "object-relational impedance mismatch." In my OO programs I'm mucking about with essentially hierarchical data access methods, and all of these damn OO-relational frameworks are just turning my nice relational databases into objects with pointers to each other. Why can't I use relational access methods in my OO programs? Then something silly like:

 

all_locations = Array.new

addresses.each { |address|
    unless all_locations.include?(address.location)
        all_locations << address.location
    end
}

 

would turn into (just making up syntax off the top of my head here):

 

addresses.restrict(:location).distinct

 

What do you think?
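For comparison, core Ruby already comes close to the made-up one-liner. In relational terms, picking out the location attribute is a projection (restriction selects tuples, not attributes), and a relational projection yields a set, so duplicates disappear by definition. The sketch below assumes a hypothetical `Address` struct with a `location` attribute and hypothetical sample data, and checks that the loop from the text and `map(&:location).uniq` agree.

```ruby
require "set"

# Hypothetical stand-in for the e-mail address objects in the text.
Address = Struct.new(:email, :location)

addresses = [
  Address.new("a@example.com", "Toronto"),
  Address.new("b@example.com", "Tokyo"),
  Address.new("c@example.com", "Toronto")
]

# The loop from the text: accumulate each location not already collected.
all_locations = []
addresses.each do |address|
  all_locations << address.location unless all_locations.include?(address.location)
end

# Projection onto :location using only core Ruby: map, then drop duplicates.
projected = addresses.map(&:location).uniq

# A relational projection yields a set, so a Set is the closer analogue.
location_set = addresses.map(&:location).to_set
```

`Array#uniq` detects duplicates by hashing (`hash`/`eql?`) and keeps first occurrences in order, whereas the loop calls `include?` on the growing array for every address, a linear scan each time.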

 

 

From: Fabian Pascal

 

And you just realized this? We have been arguing this for years.

 

 

From: CS

 

Really? Where have you proposed this publicly? I've read your PRACTICAL ISSUES IN DATABASE MANAGEMENT and several books by Date, not to mention everything freely available on your website, and I don't recall this being proposed anywhere.

 

To be taken seriously, or even to become known at all, this really needs, first, a reasonably detailed proposal for some language that's currently in use and, second, a rough implementation demonstrating not only that it can be done, but that there are grounds to believe it can be implemented with reasonable efficiency. Open-source that, get a few people interested, and you may have something.

 

 

From: Fabian Pascal

 

It was not a "formal proposal", and it predates both the site and my last book. I am referring to the "impedance mismatch" nonsense, which we debunked as backwards: we need to raise the level of abstraction of programming, not lower that of DBMSs.

 

 

From: CS

 

Ah, ok. You should keep repeating that, so that the new generation, as they discover your work, keeps that idea in mind.

 

As well, there's probably an opportunity to promote this through RDBMS vendors you may know and work with: you could encourage them to develop alternative APIs that integrate more tightly with the RDBMS than, say, JDBC does. This might be a selling point for their product, as well as a way of getting the idea out there.

 

 

From: Fabian Pascal

 

Everything we say we repeat over and over again, to no avail. This is a systemic problem and we cannot change the system.

 

Nope. Vendors don't care about this, as long as they can make money based on ignorance. And they sure don't listen to us.

 

IBM has just announced that the next generation of their DBMS will be XML (see When Ignorance is Expertise, forthcoming at www.dbazine.com). And Codd invented the RM at IBM. 'Nough said.

 

 

Posted 2/18/05