ON DOCUMENT- VS. DATA-BASES
with Fabian Pascal

 

 

 

From: Matt Rogish

To: Editor

Date: 17 May 2005

 

I think for my own fun I'll play around with representation of "documents" in the RM. One thing people ask me a lot is "Well, if the RM is so good, tell me how to implement something like a Word document in an RDBMS!" It seems complex but not impossible. I usually mention things like "sentence is a set of words", "paragraph is a set of sentences", etc., but it's hard to visualize how an RDBMS would operate based on a few little examples. I guess the question is how "complex" a set of relations you create -- word, sentence, paragraph, document -- or whether you let the type system handle something like a "document" type (perhaps using the relations described earlier)?

 

With all the hoopla over XML-based document management (I think the OpenOffice product stores all its documents in XML, as do new versions of Microsoft Office) I think there ought to be a Relational answer presented.

 

 

From: Fabian Pascal

To: Matt Rogish

 

I am skeptical. A document merges data and format together, and that is precisely what Codd understood needs to be avoided. Take a look at chapter 1 in my book; it touches on the underlying issues.

 

But that assumes you have an audience that will compare and be able to tell what's better. I'm highly skeptical of that too. OpenOffice will win no matter how much worse it is.

 

 

From: Matt Rogish

 

I re-read Chapter 1 and, if I'm understanding it correctly, it seems to indicate that the DBMS developer/database designer can create either a bunch of relations or a type -- and you favor the relation-level approach since it turns it into a database design problem and not a programming problem?

 

Certainly a "document" is a non-trivial piece of information to model in a DBMS -- but are there theoretical constraints which make it impractical to do so? I mean, provided you did it correctly I would think that you would just have data.

 

 

From: Fabian Pascal

 

Types are things we can talk about. Relations are sets of statements that we can utter about those things. What can you do without the latter? The problem with OO systems is that they have just types, no relations.

 

A document mixes data with layout (presentation). Databases deal with the former, intentionally leaving the latter to applications. Furthermore, the structure of the document is not such that it lends itself to the kind of inferences that are made from databases. What is the atomicity, selectivity, and correctness for a document base?

 

 

From: Matt Rogish

 

I was thinking primarily of the other benefits a DBMS offers—stuff like concurrency/security control, time-varying data, logical data independence, etc.

 

But, I see your point. What predicates would apply to documents? I can envision how you could (simplistically) *model* a document (documents made up of paragraphs made up of sentences made up of words) but other than that what would you do with it? It's not like you could really apply any rules (aside from grammatical, perhaps) which made much sense. No new facts would need to be derived from the raw data itself (metadata, like author, subject, etc. might be useful, but that's neither here nor there).
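The simplistic model mentioned above (documents made up of paragraphs made up of sentences made up of words) might be sketched roughly as follows, using Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not a design proposed anywhere in this exchange: the table and column names, the helper functions, and the sample text are all hypothetical. Because relations are unordered sets, position has to be carried as explicit attributes (para_no, sent_no, word_no). Layout (fonts, margins, and so on) is deliberately left out.

```python
import sqlite3

# Hypothetical schema: one relation per level of the hierarchy.
SCHEMA = """
CREATE TABLE document (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE paragraph (
    doc_id  INTEGER NOT NULL REFERENCES document(doc_id),
    para_no INTEGER NOT NULL,
    PRIMARY KEY (doc_id, para_no)
);
CREATE TABLE sentence (
    doc_id  INTEGER NOT NULL,
    para_no INTEGER NOT NULL,
    sent_no INTEGER NOT NULL,
    PRIMARY KEY (doc_id, para_no, sent_no),
    FOREIGN KEY (doc_id, para_no) REFERENCES paragraph(doc_id, para_no)
);
CREATE TABLE word (
    doc_id  INTEGER NOT NULL,
    para_no INTEGER NOT NULL,
    sent_no INTEGER NOT NULL,
    word_no INTEGER NOT NULL,
    text    TEXT NOT NULL,
    PRIMARY KEY (doc_id, para_no, sent_no, word_no),
    FOREIGN KEY (doc_id, para_no, sent_no)
        REFERENCES sentence(doc_id, para_no, sent_no)
);
"""


def build_document_db():
    """Create an empty in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def load_document(conn, doc_id, title, paragraphs):
    """Store a document given as a list of paragraphs, each a list of sentences."""
    conn.execute("INSERT INTO document VALUES (?, ?)", (doc_id, title))
    for p, sentences in enumerate(paragraphs, start=1):
        conn.execute("INSERT INTO paragraph VALUES (?, ?)", (doc_id, p))
        for s, sentence in enumerate(sentences, start=1):
            conn.execute("INSERT INTO sentence VALUES (?, ?, ?)", (doc_id, p, s))
            for w, text in enumerate(sentence.split(), start=1):
                conn.execute(
                    "INSERT INTO word VALUES (?, ?, ?, ?, ?)", (doc_id, p, s, w, text)
                )


def render(conn, doc_id):
    """Reassemble plain text from the word relation; this is the application's job."""
    rows = conn.execute(
        "SELECT para_no, sent_no, text FROM word WHERE doc_id = ? "
        "ORDER BY para_no, sent_no, word_no",
        (doc_id,),
    ).fetchall()
    paragraphs = {}
    for para_no, sent_no, text in rows:
        paragraphs.setdefault(para_no, {}).setdefault(sent_no, []).append(text)
    return "\n\n".join(
        " ".join(" ".join(words) + "." for words in sentences.values())
        for sentences in paragraphs.values()
    )


def sentences_containing(conn, word):
    """Return (doc_id, para_no, sent_no) for every sentence that uses the word."""
    return conn.execute(
        "SELECT DISTINCT doc_id, para_no, sent_no FROM word "
        "WHERE lower(text) = lower(?) ORDER BY doc_id, para_no, sent_no",
        (word,),
    ).fetchall()


conn = build_document_db()
load_document(
    conn,
    1,
    "Sample",
    [
        ["Data is not presentation", "Documents merge data and layout"],
        ["Relations are sets of statements"],
    ],
)
rendered = render(conn, 1)
hits = sentences_containing(conn, "data")
```

Beyond reassembling the text and finding which sentences use a given word, it is hard to say what further facts such a schema would let one derive, which is the point conceded in the paragraph above.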

 

The only thing I could think of is that it would be nice to relate different bits of documents together -- kind of like hyperlinks but in a DBMS-supplied way. Also, if I'm typing a letter it would be nice to embed DBMS-supplied data into it. If I'm mailing something to Bob Smith, I might want it to pull in his address for me automatically and also update it as it changes. I might gather a chart of sales figures for a presentation -- I'd like the presentation to issue a query to the DBMS so that I get the latest figures (and avoid potential embarrassment).

 

However, none of that really requires that a document be stored in the DBMS itself -- just that the presentation-layer application have access to the DBMS to issue queries. The issues raised by DBMS document storage are non-trivial enough that they probably outweigh the other benefits. It *is* seductive, though. :)
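The letter scenario can be sketched without storing the document in the DBMS at all: the application keeps the letter as a template and issues a query each time it renders. The following is a minimal sketch using Python's sqlite3 and string.Template; the customer table, the LETTER template, the addresses, and the function names are all made up for illustration.

```python
import sqlite3
from string import Template

# The letter lives in the application; only the facts live in the database.
LETTER = Template("Dear $name,\n\nWe will ship to: $address\n")


def make_db():
    """Create a hypothetical customer relation with a single row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (name TEXT PRIMARY KEY, address TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO customer VALUES (?, ?)", ("Bob Smith", "1 Old Road"))
    return conn


def render_letter(conn, name):
    """Query the current address at render time and fill in the template."""
    row = conn.execute(
        "SELECT name, address FROM customer WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no customer named {name!r}")
    return LETTER.substitute(name=row[0], address=row[1])


conn = make_db()
before = render_letter(conn, "Bob Smith")

# The address changes in the database; the letter itself is never edited.
conn.execute(
    "UPDATE customer SET address = ? WHERE name = ?", ("2 New Street", "Bob Smith")
)
after = render_letter(conn, "Bob Smith")
```

Since the address is fetched on every render, the second letter picks up the change without anyone touching the template.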

 

 

From: Fabian Pascal

 

Those by themselves do not a DBMS make. The main objective is inferencing (manipulation) and integrity, with presentation intentionally left for apps. You can have a document manager if you want, but that's not a database manager.

 

Good luck with that. The first thing the W3C XML committee had to do was to drop the document and substitute an abstraction called 'sequence'. What does it tell you that they had to drop their “core reason-for-being object” right at the start to get anywhere?

 

Nobody stops you from doing that, as long as you don’t tell me to drop RDBMSs in favor of document bases, which is what the industry seems to be promoting.

 

Why do you think Codd stayed away from that? And he was a very smart guy (far beyond the nobodies who are the current pushers of born-again fads).

 

 

Posted 7/1/05