From: Matt Rogish
To: Editor
Date: 17 May 2005
I think for my own fun I'll play around with representation
of "documents" in the RM. One thing people ask me a lot is "Well
if the RM is so good, tell me how to implement something like a Word document in
an RDBMS!" It seems complex but not impossible. I usually mention things
like "sentence is a set of words", "paragraph is a set of
sentences", etc., but it's hard to visualize how an RDBMS would operate
based upon a few little examples. I guess the question is how
"complex" the relations you create should be -- word, sentence, paragraph,
document -- or whether you let the type system handle something like a
"document" type (perhaps using relations described earlier)?
With all the hoopla over XML-based document management (I
think the OpenOffice product stores all its documents in XML, as do new
versions of Microsoft Office), I think there ought to be a Relational answer
presented.
From: Fabian Pascal
To: Matt Rogish
I am skeptical. A document merges data and format together,
and that is precisely what Codd understood needs to be avoided. Take a look at
chapter 1 in my book; it
touches on the underlying issues.
But that assumes you have an audience that will compare and
be able to tell what's better. I'm highly skeptical of that too. OpenOffice
will win no matter how much worse it is.
From: Matt Rogish
I re-read Chapter 1 and, if I'm understanding it
correctly, it seems to indicate that the DBMS developer/database designer can
either create a bunch of relations or a type -- and you favor the relation-level
approach since it turns it into a database design problem and not a programming
problem?
Certainly a "document" is a non-trivial piece of
information to model in a DBMS -- but are there theoretical constraints which
make it impractical to do so? I mean, provided you did it correctly I would
think that you would just have data.
From: Fabian Pascal
Types are things we can talk about. Relations are sets of
statements that we can utter about those things. What can you do without the
latter? The problem with OO systems is that they have just types, no relations.
A document mixes data with layout (presentation). Databases
deal with the former, intentionally leaving the latter to applications.
Furthermore, the structure of the document is not such that it lends itself to
the kind of inferences that are made from databases. What are the atomicity,
selectivity, and correctness for a document base?
From: Matt Rogish
I was thinking primarily of the other benefits a DBMS
offers -- stuff like concurrency/security control, time-varying data, logical data
independence, etc.
But, I see your point. What predicates would apply to
documents? I can envision how you could (simplistically) *model* a document
(documents made up of paragraphs made up of sentences made up of words) but
other than that what would you do with it? It's not like you could really apply
any rules (aside from grammatical, perhaps) which made much sense. No new facts
would need to be derived from the raw data itself (metadata, like author,
subject, etc. might be useful, but that's neither here nor there).
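A minimal sketch of that simplistic model, assuming Python's built-in sqlite3 module as a stand-in for an RDBMS. The relation and attribute names (document, paragraph, sentence, word, para_no, sent_no, word_no) and the sample letter are hypothetical, chosen only for illustration. Because a relation is an unordered set, position has to be carried as an explicit attribute at every level.

```python
import sqlite3

# Hypothetical schema: all names are illustrative, not from any real product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document  (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE paragraph (doc_id INTEGER NOT NULL REFERENCES document (doc_id),
                        para_no INTEGER NOT NULL,
                        PRIMARY KEY (doc_id, para_no));
CREATE TABLE sentence  (doc_id INTEGER NOT NULL, para_no INTEGER NOT NULL,
                        sent_no INTEGER NOT NULL,
                        PRIMARY KEY (doc_id, para_no, sent_no),
                        FOREIGN KEY (doc_id, para_no)
                            REFERENCES paragraph (doc_id, para_no));
CREATE TABLE word      (doc_id INTEGER NOT NULL, para_no INTEGER NOT NULL,
                        sent_no INTEGER NOT NULL, word_no INTEGER NOT NULL,
                        text TEXT NOT NULL,
                        PRIMARY KEY (doc_id, para_no, sent_no, word_no),
                        FOREIGN KEY (doc_id, para_no, sent_no)
                            REFERENCES sentence (doc_id, para_no, sent_no));
""")


def store(conn, doc_id, title, paragraphs):
    """Store a document given as paragraphs -> sentences -> words (nested lists)."""
    conn.execute("INSERT INTO document VALUES (?, ?)", (doc_id, title))
    for p, sentences in enumerate(paragraphs, start=1):
        conn.execute("INSERT INTO paragraph VALUES (?, ?)", (doc_id, p))
        for s, words in enumerate(sentences, start=1):
            conn.execute("INSERT INTO sentence VALUES (?, ?, ?)", (doc_id, p, s))
            conn.executemany(
                "INSERT INTO word VALUES (?, ?, ?, ?, ?)",
                [(doc_id, p, s, w, t) for w, t in enumerate(words, start=1)],
            )


def sentences_containing(conn, term):
    """Return (doc_id, para_no, sent_no) for every sentence that uses the word."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id, para_no, sent_no FROM word "
        "WHERE text = ? ORDER BY doc_id, para_no, sent_no",
        (term,),
    )
    return rows.fetchall()


def sentence_text(conn, doc_id, para_no, sent_no):
    """Rebuild one sentence by ordering its words on the explicit position."""
    rows = conn.execute(
        "SELECT text FROM word WHERE doc_id = ? AND para_no = ? AND sent_no = ? "
        "ORDER BY word_no",
        (doc_id, para_no, sent_no),
    )
    return " ".join(r[0] for r in rows)


# Made-up sample data.
store(conn, 1, "Letter", [[["Dear", "Bob"], ["Sales", "rose"]], [["Regards", "Matt"]]])
print(sentences_containing(conn, "Sales"))
print(sentence_text(conn, 1, 1, 2))
```

About the only queries this supports are lookups by word and reassembly by position, which is the limitation described above. Layout (fonts, margins, page breaks) has no place in the sketch at all.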
The only thing I could think of is that it would be nice to
relate different bits of documents together -- kind of like hyperlinks but in a
DBMS-supplied way. Also, if I'm typing a letter it would be nice to embed
DBMS-supplied data into it. If I'm mailing something to Bob Smith, I might want
it to pull in his address for me automatically and also update it as it
changes. I might gather a chart of sales figures for a presentation -- I'd like
the presentation to issue a query to the DBMS so that I get the latest figures
(and avoid potential embarrassment).
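A rough sketch of the letter case, assuming Python's built-in sqlite3 and string.Template. The contact relation, the render_letter function, the letter text, and the addresses are all made up for illustration. The letter itself lives outside the database and only issues a query each time it is rendered.

```python
import sqlite3
from string import Template

# Hypothetical contact relation with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT PRIMARY KEY, address TEXT NOT NULL)")
conn.execute("INSERT INTO contact VALUES ('Bob Smith', '12 Elm St')")

# The letter is an application-side template, not database content.
LETTER = Template("To: $name\n$address\n\nDear $name,\n")


def render_letter(conn, name):
    """Fill the letter with whatever address the database holds right now."""
    row = conn.execute(
        "SELECT address FROM contact WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no contact named {name!r}")
    return LETTER.substitute(name=name, address=row[0])


print(render_letter(conn, "Bob Smith"))

# After the address changes, the next rendering picks up the new value.
conn.execute("UPDATE contact SET address = '99 Oak Ave' WHERE name = 'Bob Smith'")
print(render_letter(conn, "Bob Smith"))
```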
However, none of that really requires that a document be
stored in the DBMS itself -- just that the presentation-layer application have
access to the DBMS to issue queries. The issues raised by DBMS document storage are
non-trivial enough that they probably outweigh the other benefits. It *is* seductive,
though. :)
From: Fabian Pascal
Those by themselves do not a DBMS make. The main objective is
inferencing (manipulation) and integrity, with presentation
intentionally left for apps. You can have a document manager if you want, but
that's not a database manager.
Good luck with that. The first thing the W3C XML committee had to
do was to drop the document and substitute an abstraction called
'sequence'. What does it tell you that they had to drop their “core
reason-for-being object” right at the start to get anywhere?
Nobody stops you from doing that, as long as you don’t tell
me to drop RDBMSs in favor of document bases, which is what the industry seems
to be promoting.
Why do you think Codd stayed away from that? And he was a
very smart guy (far beyond the nobodies who are the current pushers of born-again
fads).
Posted 7/1/05