From: DF
To: Editor
Date: 27 Feb 2004
I am an applications developer, and at present I am
considering various ways of working with metadata in scenarios where the kinds
of predicates used to describe data resources are likely to change over time,
as business needs change. To this end, I have been looking at RDF triples as a
flexible way of describing resources. It so happens that a set of RDF triples
can be given a fairly straightforward representation in a relational database:
each triple is effectively a composite key consisting of a subject identifier,
a predicate identifier and an object identifier (in RDF, these identifiers are
URIs). The "flexibility" comes from the fact that new kinds of
predicates can be added without changing the database schema that describes the
structure of the triples themselves.
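As a rough illustration of the representation just described, the following sketch stores triples in a single three-column table using Python's built-in sqlite3 module. The table name, the column names and the `urn:ex:` identifiers are assumptions made for the example, not part of any RDF standard or of the correspondent's system. It shows only that a previously unknown predicate arrives as a new row rather than as a schema change.

```python
import sqlite3

# Hypothetical table and identifiers; a real RDF store would use full URIs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triple (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        PRIMARY KEY (subject, predicate, object)  -- the whole triple is the key
    )
    """
)


def assert_triple(subject, predicate, obj):
    """Record one (subject, predicate, object) assertion."""
    conn.execute("INSERT INTO triple VALUES (?, ?, ?)", (subject, predicate, obj))


# A predicate that was known when the application was written.
assert_triple("urn:ex:doc42", "urn:ex:author", "urn:ex:person7")

# A predicate nobody foresaw: no ALTER TABLE is needed, only another row.
assert_triple("urn:ex:doc42", "urn:ex:reviewedBy", "urn:ex:person9")

predicates = [
    row[0]
    for row in conn.execute("SELECT DISTINCT predicate FROM triple ORDER BY predicate")
]
print(predicates)
```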
What one loses by taking this approach is the ability to
constrain the values of each composite key to meaningful combinations: any
valid subject identifier can be combined with any valid predicate identifier and
any valid object identifier, regardless of whether or not the combination makes
sense. Given such a "subject" as ideas, such "predicates"
as colour and quality of rest, and such "objects" as colourlessness,
greenness and fury, one can assert without fear of contradiction (from the
system) that the colour of ideas is colourless, that it is also green, and that
the sleep of the ideas that are colourless and green is furious.
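The point can be made concrete with a small sketch, again in Python with sqlite3. The table layout and the short identifiers (used here in place of URIs) are assumptions chosen to mirror the example above. The only constraint the system enforces is that the same triple is not recorded twice, so all three assertions are accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triple (subject TEXT NOT NULL, predicate TEXT NOT NULL, "
    "object TEXT NOT NULL, PRIMARY KEY (subject, predicate, object))"
)

# Hypothetical identifiers mirroring the example in the text.
nonsense = [
    ("ideas", "colour", "colourlessness"),
    ("ideas", "colour", "greenness"),       # contradicts the row above
    ("ideas", "quality_of_rest", "fury"),   # not a meaningful combination
]
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", nonsense)

# Nothing was rejected: the key constraint only forbids exact duplicates.
accepted = conn.execute("SELECT COUNT(*) FROM triple").fetchone()[0]
colours = [
    row[0]
    for row in conn.execute(
        "SELECT object FROM triple "
        "WHERE subject = 'ideas' AND predicate = 'colour' ORDER BY object"
    )
]
print(accepted, colours)
```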
By contrast, proper use of the relational model would require
one to specify for any given type of subject precisely which predicates it is
meaningful to associate with it, and precisely which values (that is, values
from which domain) are acceptable in connection with each predicate. It is not
always possible to prevent nonsense from being recorded in a relational
database, but it is possible to exclude certain *kinds* of nonsense via a small
number of essentially sound and fundamentally straightforward mechanisms.
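A minimal sketch of such mechanisms follows, using SQL constraints in SQLite as a stand-in for relational domains and keys; SQL is only an approximation of the relational model. The `liquid` and `colour` tables and their contents are hypothetical. A key allows each subject one colour at most, and a foreign key restricts colours to a declared set of values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default

# The set of acceptable colour values, playing the role of a domain.
conn.execute("CREATE TABLE colour (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO colour VALUES (?)", [("green",), ("colourless",)])

# Each liquid has exactly one colour, drawn from the declared set.
conn.execute(
    """
    CREATE TABLE liquid (
        liquid_id TEXT PRIMARY KEY,
        colour    TEXT NOT NULL REFERENCES colour (name)
    )
    """
)


def try_insert(liquid_id, colour):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO liquid VALUES (?, ?)", (liquid_id, colour))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("water", "colourless"),  # accepted
    try_insert("water", "green"),       # rejected: water already has a colour
    try_insert("absinthe", "fury"),     # rejected: fury is not a colour
]
print(results)
```

The cost is the one the next paragraph complains of: recording a new kind of fact about liquids means changing the schema.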
Now it seems to me that the quality of the relational model
that is most often complained about by people in my position - that is,
applications developers trying to meet changing business needs - is precisely
the "inflexibility" of these mechanisms. RDF is appealing because, at
the cost of permitting me to make nonsensical assertions, it permits me to make
assertions of kinds that were not foreseen by the designer of the schema on
which my application's data management system relies. For many developers it
appears that one must make a "trade-off" between flexibility and data
integrity.
From what I have read in your pages, it appears that you do
not think that this is a trade-off worth making; and moreover that you believe
that if database fundamentals are properly understood, then one can deal with
changing requirements in the "applications domain" without ever
needing to make such potentially catastrophic sacrifices. Many developers
*will* make that trade-off, however, partly because of the commercial and
institutional pressures on them to respond to new requirements before the
businesses they serve have even had time to think properly about what they are
doing, and partly no doubt because they - that is, we - do not properly
understand database fundamentals. I doubt that much can be done in the short
term about the commercial and institutional factors, but I am interested to
know what in your view are the chief misunderstandings about database
fundamentals that prevent us from seeing how we can accommodate novelty without
abandoning logic.
C. J. Date Responds:
This approach is the old argument that all relvars should be binary in a
different guise! Thus, a cogent counterargument is: How do you deal with
irreducible n-order predicates for n <> 2?
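One way to see what "irreducible" means here is the following sketch, which uses a hypothetical ternary relation of suppliers, parts and projects ("supplier s supplies part p to project j"). The data values are invented for illustration, and the example is not taken from the reply itself. Splitting the relation into its three binary projections and joining them back produces a tuple that was never asserted, so these binary relations do not carry the same information as the ternary one.

```python
# Hypothetical ternary relation: (supplier, part, project).
spj = {
    ("s1", "p1", "j2"),
    ("s1", "p2", "j1"),
    ("s2", "p1", "j1"),
}

# The three binary projections of the ternary relation.
sp = {(s, p) for s, p, j in spj}
pj = {(p, j) for s, p, j in spj}
sj = {(s, j) for s, p, j in spj}

# Natural join of the three projections back into triples.
rejoined = {
    (s, p, j)
    for s, p in sp
    for p2, j in pj
    if p == p2 and (s, j) in sj
}

# Tuples implied by the binary relations but never actually asserted.
spurious = rejoined - spj
print(sorted(spurious))  # [('s1', 'p1', 'j1')]
```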
Ed. Comment: Yup.
Posted: 04/30/04