ON METADATA, RDF AND RELATIONAL REPRESENTATION
with C. J. Date

From: DF

To: Editor

Date: 27 Feb 2004

 

I am an applications developer, and at present I am considering various ways of working with metadata in scenarios where the kinds of predicates used to describe data resources are likely to change over time, as business needs change. To this end, I have been looking at RDF triples as a flexible way of describing resources. It so happens that a set of RDF triples can be given a fairly straightforward representation in a relational database: each triple is effectively a composite key consisting of a subject identifier, a predicate identifier and an object identifier (in RDF, these identifiers are URIs). The "flexibility" comes from the fact that new kinds of predicates can be added without changing the database schema that describes the structure of the triples themselves.
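As a rough illustration of the representation described above, the following sketch stores triples in a single three-column table whose primary key is the whole triple. It assumes SQLite via Python's standard `sqlite3` module; the table name `triple`, the helper `assert_triple`, and the `example.org` URIs are all hypothetical and chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table holds every assertion; the key is the entire triple.
conn.execute("""
    CREATE TABLE triple (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        PRIMARY KEY (subject, predicate, object)
    )
""")

def assert_triple(subject, predicate, obj):
    """Record one (subject, predicate, object) assertion."""
    conn.execute("INSERT INTO triple VALUES (?, ?, ?)", (subject, predicate, obj))

# A predicate that was foreseen when the application was written.
assert_triple("http://example.org/doc/1",
              "http://example.org/pred/author",
              "http://example.org/person/df")

# A predicate nobody planned for: it is just another row, so no ALTER TABLE is needed.
assert_triple("http://example.org/doc/1",
              "http://example.org/pred/reviewedBy",
              "http://example.org/person/ed")

predicates = sorted(row[0] for row in conn.execute("SELECT DISTINCT predicate FROM triple"))
```

The schema never mentions `author` or `reviewedBy`; both exist only as data, which is where the "flexibility" comes from.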

 

What one loses by taking this approach is the ability to constrain the values of each composite key to meaningful combinations: any valid subject identifier can be combined with any valid predicate identifier and any valid object identifier, regardless of whether or not the combination makes sense. Given such a "subject" as ideas, such "predicates" as colour and quality of rest, and such "objects" as colourlessness, greenness and fury, one can assert without fear of contradiction (from the system) that the colour of ideas is colourless, that it is also green, and that the sleep of the ideas that are colourless and green is furious.
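The point can be sketched concretely, again assuming a hypothetical three-column `triple` table in SQLite (plain strings stand in for URIs here). Every row below is accepted, because the only rule the table enforces is that the same triple is not stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triple (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        PRIMARY KEY (subject, predicate, object)
    )
""")

# Each assertion is well-formed as a triple, whatever it may mean.
nonsense = [
    ("ideas", "colour", "colourlessness"),
    ("ideas", "colour", "greenness"),       # a second, conflicting colour
    ("ideas", "quality of rest", "fury"),
]
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", nonsense)

colours = sorted(
    row[0]
    for row in conn.execute(
        "SELECT object FROM triple WHERE subject = ? AND predicate = ?",
        ("ideas", "colour"),
    )
)
```

Nothing in the schema says that a subject has at most one colour, or that `fury` is not a kind of rest, so the system has no basis for objecting.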

 

By contrast, proper use of the relational model would require one to specify for any given type of subject precisely which predicates it is meaningful to associate with it, and precisely which values (that is, values from which domain) are acceptable in connection with each predicate. It is not always possible to prevent nonsense from being recorded in a relational database, but it is possible to exclude certain *kinds* of nonsense via a small number of essentially sound and fundamentally straightforward mechanisms.
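For contrast, here is a minimal sketch of the kind of mechanism meant, under the assumption that a domain can be approximated in SQLite by a `CHECK` constraint and that "one colour per subject" is expressed as a key. The `paint` table, its columns, and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The predicate "paint P has colour C" gets its own table:
# the key allows one colour per paint, the CHECK restricts the domain.
conn.execute("""
    CREATE TABLE paint (
        paint_id INTEGER PRIMARY KEY,
        colour   TEXT NOT NULL CHECK (colour IN ('green', 'red', 'colourless'))
    )
""")

def try_insert(paint_id, colour):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO paint VALUES (?, ?)", (paint_id, colour))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(1, "green")            # a meaningful assertion
rejected_domain = not try_insert(2, "fury")  # 'fury' is not a colour
rejected_second = not try_insert(1, "red")   # paint 1 already has a colour
```

The constraints cannot tell whether paint 1 really is green, but they do exclude two *kinds* of nonsense: values from the wrong domain and a second, conflicting colour for the same subject.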

 

Now it seems to me that the quality of the relational model that is most often complained about by people in my position - that is, applications developers trying to meet changing business needs - is precisely the "inflexibility" of these mechanisms. RDF is appealing because, at the cost of permitting me to make nonsensical assertions, it permits me to make assertions of kinds that were not foreseen by whoever designed the schema of the data management system my application relies on. To many developers it appears that one must make a "trade-off" between flexibility and data integrity.

 

From what I have read in your pages, it appears that you do not think that this is a trade-off worth making; and moreover that you believe that if database fundamentals are properly understood, then one can deal with changing requirements in the "applications domain" without ever needing to make such potentially catastrophic sacrifices. Many developers *will* make that trade-off, however, partly because of the commercial and institutional pressures on them to respond to new requirements before the businesses they serve have even had time to think properly about what they are doing, and partly no doubt because they - that is, we - do not properly understand database fundamentals. I doubt that much can be done in the short term about the commercial and institutional factors, but I am interested to know what in your view are the chief misunderstandings about database fundamentals that prevent us from seeing how we can accommodate novelty without abandoning logic.

 

 

C. J. Date Responds:  This approach is the old argument that all relvars should be binary in a different guise! Thus, a cogent counterargument is:  How do you deal with irreducible n-order predicates for n <> 2?
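To make the counterargument concrete, the sketch below uses an invented ternary predicate, "supplier S supplies part P to project J", with made-up data. It splits the ternary relation into its three binary projections, which is roughly what a purely binary design would record, and then joins them back together.

```python
from itertools import product

# Hypothetical facts: (supplier, part, project).
spj = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
}

# The three binary projections of the ternary relation.
sp = {(s, p) for s, p, j in spj}   # supplier supplies part
pj = {(p, j) for s, p, j in spj}   # part is used in project
sj = {(s, j) for s, p, j in spj}   # supplier supplies project

suppliers = {s for s, _, _ in spj}
parts = {p for _, p, _ in spj}
projects = {j for _, _, j in spj}

# Rejoin the projections: keep every combination all three pairs agree on.
rejoined = {
    (s, p, j)
    for s, p, j in product(suppliers, parts, projects)
    if (s, p) in sp and (p, j) in pj and (s, j) in sj
}

# Facts the binary version implies that were never asserted.
spurious = rejoined - spj
```

With this data the rejoined result contains `("S1", "P1", "J1")`, which was never stated: the three binary facts cannot between them express the original ternary one. A triple store can work around this by introducing an intermediate node for each shipment, but that amounts to reintroducing the n-ary predicate by hand.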

 

Ed. Comment: Yup.

 

 

Posted: 04/30/04