ON VARIOUS TYPES OF RELATIONS
with Chris Date

 

 

 

From: MB

To: Editor

 

I'd value your insight as to whether the various subtypes of relations that logicians give names to (symmetric, transitive, reflexive, etc) have or should have any special status in the relational model. In a database these types of relations can of course be implemented as ordinary tables and appropriate integrity rules.

 

Take for example the relation r(x,y). If it's symmetric, then r(x,y) implies r(y,x) and vice versa. In a relational database, a table corresponding to a reflexive relation must contain 2 rows for each pair <x,y> that satisfies the relation. This could be enforced by a check that rejects or fixes (.compensates.) changes that result in unpaired rows. (It might be kinder to the user to go for, compensates.)

 

Somebody who confuses a table with its physical implementation might object because he doesn't like spending two rows' worth of resources to record one fact. We answer this person by pointing out that the DBMS is free to choose an efficient physical representation of the table. If the DBMS is really smart, it shouldn't have to use twice the usual number of resources.

 

So, here are some questions:

 

·   First of all, do you agree with my approach to representing the reflexive relation?

·   Do we really expect that a well-designed DBMS will recognize what the implication of an integrity check is for optimizing storage and operations?

·   Would it be a good thing for the relational model to have notions like symmetric relation built-in so the user can just declare that a given relation is symmetric, rather than devising a check herself?

·   Since members of a set are unordered, a symmetric relation can alternatively be defined as a unary relation r({x,y}), which holds of a set of two members. The problem can be generalized by considering relation that holds of a set of some arity n. The 2 approaches (devising a check to enforce a row for every permutation or tagging the relation as 'symmetric') discussed so far for the symmetric 2-place relation are not attractive for the general case of n-place. Yet another approach would be to define a complex domain for sets and use a table of 1 column containing values taken from this domain to represent the relation holding of a set. The usual arguments against complex types discourage this, I think you agree.  One is almost tempted to amend the relational model to allow sets of columns in a table to be designated as unordered with respect to each other so that each row in the table enumerates a set. Do you see a way out of this?

·   Still yet another approach to sets that also covers symmetric relations uses a key for each set. In the following table, the key is set#. Integrity checks can be used to enforce an arity of 2.

·   [FIG1.JPG]

·   The problem here is that I'm not directly representing the symmetric relation with my table.

·   Instead, I'm representing the relation corresponding to the predicate "set# contains member". What I really want is a straightforward representation of the predicate, "x an y live together".

 

Any remarks, references, or corrections of my possible misunderstandings will be appreciated. Thanks in any case for reading this far.

 

 

Chris Date Responds: Many thanks for your interesting questions. Herewith a somewhat lengthy response! 

 

First of all, I think--and have thought for many years--that the properties of symmetry, reflexivity, etc. (as they apply to relations) probably do have some role to play in the relational model. However, I've never taken the time to investigate the matter very deeply, and I'm not aware that anyone else has, either.  Thus, I'd like to encourage you to pursue your own investigations; you might come up with some original contributions.

 

That said, I'd like to offer the following further thoughts.

 

Ø       Most of the logic literature (as opposed to the database literature) deals with binary relations only.  The relational model, of course, deals with N-ary relations for arbitrary integer N (N = 0); indeed, it was one of Codd's numerous contributions to show that N-ary relations are interesting in their own right--i.e. they enjoy interesting formal properties that are not apparent if they are regarded merely as "nests" or compositions of binary relations.

 

Ø       Of course, the properties you mention (symmetry, reflexivity, etc.) apply specifically to binary relations only.  If they are to be useful in a relational model context, therefore, it seems as if either they need to be generalized somehow to apply to N-ary relations or we are forced into regarding binary relations as special in some way.  I'm not too happy about the second of these possibilities.

 

Ø       Now let's focus on your example, the predicate "X and Y live together"--which I take to mean, more precisely, "X and Y live together and nobody else lives with X and Y."  As you say, one obvious way to represent this predicate relationally is by means of a binary relvar* R2{X,Y}, with the property that if, e.g., the tuple {X x,Y y} appears in R2, then so does the tuple {X y,Y x} (that's the symmetry property).  We can enforce this requirement in a variety of ways--via an explicit integrity constraint, or via an automated compensating action (implied by explicit specification of some keyword such as SYMMETRIC on the definition of R2), or possibly other ways besides.

 

Note: "Relvar" stands for relation variable.  See either of the books AN INTRODUCTION TO DATABASE SYSTEMS or FOUNDATION FOR FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO for further explanation.

 

Ø       But I greatly prefer the alternative design you also mention: a unary relvar R1 with a single set-valued (or perhaps relation-valued) attribute A, whose values are sets (or relations) containing exactly two members, X and Y say.  This design more directly reflects the predicate "X and Y live together" (it seems to me).  By contrast, the R2 design seems to correspond to the less obviously symmetric predicate "X lives with Y" (and it explicitly requires the symmetric specification in order to capture the symmetry property and make the DBMS aware of it). 

 

Note:  It appears from your email that--unlike many other people!--you do realize that the R1 design is a legal design.  To be specific, it does not violate the requirements of normalization, or any other requirements of the relational model.

 

You also suggest another binary design, involving two attributes SET# and MEMBER.  I completely agree with your own criticisms of that design, especially your observation that this design really represents a different predicate:  "Set SET# contains member MEMBER."  Notice that this design also needs a complicated integrity constraint to the effect that no member is included in two or more sets!  Note:  An analogous remark applies to my preferred design, but I believe it's easier to state with that design:

 

 z FROM

  (TUPLE FROM

        (SUMMARIZE r1 ADD SUM (COUNT(a)) AS z)) = COUNT(r2 UNGROUP a)

 

A particularly telling argument in favor of the R1 design is that it extends gracefully to the case of N people living together for arbitrary N (the predicate is "X, Y, ..., Z live together and nobody else lives with X, Y, ..., Z").  The R2 design does not extend well.  The SET#+MEMBER design does extend OK but still suffers from all of the problems you identify.

 

Ø       You say:  "One is almost tempted to extend the relational model to allow sets of columns of tables in a table to be designated as unordered with respect to each other."  I don't think you really mean what you are saying here; I think rather that what you are getting at is the possibility that the (sub)tuple value t appearing in a given (sub)set of attribute positions within some tuple is somehow to be considered as standing for all possible tuples having exactly the attribute values appearing in t as values of some permutation of their attributes.  But in any case:  Amend the relational model???  Certainly not!!!  Besides, we don't need to (see above). 

 

I think I've now answered all of your original questions except this one: "Do we really expect that a well-designed DBMS will recognize what the implication of an integrity check is for optimizing storage and operations?"  In general, perhaps not; but for the special cases under discussion (symmetric and the rest), I think the answer should be yes.

 

I hope the foregoing helps.  Please let me know if you have any further thoughts in this area.

 

(I think you have a typo in your first question:  You say "reflexive," but I think you mean "symmetric.")

 

Posted 04/18/03

 

 

 

[ABOUT] [QUOTES] [LINKS]