From: MB
To: Editor
I'd value your insight as to whether the various subtypes of
relations that logicians give names to (symmetric, transitive, reflexive, etc)
have or should have any special status in the relational model. In a database
these types of relations can of course be implemented as ordinary tables and
appropriate integrity rules.
Take for example the relation r(x,y). If it's symmetric, then
r(x,y) implies r(y,x) and vice versa. In a relational database, a table
corresponding to a reflexive relation must contain 2 rows for each pair
<x,y> that satisfies the relation. This could be enforced by a check that
rejects or fixes (.compensates.) changes that result in unpaired rows. (It
might be kinder to the user to go for, compensates.)
Somebody who confuses a table with its physical
implementation might object because he doesn't like spending two rows' worth of
resources to record one fact. We answer this person by pointing out that the
DBMS is free to choose an efficient physical representation of the table. If
the DBMS is really smart, it shouldn't have to use twice the usual number of
resources.
So, here are some questions:
·
First of all, do you agree with my approach to
representing the reflexive relation?
·
Do we really expect that a well-designed DBMS will
recognize what the implication of an integrity check is for optimizing storage
and operations?
·
Would it be a good thing for the relational model to
have notions like symmetric relation built-in so the user can just declare that
a given relation is symmetric, rather than devising a check herself?
·
Since members of a set are unordered, a symmetric
relation can alternatively be defined as a unary relation r({x,y}), which holds
of a set of two members. The problem can be generalized by considering relation
that holds of a set of some arity n. The 2 approaches (devising a check to
enforce a row for every permutation or tagging the relation as 'symmetric')
discussed so far for the symmetric 2-place relation are not attractive for the
general case of n-place. Yet another approach would be to define a complex
domain for sets and use a table of 1 column containing values taken from this
domain to represent the relation holding of a set. The usual arguments against
complex types discourage this, I think you agree. One is almost tempted to amend the relational model to allow sets
of columns in a table to be designated as unordered with respect to each other
so that each row in the table enumerates a set. Do you see a way out of this?
·
Still yet another approach to sets that also covers
symmetric relations uses a key for each set. In the following table, the key is
set#. Integrity checks can be used to enforce an arity of 2.
· [FIG1.JPG]
· The
problem here is that I'm not directly representing the symmetric relation with
my table.
·
Instead, I'm representing the relation corresponding to
the predicate "set# contains member". What I really want is a
straightforward representation of the predicate, "x an y live
together".
Any remarks, references, or corrections of my possible
misunderstandings will be appreciated. Thanks in any case for reading this far.
Chris Date Responds: Many thanks for your interesting
questions. Herewith a somewhat lengthy response!
First of all, I think--and have thought for many years--that
the properties of symmetry, reflexivity, etc. (as they apply to relations)
probably do have some role to play in the relational model. However, I've never
taken the time to investigate the matter very deeply, and I'm not aware that
anyone else has, either. Thus, I'd like
to encourage you to pursue your own investigations; you might come up with some
original contributions.
That said, I'd like to offer the following further thoughts.
Ø
Most of the logic literature (as opposed to the
database literature) deals with binary relations only. The relational model, of course, deals with
N-ary relations for arbitrary integer N (N = 0); indeed, it was one of Codd's
numerous contributions to show that N-ary relations are interesting in their
own right--i.e. they enjoy interesting formal properties that are not apparent
if they are regarded merely as "nests" or compositions of binary
relations.
Ø
Of course, the properties you mention (symmetry,
reflexivity, etc.) apply specifically to binary relations only. If they are to be useful in a relational
model context, therefore, it seems as if either they need to be generalized
somehow to apply to N-ary relations or we are forced into regarding binary
relations as special in some way. I'm
not too happy about the second of these possibilities.
Ø
Now let's focus on your example, the predicate "X
and Y live together"--which I take to mean, more precisely, "X and Y
live together and nobody else lives with X and Y." As you say, one obvious way to represent
this predicate relationally is by means of a binary relvar* R2{X,Y}, with the
property that if, e.g., the tuple {X x,Y y} appears in R2, then so does the
tuple {X y,Y x} (that's the symmetry property). We can enforce this requirement in a variety of ways--via an
explicit integrity constraint, or via an automated compensating action (implied
by explicit specification of some keyword such as SYMMETRIC on the definition
of R2), or possibly other ways besides.
Note: "Relvar" stands
for relation variable. See either of
the books AN INTRODUCTION TO
DATABASE SYSTEMS or FOUNDATION FOR
FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO for further
explanation.
Ø
But I greatly prefer the alternative design you also
mention: a unary relvar R1 with a single set-valued (or perhaps
relation-valued) attribute A, whose values are sets (or relations) containing exactly
two members, X and Y say. This design
more directly reflects the predicate "X and Y live together" (it
seems to me). By contrast, the R2
design seems to correspond to the less obviously symmetric predicate "X
lives with Y" (and it explicitly requires the symmetric specification in
order to capture the symmetry property and make the DBMS aware of it).
Note: It appears from your email that--unlike many other people!--you
do realize that the R1 design is a legal design. To be specific, it does not violate the requirements of
normalization, or any other requirements of the relational model.
You also suggest another binary
design, involving two attributes SET# and MEMBER. I completely agree with your own criticisms of that design,
especially your observation that this design really represents a different
predicate: "Set SET# contains
member MEMBER." Notice that this
design also needs a complicated integrity constraint to the effect that no
member is included in two or more sets!
Note: An analogous remark
applies to my preferred design, but I believe it's easier to state with that
design:
z FROM
(TUPLE FROM
(SUMMARIZE r1 ADD
SUM (COUNT(a)) AS z)) = COUNT(r2 UNGROUP a)
A particularly telling argument in
favor of the R1 design is that it extends gracefully to the case of N people
living together for arbitrary N (the predicate is "X, Y, ..., Z live
together and nobody else lives with X, Y, ..., Z"). The R2 design does not extend well. The SET#+MEMBER design does extend OK but still
suffers from all of the problems you identify.
Ø
You say:
"One is almost tempted to extend the relational model to allow sets
of columns of tables in a table to be designated as unordered with respect to
each other." I don't think you
really mean what you are saying here; I think rather that what you are getting
at is the possibility that the (sub)tuple value t appearing in a given (sub)set
of attribute positions within some tuple is somehow to be considered as
standing for all possible tuples having exactly the attribute values appearing
in t as values of some permutation of their attributes. But in any case: Amend the relational model???
Certainly not!!! Besides, we
don't need to (see above).
I think I've now answered all of your original questions
except this one: "Do we really expect that a well-designed DBMS will
recognize what the implication of an integrity check is for optimizing storage
and operations?" In general,
perhaps not; but for the special cases under discussion (symmetric and the
rest), I think the answer should be yes.
I hope the foregoing helps.
Please let me know if you have any further thoughts in this area.
(I think you have a typo in your first question: You say "reflexive," but I think
you mean "symmetric.")
Posted
04/18/03
[ABOUT]
[QUOTES]
[LINKS]