From: AT
To: Editor
Date: 09/08/2003
I am currently trying to figure out how the DIVISION operator
in relational algebra works in comparison to the universal quantifier of
calculus. You say in INTRODUCTION TO DATABASE SYSTEMS (2003) that
division never became the equivalent to universal quantification as Codd hoped,
due to some problems. Could you explain the actual deficiency of the division
operator? You say certain problems with empty relations and such, but I fail to
see if the division can solve a complex universal query such as "Find all
employees that work on all projects of his/her own department" for
example.
Any help/references to enlighten this area would be
appreciated.
Chris Date Responds: Thanks for your questions
regarding relational division. Hugh
Darwen and I have discussed this issue at length in our paper Into the Great
Divide that appears in our book RELATIONAL DATABASE
WRITINGS 1989-1991. The following lightly edited extract from that
paper addresses your first question ("Could you explain the actual
deficiency of the division operator?").
Please note that the text that follows refers specifically to the
division operator as originally defined by Codd, not the various extended
versions of that operator defined subsequently by Darwen and myself and other
writers.
Given the usual suppliers-and-parts database--
S {s#,...}
SP {s#,p#,...}
P {p#,...,color,...}
--the expression
sp{s#,p#} DIVIDEBY p{p#}
gives supplier numbers S# such that the pair of values
{S#,P#} appears in SP for all part numbers P# appearing in P. This result is usually characterized,
informally, as "supplier numbers for suppliers who supply all
parts." And indeed, this
characterization is accurate so long as there does exist at least one part. The case where there are no parts at all
requires special consideration, however, and we will return to that case
below. First, however, we give a
definition of DIVIDEBY:
Ø Let
relations A and B have headings {X,Y} and {Y},
respectively. (X and Y
here represent sets of zero or more attributes; Y is all of the
attributes of B, and X is all of the attributes of A not
included in B. Note therefore
that the heading of B must be a subset of the heading of A.) Then the result of A DIVIDEBY B
is a relation with heading {X} and body consisting of all tuples {X
x} such that a tuple {X x,Y y} appears in A
for all tuples {Y y} appearing in B:
{{X x} : FORALL b EXISTS a ( x = a.X
AND a.Y = b.Y )}
(where a and b are
range variables that range over A and B, respectively).
Equivalently, A DIVIDEBY B
is defined to be shorthand for:
A{X} MINUS ((A{X} TIMES B)
MINUS A){X}
Now let us revisit the
"supplier numbers for suppliers who supply all parts" example. We have already suggested that a problem
arises with this query if there are no parts at all. To suppose there are no parts at all is intuitively unattractive,
however; in particular, it implies not only that P will be empty, but that SP
will be empty as well, because attribute SP.P# is a foreign key that references
P. Let us therefore modify the example
slightly, as follows. Let PP consist of
P restricted to just those tuples for parts that are purple--
p WHERE color = 'Purple'
--and consider the expression
("Expression 1"):
sp{s#,p#} DIVIDEBY pp{p#}
(informally, "supplier numbers
for suppliers who supply all purple parts").
Now, it might quite reasonably be
the case that PP is empty (meaning there are no purple parts), while SP is
nonempty. And if there are no purple
parts, then it is true of every supplier that that supplier supplies
every purple part.* However, Expression 1 does not give
supplier numbers for "every supplier" if PP is empty. Instead, it gives supplier numbers only
for suppliers who supply at least one part--suppliers, that is, who are
represented in SP; it does not include any suppliers who happen to supply no
parts at all (such suppliers will be represented in S but not in SP). More generally, the expression A
DIVIDEBY B reduces to just A{X} if B is empty (this
fact is immediate from the algebraic definition).
* True fact! I'm not going to justify this position here;
you can consult any book on logic for further discussion. Alternatively, see my own book AN INTRODUCTION TO DATABASE SYSTEMS,8th
edition (just published by Addison-Wesley), page 220.
So we see that Expression 1 is not
quite sufficient, in general, to represent the query "supplier numbers for
suppliers who supply all purple parts."
Or, looking at it another way, a more accurate English translation of
that expression is "supplier numbers for suppliers who supply at least
one part and supply all purple parts."
You also ask whether division can be used for the query
"Find all employees that work on all projects of his/her own
department." Well, yes, I think
so, modulo the previous discussion.
Here's the database:
ED{E#,D#} /* employee E#
works in department D# */
DJ{D#,J#} /* department
D# has project J# */
EJ{E#,J#} /* employee E#
works on project J# */
Assuming ED, DJ, and EJ are all nonempty, I think the
following does the trick:
WITH (ej RENAME e# AS x) AS t1,
(dj RENAME d# AS y)
AS t2:
(ed JOIN ((t1 WHERE x = e#){j#}
DIVIDEBY
(t2 WHERE y =
d#){j#})){e#}
This solution depends on the fact that the heading of the
divisor doesn't have to be a proper subset of that of the dividend. I don't propose to elaborate on this issue
here, nor do I wish to consider what happens if any of ED, DJ, and EJ are
empty, because I think the following solution--involving relational
comparisons--is much easier to understand anyway, as well as being
guaranteed to work even if any of the relations involved are empty:
WITH (ej RENAME e# AS x) AS t1,
(dj RENAME d# AS y)
AS t2:
(ed JOIN ((t1 WHERE x = e#){j#}
CONTAINS
(t2 WHERE y =
d#){j#}){e#}
For (much) more discussion of relational division, I refer
you to the paper already mentioned. For
further discussion of relational comparisons, see the book already mentioned.
Posted
11/07/03
[ABOUT]
[QUOTES]
[LINKS]