ON RELATIONAL DIVISION
with Chris Date

 

 

 

From: AT

To:  Editor

Date: 09/08/2003

 

I am currently trying to figure out how the DIVISION operator in relational algebra works in comparison to the universal quantifier of calculus. You say in INTRODUCTION TO DATABASE SYSTEMS (2003) that division never became the equivalent to universal quantification as Codd hoped, due to some problems. Could you explain the actual deficiency of the division operator? You say certain problems with empty relations and such, but I fail to see if the division can solve a complex universal query such as "Find all employees that work on all projects of his/her own department" for example.

 

Any help/references to enlighten this area would be appreciated.

 

 

Chris Date Responds: Thanks for your questions regarding relational division.  Hugh Darwen and I have discussed this issue at length in our paper Into the Great Divide that appears in our book RELATIONAL DATABASE WRITINGS 1989-1991. The following lightly edited extract from that paper addresses your first question ("Could you explain the actual deficiency of the division operator?").  Please note that the text that follows refers specifically to the division operator as originally defined by Codd, not the various extended versions of that operator defined subsequently by Darwen and myself and other writers. 

 

Given the usual suppliers-and-parts database--

 

S  {s#,...}

SP {s#,p#,...}

P  {p#,...,color,...}

 

--the expression

 

sp{s#,p#} DIVIDEBY p{p#}

 

gives supplier numbers S# such that the pair of values {S#,P#} appears in SP for all part numbers P# appearing in P.  This result is usually characterized, informally, as "supplier numbers for suppliers who supply all parts."  And indeed, this characterization is accurate so long as there does exist at least one part.  The case where there are no parts at all requires special consideration, however, and we will return to that case below.  First, however, we give a definition of DIVIDEBY: 

 

Ø       Let relations A and B have headings {X,Y} and {Y}, respectively.  (X and Y here represent sets of zero or more attributes; Y is all of the attributes of B, and X is all of the attributes of A not included in B.  Note therefore that the heading of B must be a subset of the heading of A.)  Then the result of A DIVIDEBY B is a relation with heading {X} and body consisting of all tuples {X x} such that a tuple {X x,Y y} appears in A for all tuples {Y y} appearing in B

 

    {{X x} : FORALL b EXISTS a ( x = a.X AND a.Y = b.Y )}

 

(where a and b are range variables that range over A and B, respectively).

 

Equivalently, A DIVIDEBY B is defined to be shorthand for: 

 

A{X} MINUS ((A{X} TIMES B) MINUS A){X}

 

Now let us revisit the "supplier numbers for suppliers who supply all parts" example.  We have already suggested that a problem arises with this query if there are no parts at all.  To suppose there are no parts at all is intuitively unattractive, however; in particular, it implies not only that P will be empty, but that SP will be empty as well, because attribute SP.P# is a foreign key that references P.  Let us therefore modify the example slightly, as follows.  Let PP consist of P restricted to just those tuples for parts that are purple--

 

p WHERE color = 'Purple'

 

--and consider the expression ("Expression 1"): 

 

sp{s#,p#} DIVIDEBY pp{p#}

 

(informally, "supplier numbers for suppliers who supply all purple parts"). 

 

Now, it might quite reasonably be the case that PP is empty (meaning there are no purple parts), while SP is nonempty.  And if there are no purple parts, then it is true of every supplier that that supplier supplies every purple part.* However, Expression 1 does not give supplier numbers for "every supplier" if PP is empty.  Instead, it gives supplier numbers only for suppliers who supply at least one part--suppliers, that is, who are represented in SP; it does not include any suppliers who happen to supply no parts at all (such suppliers will be represented in S but not in SP).  More generally, the expression A DIVIDEBY B reduces to just A{X} if B is empty (this fact is immediate from the algebraic definition). 

 

* True fact!  I'm not going to justify this position here; you can consult any book on logic for further discussion.  Alternatively, see my own book AN INTRODUCTION TO DATABASE SYSTEMS,8th edition (just published by Addison-Wesley), page 220. 

 

So we see that Expression 1 is not quite sufficient, in general, to represent the query "supplier numbers for suppliers who supply all purple parts."  Or, looking at it another way, a more accurate English translation of that expression is "supplier numbers for suppliers who supply at least one part and supply all purple parts." 

 

You also ask whether division can be used for the query "Find all employees that work on all projects of his/her own department."  Well, yes, I think so, modulo the previous discussion.  Here's the database: 

 

ED{E#,D#}  /* employee E# works in department D# */

DJ{D#,J#}  /* department D# has project J#       */

EJ{E#,J#}  /* employee E# works on project J#    */

 

Assuming ED, DJ, and EJ are all nonempty, I think the following does the trick: 

 

WITH (ej RENAME e# AS x) AS t1,

     (dj RENAME d# AS y) AS t2:

(ed JOIN ((t1 WHERE x = e#){j#}

         DIVIDEBY

         (t2 WHERE y = d#){j#})){e#}

 

This solution depends on the fact that the heading of the divisor doesn't have to be a proper subset of that of the dividend.  I don't propose to elaborate on this issue here, nor do I wish to consider what happens if any of ED, DJ, and EJ are empty, because I think the following solution--involving relational comparisons--is much easier to understand anyway, as well as being guaranteed to work even if any of the relations involved are empty: 

 

WITH (ej RENAME e# AS x) AS t1,

     (dj RENAME d# AS y) AS t2:

(ed JOIN ((t1 WHERE x = e#){j#}

         CONTAINS

         (t2 WHERE y = d#){j#}){e#}

 

For (much) more discussion of relational division, I refer you to the paper already mentioned.  For further discussion of relational comparisons, see the book already mentioned.

 

 

Posted 11/07/03

 

 

 

[ABOUT] [QUOTES] [LINKS]