ON JOIN
with C. J. Date

 

 

 

From: PK

To: Editor

Date:19 Jul 2004

 

According to literature Relational Algebra has five primitive operators: SELECT, PROJECT, UNION, MINUS, and TIMES. They are called primitive since other relational operators can be defined "in terms of" these primitive operators, whilst they themselves cannot.

 

I was wondering what "in terms of" in this context exactly means. Take for example the operator INTERSECT which may be defined as:

 

A INTERSECT B = A MINUS ( ( A MINUS B ) UNION ( B MINUS A ) )

 

INTERSECT is defined in relational algebra, using only the primitive operators MINUS and UNION. I agree to call this a definition "in terms of" the primitive operators.

 

JOIN is not a primitive relational operator. It must therefore be possible to express it "in terms of" primitive relational operators. According to literature JOIN is calculated by a TIMES, then a SELECT and then a PROJECT operation. If you try to express JOIN in relational algebra,you see that is not possible. The relational expression for a JOIN contains many unfilled question marks:

 

A JOIN B = ( ( A TIMES B ) WHERE ? = ? ) { ? }

 

So on the logical level, the JOIN operator can not be primitive. On the implementation level it's of course quite a different story. There you can calculate the common attributes, pass them to the SELECT operators an so forth. But I think that's not the issue.

 

I would kindly hear about your opinion.

 

C. J. Date Responds: As you say, an operator is primitive if it can't be defined in terms of others.  But which operators we choose as primitive in a given context is often arbitrary; for example, we can define TIMES in terms of JOIN or we can define JOIN in terms of TIMES.  More generally, let S be a set of operators.  Then we can say that the set P is a set of primitive operators for S if and only if: 

 

·         P is a subset (not necessarily a proper subset) of S

·         Every operator in S can be defined in terms of operators in P (only). 

 

For example, let S be the set of operators {+,-}, where the operands are (let us assume) signed integers; then we could take either of the sets {+} and {-} as corresponding P's. 

 

Turning now to the relational algebra specifically:  It's true that much of the literature identifies the set consisting of restrict,* project, union, difference, and product as a primitive set, but that primitive set is not unique.  In fact, in our book FOUNDATION FOR FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO, 2nd Ed., Hugh Darwen and I show that every relational algebra operator can be expressed in terms of just two primitive operators.  (And "every relational algebra operator" here includes several operators that aren't even discussed in much of the literature, including GROUP, UNGROUP, PACK, UNPACK, and many others.) 

 

----------

* I GREATLY prefer restrict over select.  Note in particular that SELECT in SQL is closer to project than it is to restrict

----------

 

I hope the foregoing is clear.  I'll close with a few comments on specific points in your original message: 

 

·   Your definition of INTERSECT is correct but unnecessarily complex (since A and B MINUS A are disjoint).  In fact, we have:

 

A INTERSECT B = A MINUS ( A MINUS B )

              = B MINUS ( B MINUS A )

 

·         Of course, INTERSECT is just a special case of JOIN anyway: 

 

A INTERSECT B = A JOIN B

 

(so long as A and B are of the same type, which they must be for INTERSECT). 

 

·         Your "definition" of JOIN--

 

A JOIN B = ( ( A TIMES B ) WHERE ? = ? ) { ? }

 

--if accepted would appear to show that JOIN is a primitive operator, contrary to what you state (because if I understand you aright you're claiming it can't be fully defined in terms of other algebraic operators).  Note in particular that the same argument applies to both restrict and project (both of which are usually regarded as primitive); in both cases, we need additional operands to the operation, over and above those operands that are relations as such.  Note too that, even with UNION, INTERSECT, and MINUS, we have to impose the rule that the operands must be of the same type, so even here we're not talking about the "pure" set-theoretic operators of the same name.

 

 

Posted 10/22/04