FURTHER THOUGHTS
Further reflection on Gennick's problem made
me realize there was more that could usefully be said on this matter; hence the
present section.
Let's agree to ignore SQL as such until further notice and
switch to relational algebra (and relational terminology) instead. In relational algebra, if r and p
are a relational expression and a predicate (i.e., a truth-valued expression),
respectively, then the expression
r WHERE p
evaluates to a relation with the same heading--i.e., the same
attributes--as r and with a body containing just those tuples of r
for which p evaluates to TRUE.* That result relation is said to be a
restriction of r, and the operation that computes it is called a
restriction operation. Loosely, what
the restriction operation does is filter out certain tuples from its operand
relation.
*Strictly speaking, p must
be quantifier-free and its parameters must be some subset of the attributes of r,
but there's no need to get into the fine detail of such matters here.
Now suppose the expression r itself denotes another
restriction operation, so that the overall expression takes the form, say,
( s WHERE q ) WHERE p
for some relational expression s and some predicate q. Then it's well known that this overall
expression can be legitimately transformed into
s WHERE ( q ) AND ( p )
And this transformation is useful for optimization purposes,
because the original expression implies that the implementation must perform
two passes over the s-relation while the transformed version shows that
one pass suffices. Moreover, the
original expression suggests that the implementation must check q first
when filtering out tuples, but the transformed version shows it can check p
first if it likes, because AND is commutative.
(And it presumably would like to check p first if, for example, p
is easier to evaluate than q, or if it's likely that most tuples satisfy
q but very few satisfy p.)
Suppose now, however, that we have a situation in which the q-filtering
must be done before the p-filtering (as it were). Then the relational expression that
represents this state of affairs is not
( s WHERE q ) WHERE p
--because this expression does not capture the requirement
that q must be applied first, as we have just seen. So what is to be done? Well, in fact a situation of this same
general nature occurs in connection with the issue of type inheritance. Let's look at an example. Note: The discussion that follows is based on one given in the book FOUNDATION FOR FUTURE
DATABASE SYSTEMS: THE THIRD MANIFESTO (2nd edition), by Hugh Darwen and
myself, where further details can be found.
By way of example, then, let CIRCLE be a subtype of type
ELLIPSE, and let relation s have an attribute E that's declared to be of
type ELLIPSE. In general, then, some
tuples of s will have a value for attribute E that is in fact a circle
instead of just a (noncircular) ellipse.
Now consider the following query:
Get tuples of s where the E value is in fact a circle
and the radius of that circle is greater than two.
Observe
immediately that the following "obvious" formulation of the query
doesn't work:
( s WHERE IS_CIRCLE ( E ) )
WHERE THE_R ( E ) >
LENGTH ( 2.0 )
(I'm assuming here that (a) for a given tuple t of s,
IS_CIRCLE(E) returns TRUE if the value of E in t is a circle and FALSE
otherwise, and (b) THE_R is an operator that takes a circle as its operand and
returns the corresponding radius, a value of type LENGTH.) The reason the formulation shown above
doesn't work is because it's logically equivalent to the following:
s WHERE THE_R ( E ) > LENGTH ( 2.0 ) AND IS_CIRCLE ( E
)
And this expression will fail as soon as it tries to invoke
the operator THE_R on a value of E (in some tuple t of s) that's
just an ellipse and not a circle, because THE_R isn't defined for ellipses that
aren't circles; in other words, there's no guarantee that the system will apply
the IS_CIRCLE filtering before attempting to check the radius. Note: In fact, if static type checking is performed, then the
expression will fail at compile time, because the compiler will know
that the argument E to THE_R is of type ELLIPSE but the corresponding parameter
is of type CIRCLE.
Clearly, then, what we need to do, for any given tuple t
of s, is make sure the E value is a circle before we check to see
whether the radius of that circle is greater than two (indeed, we mustn't even
perform this latter check if the E value isn't a circle). To that end, Hugh Darwen and I (in the book
mentioned previously) invented some new syntax, such that the query under
discussion can be correctly formulated as:
( s : IS_CIRCLE ( E ) ) WHERE THE_R ( E ) > LENGTH (
2.0 )
The expression s:IS_CIRCLE(E) is defined to return a
relation r that's identical to s except that:
·
The body of r contains just those tuples of s
for which the E value is a circle.
·
Attribute E in the heading of r is of type
CIRCLE, not type ELLIPSE (so the compile-time type checking on THE_R(E) will
now succeed).
Moreover, the operator precedence rules are defined in such a
way that--even if we don't enclose it in parentheses as above--the expression s:IS_CIRCLE(E)
is guaranteed to be evaluated before the restriction specified in the
WHERE clause is performed. The overall
effect is thus as desired.
Abstracting somewhat from the foregoing example, then, we can
say that the relational expression
( s : q ) WHERE p
is such that the q-filtering will definitely be done
before the p-filtering (even if the parentheses are omitted); that is,
the expression cannot legally be transformed into
( s WHERE p ) : q
or
s WHERE ( p ) AND ( q )
(and not into (s:p):q either, come to
that).
The relevance of the foregoing discussion to Gennick's
example should be obvious, but let me spell the details out. (So now I'm switching back to SQL--but I
want to simplify the original example and abstract from it, somewhat.) Consider the following SQL expression:
SELECT * FROM ( SELECT * FROM s WHERE q ) WHERE p
Rather counter-intuitively (at least, so it seems to me),
then, this expression is not just SQL syntax for
( s WHERE q ) WHERE p
(!)--precisely because, as we saw in the body of this note,
it can't always be legally transformed into the SQL expression
SELECT * FROM s WHERE q AND p
In fact, the SQL standard makes it clear (albeit not in these
terms, of course) that the expression under discussion--
SELECT * FROM ( SELECT * FROM s WHERE q ) WHERE p
--is really SQL syntax for
( s : q ) : p
So the problem with SQL (and I regard this state of affairs
as yet another SQL deficiency) is that it provides syntax for (s:q):p
but not for (s WHERE q) WHERE p. Thus, users are effectively forced to write (s:q):p
even when what they really mean is (s WHERE q) WHERE p. And in Oracle, at least, the optimizer
apparently assumes that users always mean the latter and therefore
performs the logically incorrect transformation--which happens to work, most of
the time, because most of the time users really do mean (s WHERE q)
WHERE p! A somewhat topsy-turvy
situation, you might be forgiven for thinking.
A FINAL OBSERVATION
It's worth noting that SQL does in fact support something
along the lines of the s:IS_T(A) operator described
briefly in the previous section. Here's
the query from that section expressed in SQL ("Get s rows where the
E value is in fact a circle and the radius of that circle is greater than
two"):
SELECT *
FROM ( SELECT A, ..., TREAT E AS CIRCLE AS E, ..., Z
FROM s
WHERE TYPE ( E ) IS OF ( CIRCLE ) )
WHERE THE_R ( E ) >
LENGTH ( 2.0 )
Note: I'm
assuming here that s has attributes A, ..., Z in addition to the
specific attribute E that's of interest to this particular query.
Do note, however, how clumsy the SQL support is! In particular, the operations of (a) picking
out just the tuples of s in which the E value is a circle and (b)
changing the type of attribute E in the result from ELLIPSE to CIRCLE--I'm
speaking pretty loosely here--are expressed separately (in the WHERE clause and
the SELECT clause, respectively). By
contrast, these two operations are bundled together in our syntax s:IS_CIRCLE(E)
and cannot be separated. As a
consequence, there's scope for additional errors in SQL ... For example, the
expression
SELECT *
FROM ( SELECT *
FROM s
WHERE TYPE ( E ) IS OF ( CIRCLE ) )
WHERE THE_R ( E ) >
LENGTH ( 2.0 )
fails on a type error (how, exactly?).
Posted
09/24/04