From: ES
To: Editor
Date: 15 Feb 2005
A thought on the recent discussions "A Cure for Madness" and
"More On a Cure for Madness". Is there in fact a need for defining an extra
(shorthand) operator?
The proposed ':' operator has at least some vague resemblance
to "optimization hints from the user to the system", because it tells
the system what expressions (e.g. IS_CIRCLE) must be evaluated before which
others (e.g. THE_R).
The thing is, there should not be any need at all for the
user to specify this to the system, as the system can, and should, already be
aware of this, namely by looking at the type hierarchy as it has been defined
to the system by that same (community of) user(s).
If we take the position that it is legal for the system to
assume that the user "knows what he is doing", that means among other
things that the system can legally assume that the user is aware of the precise
nature of the type system. In the example: the system can legally assume
that the user *knows* that only circles have a radius, and ellipses do not.
If a user then queries (a relation having) attributes of type
ellipse, and that user specifies a restriction on "radius", it is
therefore legal for the system to derive the assumption that this user is only
interested in circles specifically.
Meaning: the system could apply the IS_CIRCLE restriction
automatically, without the user having to specify it. There is no need at all
to bother the user with this kind of stuff, and no
need to "pollute" the language with additional syntactic sugar such
as this ":" operator. I must say I am with RI here. An extra
operator means extra complexity in the sense that mastering the language
involves mastering more operators. And
while it is possible for any user to restrict himself to just a basic set of
operators when writing, this is obviously not so when reading what was written
(i.e. programs) by someone else.
(Remark: this is not to say that operators such as IS_CIRCLE
should not be made available to the user.
They are useful and they should be provided. This is only to say that I
do not see the reason why the user should be forced to mention this particular
kind of operator in the kind of situation (restriction) under discussion.
The fact that ellipses do not have radii is already known in the catalog.
The user should not be forced to duplicate this information in his queries.)
(Second remark: of course I am aware that if a user does not
"know what he is doing" in the sense mentioned earlier, then that
user might be confronted with unexpected and hard-to-explain results. But is
that the DBMS's fault?)
e.g.
TYPE PLANEFIGURE
...
TYPE ELLIPSE IS PLANEFIGURE POSSREP (longaxis LENGTH shortaxis LENGTH center POINT)
TYPE CIRCLE IS ELLIPSE POSSREP (radius LENGTH center POINT)
TYPE UNITY_CIRCLE IS CIRCLE POSSREP (center POINT)
TYPE RECTANGLE IS PLANEFIGURE POSSREP (base LENGTH height LENGTH center POINT)
TYPE SQUARE IS RECTANGLE POSSREP (side LENGTH center POINT)
RELATION X (round : ELLIPSE cornered : RECTANGLE)
RELATION Y (figure : PLANEFIGURE)
X WHERE THE_RADIUS(round) > 0.75
is translated automatically to:
X WHERE (IS_CIRCLE(round) AND THE_RADIUS(round) > 0.75)
and returns only tuples with a circle value (i.e. just
circles, or the more specialised unity_circles) in 'round' for which the radius
exceeds 0.75. If I understood chapter 20 (of the introduction book) correctly,
then this should not be a problem, since the THE_RADIUS operator really does
exist for UNITY_CIRCLES as well, it is only its usage in *assignment*
operations that is ruled out.
X WHERE THE_RADIUS(round) > 2 OR THE_SIDE(cornered) > 2
is translated automatically to:
X WHERE (IS_CIRCLE(round) AND THE_RADIUS(round) > 2) OR
(IS_SQUARE(cornered) AND THE_SIDE(cornered) > 2)
and returns only tuples with either:
· a circle value in 'round' for which the radius exceeds 2 (of course there would be no unity circles here, since their radius does not satisfy the condition), or
· a square value in 'cornered' for which the side exceeds 2.
Y WHERE THE_BASE(figure) > 2
is translated automatically to:
Y WHERE (IS_RECTANGLE(figure) AND THE_BASE(figure) > 2)
and returns only tuples where the figure attribute is either
a rectangle or a square whose base (or side, as the case may be) is larger than 2.
All that needs to be done is for the language interpreter
(whether that is a compiler or a true interpreter is irrelevant) to go and find
the "supermost" subtype (of the declared type of the attribute) for
which the operator used in the expression exists, and then, conceptually
speaking, replace the operator in the expression with a boolean
expression of the form (IS_T AND operator_invocation_here), where T is the
applicable "supermost" subtype found.
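As a rough illustration of the procedure just described, here is a minimal Python sketch. It is not Tutorial D and not anyone's actual implementation: the dictionaries SUPERTYPE and INTRODUCED_AT and the function names are hypothetical stand-ins for what a catalog might record, and the sketch assumes each operator is introduced at exactly one type.

```python
# Hypothetical catalog: each type maps to its immediate supertype (None for the root).
SUPERTYPE = {
    "PLANEFIGURE": None,
    "ELLIPSE": "PLANEFIGURE",
    "CIRCLE": "ELLIPSE",
    "UNITY_CIRCLE": "CIRCLE",
    "RECTANGLE": "PLANEFIGURE",
    "SQUARE": "RECTANGLE",
}

# Assumption: each THE_ operator is introduced at exactly one type
# (read off the POSSREP components in the example type definitions).
INTRODUCED_AT = {
    "THE_RADIUS": "CIRCLE",
    "THE_SIDE": "SQUARE",
    "THE_BASE": "RECTANGLE",
}


def is_subtype(t, ancestor):
    """Return True if t is ancestor itself or one of its (transitive) subtypes."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE[t]
    return False


def rewrite(op, attr, declared_type, comparison):
    """Expand 'op(attr) comparison' into '(IS_T(attr) AND op(attr) comparison)'."""
    invocation = f"{op}({attr}) {comparison}"
    t = INTRODUCED_AT[op]
    if is_subtype(declared_type, t):
        # The operator is already valid for the declared type: nothing to add.
        return invocation
    if not is_subtype(t, declared_type):
        raise TypeError(f"{op} is not defined for any subtype of {declared_type}")
    return f"(IS_{t}({attr}) AND {invocation})"


print(rewrite("THE_RADIUS", "round", "ELLIPSE", "> 0.75"))
print(rewrite("THE_BASE", "figure", "PLANEFIGURE", "> 2"))
```

The sketch only produces the expanded text of the expression. It says nothing about the order in which the two operands of AND would be evaluated.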
For such an algorithm to be applicable, one of the following is required:
· that such a "supermost subtype" be unique within the type (and within any of its own supertypes), i.e. SQUARES cannot have a radius, because within the type PLANE_FIGURE it is already defined for type CIRCLE; or
· absent such uniqueness, that the "expression extension procedure" be prepared to find ALL the supermost subtypes for which the operator is valid, and extend the expression to the form (IS_T1 OR IS_T2 OR IS_Tn) AND ... (where T1, T2, Tn would be the set of all types found).
Which of the two is desirable, I cannot tell, but I observe
that, if squares have radii (so to speak), then the semantics should at least
still be the same, because otherwise there would be two distinct types of
PLANE_FIGURE both having an operator of the same name (THE_RADIUS), the precise
meaning of which depends on the particular type of PLANE_FIGURE. This is ruled
out (once again, if I understood chapter 20 correctly, of course).
(Third remark: this need not affect the declared type of the
results, so TREAT_DOWN operators would still be needed. The only thing
"eliminated" would be the need for the existence of the ":" shorthand.)
C. J. Date Responds: I don't have time to respond in
detail, except to say that I think the paragraph:
Which of the two is desirable, I cannot tell, but I observe
that, if squares have radii (so to speak), then the semantics should at least
still be the same, because otherwise there would be two distinct types of
PLANE_FIGURE both having an operator of the same name (THE_RADIUS), the precise
meaning of which depends on the particular type of PLANE_FIGURE. This is ruled
out (once again, if I understood chapter 20 correctly, of course).
is incorrect.
Let PF be of declared type PLANE_FIGURE. Does THE_CENTER(PF)
mean (IS_ELLIPSE(PF) AND ...) or (IS_RECTANGLE(PF) AND ...)?
Hugh Darwen Responds: Regarding the possibility of
certain conditions being implied by the use of certain operators in certain
comparisons (for example, THE_radius(C) > 1.0 being short for IS_CIRCLE(C)
AND THE_radius(C) > 1.0): I think the approach is extremely incautious and I
would not recommend such language design.
Some points that ES does not discuss and therefore might not
have considered carefully:
1. Using ES's own example, consider relvar Y. How is
"Y WHERE THE_center(figure) = POINT ( 0, 0 )" to be evaluated? I'm
assuming that THE_center is not defined for plane figures in general, even if
that was not ES's intention. My general point concerns operators that are "overloaded"
(i.e., have different semantics for different types of operand), even if ES did
not really intend THE_center to be overloaded.
2. Would "EXTEND Y ADD THE_radius(figure) AS radius" be legal in ES's
approach? If so, there would presumably
have to be an implied restriction and perhaps an implied TREAT (as with our
":") too. In that case the
operation is no longer a true extension even though the operator is still
spelled EXTEND. If, on the other hand,
the example is illegal, then we have loss of orthogonality and it has to be
explained why THE_radius accepts an argument of declared type PLANEFIGURE in
some contexts and not others. Of course, restriction isn't the only place where
conditional expressions can be used.
Consider, for example, "EXTEND Y ADD (CASE WHEN THE_radius(figure)
> 1.0 THEN 'Yes'; ELSE 'No'; END CASE) AS radius_gt_1".
3. Taking ES's argument to the extreme, an expression such as R WHERE X/Y = 10 would be
shorthand for (R WHERE Y <> 0) WHERE X/Y = 10, regardless of whether
NONZERO is a declared subtype of (e.g.) INTEGER. Are we to do away with run-time exceptions altogether?
4. In any case, ES's longhands are not considered to be sound in the language design
community at large (to the extent that I am familiar with that community). Note how I carefully wrote my example in point
3, using a nested WHERE rather than AND.
The expressions "R WHERE Y <> 0 AND X/Y = 10" and
"R WHERE X/Y = 10 AND Y <> 0" should both throw a zero-divide
exception. In general, the system should be free to evaluate the operands of
commutative operators such as AND in any order. Languages that specify, for example, left-to-right evaluation and
make the evaluation of Y <> 0 AND X/Y = 10 dependent on the system
stopping when Y <> 0 evaluates to false are strongly deprecated. Date and I certainly wouldn't want Tutorial
D to tread in such dangerous waters.
Nor do we want Tutorial D to invite criticism of possibly avant-garde
ideas that we consider to be irrelevant to our cause.
5. ES does not tell us what the declared type of the relevant attribute is in, for example,
"Y WHERE THE_BASE(figure) > 2".
Is it PLANEFIGURE or RECTANGLE?
And what about "Y WHERE THE_BASE(figure) > 2 OR
THE_radius(figure) > 2"? If the
answer is just PLANEFIGURE in each case, then I would have to point out the
advantage of our ":" operator that is not being obtained in ES's
approach.
6. <ES wrote> The proposed ':' operator has at least some vague resemblance to
"optimization hints from the user to the system", because it tells
the system what expressions (e.g. IS_CIRCLE) must be evaluated before which
others (e.g. THE_R). </ES wrote>
I emphatically reject the "optimizer hint"
characterisation! My remarks in points
4 and 5 are possibly relevant here. ":" has the effect of
specialising the declared type of one of the attributes of its operand.
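The distinction drawn in points 3 and 4 between a nested WHERE and an AND whose operands may be evaluated in any order can be illustrated with a small Python model. This is a sketch under stated assumptions, not Tutorial D semantics: the relation R is modelled as a list of dictionaries, and where and eager_and are hypothetical helpers. Python's own "and" stops at the first false operand, which is exactly the behaviour criticised above, so the sketch deliberately avoids it.

```python
# Hypothetical relation R, modelled as a list of tuples-as-dictionaries.
R = [{"X": 20, "Y": 2}, {"X": 5, "Y": 0}, {"X": 30, "Y": 3}]


def where(relation, predicate):
    """Restriction: keep the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]


def eager_and(*operands):
    """AND that evaluates every operand before combining the results."""
    def predicate(t):
        results = [operand(t) for operand in operands]  # no early exit
        return all(results)
    return predicate


def y_nonzero(t):
    return t["Y"] != 0


def ratio_is_10(t):
    return t["X"] / t["Y"] == 10  # raises ZeroDivisionError when Y is 0


# Nested WHERE: the inner restriction removes Y = 0 before X/Y is ever evaluated.
nested = where(where(R, y_nonzero), ratio_is_10)
print(nested)

# AND with both operands evaluated: the division is attempted for every tuple.
for operands in [(y_nonzero, ratio_is_10), (ratio_is_10, y_nonzero)]:
    try:
        where(R, eager_and(*operands))
        print("no exception")
    except ZeroDivisionError:
        print("zero-divide exception")
```

In this model the nested form returns the two tuples with a non-zero Y, while both orderings of the AND form raise the exception, whichever operand is written first.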
I'm sure I could go on if I had the time, but I
hope—sincerely!—that what I've written will suffice to dissuade anybody from
pursuing the approach advocated by ES.
Posted 4/8/05