(Continued from Part 3)
The Liskov/Wing Inheritance Model
Continuing from Part 3, in this section, I want to sketch and
comment on certain features of the inheritance model that the Liskov/Wing paper
appears to espouse. Throughout what
follows, I'll assume that type S is supposed to be a subtype of type T. I'll also appeal from time
to time to a
trivial running example in which T is INT (integers) and S is EVEN (even
integers)--though I'm not at all sure that Liskov and Wing would agree with
this example (that is, I'm not sure they would allow the type "even
integers" to be regarded as a subtype of the type "integers in general"!). This is
clearly another point I'm going to
have to elaborate on later, and I will.
“First of all, then, S and T are allowed to have different
"value spaces"; for example, values of type INT might be decimal,
while values of type EVEN are hexadecimal.
In such a case, however, an abstraction function must be provided
to "relate" [sic] values of type S to values of type T.”
Comment: This point looks like a model vs.
implementation confusion to me. I would
have said, rather, that (for example) every even integer simply is an
integer; if for some reason we want to represent "even integers in
particular" differently from "integers in general," well, that's
an implementation matter, and it has nothing to do with the model. But object-oriented systems
usually seem not
to distinguish model and implementation properly. This first point is thus, perhaps, a consequence
of (as I put it
earlier) trying to formalize preexisting notions, instead of attempting to define
a brand new model.
Next, S is allowed to have more values than T does! Here's a quote (slightly
simplified):
"Consider pairs and triples. Pairs have methods that fetch
the first and second elements; triples have these methods plus an additional
one to fetch the third element. Triple is a subtype of pair" (my
emphasis).”
Comment: What? Is this another consequence of trying to
"formalize preexisting notions"?
Surely, to say that S is a subtype of T is to say that every value of
type S is a value of type T (i.e., the set of S values is a subset of the set
of T values). In other words, I agree
with Robert Martin's remark--or, at least, the general sense of his remark--to
the effect that "inheritance is the ISA relationship" (see Part 2 of
this series). Of course, Martin himself
made that remark only in order to be able to go on to refute it, but I reject
his refutation! Thus, to say that
"triple is a subtype of pair" is to say that every triple is a pair,
or equivalently that triples are a special case of pairs!
Now, in our own inheritance model, we do take it as axiomatic
that there can't be more values of type S than there are of type T (and we find
this apparently trivial observation a great aid to clear thinking in this
potentially confusing area).
Furthermore, it seems to me that Liskov and Wing's rejection of this
axiom constitutes in itself a violation of their own Subtype Requirement! For example, consider a
method that, given
an arbitrary object of some specific type T, returns the cardinality--i.e., the
number of distinct values--of that type T.
That method will clearly give different answers depending on whether
it's invoked on a pair or a triple.
(Equally clearly, such a method could be defined; I mean, I think the
example is legitimate.)
Next (to quote again):
"The subtype must provide all methods of its supertype; we
refer to these as the inherited methods."
(Note the tacit assumption
that a given subtype has just one supertype (more precisely, just one immediate
supertype-–see THE
THIRD MANIFESTO. Liskov and
Wing do say they allow multiple supertypes, but there's no serious discussion
of the possibility of multiple inheritance in the paper at all. In THE THIRD MANIFESTO, by
contrast,
we show that multiple inheritance is not only desirable, it's logically required
(i.e., single inheritance by itself makes little sense)--and we go on to
examine the implications of this fact in considerable detail.
Comment: Another
model vs. implementation confusion! (and another consequence of trying
to formalize preexisting notions?). I
mean, the statement doesn't seem to need saying, unless there might otherwise
have been some possibility that it wasn't true. Let me elaborate:
Ø
If S is a subtype of T, then any method that applies to
values of type T must apply to values of type S, because S values are
T values. For example, if DOUBLE is a
method that applies to integers in general, then certainly DOUBLE is a method
that applies to even integers in particular.
If it doesn't, then even integers aren't integers!
Ø
Of course, it's true that a method M that applies to
values of type T might be--might even need to be--reimplemented
for type S; in Part 1 of this article, I gave the example of a
"method" called AREA that applies to values of type POLYGON and
therefore to values of type RECTANGLE as well, but I also said that the method
would probably be reimplemented for type RECTANGLE under the covers, for
performance reasons. But the fact that
there are several implementation versions of an operator is indeed an
implementation issue, not a model issue; to the user, there's just one method. (If AREA applies to
polygons, it applies to
rectangles by definition--because if it doesn't, then rectangles aren't
polygons.)
By contrast, to reiterate, Liskov and Wing say that "The
subtype must provide all methods of its supertype." This remark can only mean that subtype
and
supertype versions of those methods are "provided" and are
user-visible--and indeed, that's exactly what happens in the
"bags-and-stacks" example (which I've already promised I'm going to
discuss in more detail in Part 6 of this series). What's more, since those different
implementation
versions are exposed in the model, it follows a fortiori that different
implementation version names are exposed, too. In the bags-and-stacks example, for
instance, there's a method
for adding a new element to the bag or stack, but it's called PUT for bags and
PUSH for stacks.
Next, values aren't required to be values of leaf types
specifically. In the case of integers
and even integers, for example, a value can indeed be just an integer. In other words, there's no
requirement that
another type ODD be defined, such that ODD and EVEN are both proper subtypes of
INT and every INT value is either an ODD value or an EVEN value. I agree with Liskov and Wing on
this point. (By the way, if type ODD were
defined, type INT would become a "virtual" type, or what THE THIRD
MANIFESTO calls a union type.
Such types are important, but the Liskov/Wing paper has little to say
about them--despite the fact that they do raise some rather interesting
questions in connection with substitutability.)
Next:
"[We allow] subtypes to have more methods than their
supertypes."
But if S doesn't have more methods than T, then what
was the point of defining it as a subtype in the first place? In other words, I want to say that S
must
have at least one method that isn't defined for T, because otherwise it isn't a
proper subtype.
Another quote:
"32-bit integers are not a subtype of 64-bit integers ...
because a user of 64-bit integers would expect certain method calls to succeed
that will fail when applied to 32-bit integers."
Comment: First, I'd
really prefer to ignore the stuff about 32 vs. 64 bits (it looks like
yet another model vs. implementation confusion); however, I suppose I
can't. Anyway: Presumably what the authors mean by their
use of such terms is that if type S consists of integers that can be
represented using 32 bits and if type T consists of integers that can be
represented using 64 bits, then S isn't a subtype of T. But I disagree with this claim,
strongly! Certainly every integer that
can be represented using 32 bits can also be represented using 64 bits, so
every value of type S is also a value of type T. (To say that "32-bit
integers aren't a subset of 64-bit integers" is very like saying that the
type "sets of no more than 50 integers" isn't a subtype of the type
"sets of no more than 100 integers."
Personally, I find this latter example even weirder than the one
involving 32- and 64-bit integers.
So what about that business of "certain method calls
succeeding on 64-bit integers but failing on 32-bit ones?" Presumably what the authors have
in mind
here is methods such as DOUBLE, where the result of doubling a 32-bit integer
might be too large to represent using only 32 bits. All right then:
Obviously, the result is of type "64-bit integer," not
"32-bit integer." What's the
problem? (In our model, the result will actually be of type "64-bit
integer" only if it's indeed too large to represent using 32 bits; otherwise
it'll be of type "32-bit integer."
See the subsequent discussion of specialization by constraint. )
Well ... Actually I need to pursue this example a little
further. In the previous paragraph, I
was tacitly assuming that DOUBLE was an "observer," not a
"mutator." In terms of Tutorial
D--the language we use to illustrate the ideas of THE THIRD MANIFESTO--DOUBLE
might look like this:
OPERATOR DOUBLE ( I INT64 ) RETURNS INT64 ;
RETURN 2 * I ;
END OPERATOR ;
When invoked, this operator--sorry, method--has no effect on
the argument corresponding to its sole parameter I. And note too that, thanks to LSP, that
argument can be of type
INT32 as well as INT64. (Of course, I'm
assuming here that INT32 and INT64 have the obvious semantics and that, pace
Liskov and Wing, INT32 is indeed a subtype of INT64.)
However, suppose we were to make DOUBLE a mutator
instead, thus:
OPERATOR DOUBLE ( I INT64 ) UPDATES I ;
I := 2
* I ;
END OPERATOR ;
When this revised DOUBLE operator is invoked, it definitely
does have an effect on the argument corresponding to its sole parameter I. And if that argument is
of type INT32, not
INT64, then the invocation might fail on an overflow error. (I really need to
be a bit more precise here. In our
model, the problem under discussion can occur only if the argument is of declared
type INT32. If its current most
specific type is INT32 but its declared type is INT64, then the problem
under discussion doesn't arise. See the
subsequent discussion of generalization by constraint.) Just as an aside, therefore, I'd
like to
point out that examples like this one can be seen as an argument--no pun
intended--against the idea of mutators in the first place.
Anyway, it's presumably because of such possibilities (e.g.,
the possibility that the DOUBLE mutator might give an overflow if invoked on an
INT32 "object" when it doesn't do so on a corresponding INT64
"object") that Liskov and Wing claim that INT32 isn't a
subtype of INT64. Instead, they say, in
effect, that we need to define a type INT consisting of all possible integers
and having two distinct proper subtypes INT32 and INT64, neither of which is a
subtype of the other. Then different
versions of DOUBLE--DOUBLE32 and DOUBLE64, say--can be defined, with different
preconditions (see Parts 5 for
a discussion of preconditions and postconditions), and the
problem goes away.
But do you see what's happened? We've been forced into defining what's surely a
rather strange
and counterintuitive type hierarchy, basically because the model doesn't
allow objects to change their type--as I'll now try to explain.
Note: I don't even want to get into all of the
complexities caused by the fact that the two subtypes INT32 and INT64--neither
one of which is a subtype of the other, remember--overlap, in the sense
that many integers (232 of them, to be precise) are values of both
types. Let me just observe that
"overlapping types" is yet another topic that our own model does
address (and deals with gracefully) that isn't even discussed in the
Liskov/Wing paper.
Let me switch to the simpler (?) example of types INT and
EVEN, and let's consider the DOUBLE method again. In our model, applying DOUBLE to a value of type
INT--regardless
of whether that value is also a value of type EVEN--will always return a result
of type EVEN, automatically. In other
words, our model supports what we call specialization by constraint,
which means, in the case at hand, that any INT value will automatically
be understood to be an EVEN value if it is in fact even. By contrast, I strongly suspect that
Liskov
and Wing would say that the result is merely of type INT--or, perhaps more
accurately, they would say that EVEN isn't a subtype of INT, precisely because
they don't support specialization by constraint and they don't want to deal
with the real-world fact that doubling any integer always returns an even
result (I must make it clear that their paper never spells these points out
explicitly; so far as I know, however, our own inheritance model is the only one
that does support specialization by constraint.)
Following on from the previous point: Suppose now that we have an
"object" O of type INT whose current value happens to be even (so in
our model "the current most specific type"--as opposed to "the
declared type" INT--of the object O is EVEN). And suppose we now increment the current value
of O by one. In our model, then, the current most
specific type of O is now just INT, not EVEN.
In other words, our model also supports what we call generalization
by constraint, which means, in the case at hand, that if the previous value
of O was even but the current value is not, then O is automatically
understood now to contain just an INT value, not an EVEN value. (Specialization by constraint and
generalization by constraint--hereinafter abbreviated to S by C and G by C,
respectively--go hand in hand. So far
as I know, our own inheritance model is the only one that supports either of
these concepts.)
Onward. My next point
is this: Liskov and Wing tacitly seem to support unconditional inheritance of mutators,
but they never discuss--in fact, they don't even mention--the logical
absurdities that are necessary consequences of such unconditional
inheritance. Here are some of those
consequences (for further discussion, see THE THIRD MANIFESTO):
·
What seem to be "pure retrieval" operations
can have the side-effect of updating the database.
·
Values of (e.g.) type SQUARE can have sides of
different lengths, thereby violating their own "squareness,"
undermining the database "as a model of reality," and causing
programs to produce nonsensical results such as "nonsquare
squares."
·
S by C and G by C aren't supported.
·
Type constraints aren't supported. (This one is fundamental! Type constraints are
the mechanism by which
legal values of the type in question are specified. Without type constraints, there can be no
check on the
correctness of values in the database at all. See my paper "Constraints and Predicates: A
Brief
Tutorial," published recently on this website.)
As THE THIRD MANIFESTOmakes clear, the common
thread running through all of these problems is a failure to make a clear
distinction between values and variables.
The last criticism I want to make concerning the Liskov and
Wing model--not the last one I have, but the last one I want to articulate
here--is that it fails to prescribe the semantics of equality! Rather, those semantics appear to
be
user-defined. In fact, it's not even
clear that every type has to have an equality method, though the idea of not
being able to tell whether two values of the same type are equal seems bizarre,
to say the least. (By the way, equality methods illustrate very well the point
that the idea of a distinguished parameter ("the method's object")
sometimes seems artificial in the extreme.)
Here's a quote:
“If objects of the subtype have additional state, x and y
may differ when considered as subtype objects but ought to be considered equal
when considered as supertype objects.”
And the authors give an example of two triples <0,0,0>
and <0,0,1> that are clearly unequal "but are equal if they're
considered just as pairs" (paraphrasing the original considerably).
Well, we've been here before. The idea that equal might be interpreted to
mean equal
ignoring certain differences is espoused in SQL today--for example, in the
rule that says that the two character strings
'AB' (length two)
and
'AB ' (length
three)
can be considered equal.
And this rule has led to endless complications--for example, over the
semantics of DISTINCT, and GROUP BY, and UNION, and many other operators (not
to mention additional rules regarding, e.g., what's legal in integrity
constraints and the like). Can't we
learn from past mistakes? If not, why
not?
Now, Liskov and Wing do go on to say (in the pairs and
triples example) that two different methods are needed, "pair_equal"
and "triple_equal," and of course that solves the problem in that
particular case. But the fact remains
that their model has no prescribed semantics for equality--not to mention the
fact that it was the idea that the subtype could have more values than the
supertype that caused the pair/triple problem in the first place.
Here's another quote that relates to the same point:
"The need for several equality methods seems natural for
realistic examples. For example, asking
whether e1 and e2 are the same person is different from asking if
they are the same employee. In the case
of a person holding two jobs, the answer might be true for the question about
person but false for the question about employee."
Well, I suppose they're thinking of employees as
<person,job> pairs; then two "employees" can be different and
yet involve the same person. Nothing
wrong with that. But what is
wrong is to think of two such "employees" that involve the same
person but different jobs as equal!
Rather, the question to which the answer is yes is not "Are
these two employees (i.e., <person,job> pairs) equal?" but "Do
these two employees (i.e., <person,job> pairs) involve the same
person?"
We're supposed to be talking about substitutability. Now, there's an extremely
important and
fundamental principle in logic that says that if a and b are
equal, then a can be substituted for b--or the other way around--in
absolutely any logical statement whatsoever (see, e.g., James D. McCawley, EVERYTHING
THAT LINGUISTS HAVE WANTED TO KNOW ABOUT LOGIC (BUT WERE ASHAMED TO ASK),
University of Chicago Press, 1981).
Clearly, the same is not true for two "employees" that involve
the same person but different jobs.
Thus, the model that Liskov and Wing describe appears to be violating an
absolutely crucial logical principle of substitutability.
As an aside, perhaps I should make it clear that Hugh Darwen
and I do know how to handle examples like "pairs and triples" or
"persons and jobs" in our own model; in particular, we know how to
get some code reuse in such examples--code reuse, after all, being one of the
benefits usually claimed for subtyping and inheritance. But we can achieve that reuse without
having
to pretend that every triple is a pair!
(Continued in Part 5)
Posted
08/11/02
[ABOUT]
[QUOTES]
[LINKS]