WHAT DOES SUBSTITUTABILITY REALLY MEAN? PART 4
by Chris Date

 

 

 

(Continued from Part 3)

 

 

The Liskov/Wing Inheritance Model

 

Continuing from Part 3, in this section, I want to sketch and comment on certain features of the inheritance model that the Liskov/Wing paper appears to espouse.  Throughout what follows, I'll assume that type S is supposed to be a subtype of type T.  I'll also appeal from time to time to a trivial running example in which T is INT (integers) and S is EVEN (even integers)--though I'm not at all sure that Liskov and Wing would agree with this example (that is, I'm not sure they would allow the type "even integers" to be regarded as a subtype of the type "integers in general"!).  This is clearly another point I'm going to have to elaborate on later, and I will. 

 

“First of all, then, S and T are allowed to have different "value spaces"; for example, values of type INT might be decimal, while values of type EVEN are hexadecimal.  In such a case, however, an abstraction function must be provided to "relate" [sic] values of type S to values of type T.” 

 

Comment: This point looks like a model vs. implementation confusion to me.  I would have said, rather, that (for example) every even integer simply is an integer; if for some reason we want to represent "even integers in particular" differently from "integers in general," well, that's an implementation matter, and it has nothing to do with the model.  But object-oriented systems usually seem not to distinguish model and implementation properly.  This first point is thus, perhaps, a consequence of (as I put it earlier) trying to formalize preexisting notions, instead of attempting to define a brand new model. 

 

Next, S is allowed to have more values than T does!  Here's a quote (slightly simplified):

 

"Consider pairs and triples. Pairs have methods that fetch the first and second elements; triples have these methods plus an additional one to fetch the third element. Triple is a subtype of pair" (my emphasis).” 

 

Comment:  What?  Is this another consequence of trying to "formalize preexisting notions"?  Surely, to say that S is a subtype of T is to say that every value of type S is a value of type T (i.e., the set of S values is a subset of the set of T values).  In other words, I agree with Robert Martin's remark--or, at least, the general sense of his remark--to the effect that "inheritance is the ISA relationship" (see Part 2 of this series).  Of course, Martin himself made that remark only in order to be able to go on to refute it, but I reject his refutation!  Thus, to say that "triple is a subtype of pair" is to say that every triple is a pair, or equivalently that triples are a special case of pairs! 

 

Now, in our own inheritance model, we do take it as axiomatic that there can't be more values of type S than there are of type T (and we find this apparently trivial observation a great aid to clear thinking in this potentially confusing area).  Furthermore, it seems to me that Liskov and Wing's rejection of this axiom constitutes in itself a violation of their own Subtype Requirement!  For example, consider a method that, given an arbitrary object of some specific type T, returns the cardinality--i.e., the number of distinct values--of that type T.  That method will clearly give different answers depending on whether it's invoked on a pair or a triple.  (Equally clearly, such a method could be defined; I mean, I think the example is legitimate.) 

 

Next (to quote again): 

 

"The subtype must provide all methods of its supertype; we refer to these as the inherited methods." 

 

(Note the tacit assumption that a given subtype has just one supertype (more precisely, just one immediate supertype-–see THE THIRD MANIFESTO.  Liskov and Wing do say they allow multiple supertypes, but there's no serious discussion of the possibility of multiple inheritance in the paper at all.  In THE THIRD MANIFESTO, by contrast, we show that multiple inheritance is not only desirable, it's logically required (i.e., single inheritance by itself makes little sense)--and we go on to examine the implications of this fact in considerable detail. 

 

Comment:  Another model vs. implementation confusion! (and another consequence of trying to formalize preexisting notions?).  I mean, the statement doesn't seem to need saying, unless there might otherwise have been some possibility that it wasn't true.  Let me elaborate: 

 

Ø       If S is a subtype of T, then any method that applies to values of type T must apply to values of type S, because S values are T values.  For example, if DOUBLE is a method that applies to integers in general, then certainly DOUBLE is a method that applies to even integers in particular.  If it doesn't, then even integers aren't integers! 

 

Ø       Of course, it's true that a method M that applies to values of type T might be--might even need to be--reimplemented for type S; in Part 1 of this article, I gave the example of a "method" called AREA that applies to values of type POLYGON and therefore to values of type RECTANGLE as well, but I also said that the method would probably be reimplemented for type RECTANGLE under the covers, for performance reasons.  But the fact that there are several implementation versions of an operator is indeed an implementation issue, not a model issue; to the user, there's just one method.  (If AREA applies to polygons, it applies to rectangles by definition--because if it doesn't, then rectangles aren't polygons.) 

 

By contrast, to reiterate, Liskov and Wing say that "The subtype must provide all methods of its supertype."  This remark can only mean that subtype and supertype versions of those methods are "provided" and are user-visible--and indeed, that's exactly what happens in the "bags-and-stacks" example (which I've already promised I'm going to discuss in more detail in Part 6 of this series).  What's more, since those different implementation versions are exposed in the model, it follows a fortiori that different implementation version names are exposed, too.  In the bags-and-stacks example, for instance, there's a method for adding a new element to the bag or stack, but it's called PUT for bags and PUSH for stacks. 

 

Next, values aren't required to be values of leaf types specifically.  In the case of integers and even integers, for example, a value can indeed be just an integer.  In other words, there's no requirement that another type ODD be defined, such that ODD and EVEN are both proper subtypes of INT and every INT value is either an ODD value or an EVEN value.  I agree with Liskov and Wing on this point.  (By the way, if type ODD were defined, type INT would become a "virtual" type, or what THE THIRD MANIFESTO calls a union type.  Such types are important, but the Liskov/Wing paper has little to say about them--despite the fact that they do raise some rather interesting questions in connection with substitutability.) 

 

Next:

"[We allow] subtypes to have more methods than their supertypes." 

 

But if S doesn't have more methods than T, then what was the point of defining it as a subtype in the first place?  In other words, I want to say that S must have at least one method that isn't defined for T, because otherwise it isn't a proper subtype. 

 

Another quote

 

"32-bit integers are not a subtype of 64-bit integers ... because a user of 64-bit integers would expect certain method calls to succeed that will fail when applied to 32-bit integers." 

 

Comment:  First, I'd really prefer to ignore the stuff about 32 vs. 64 bits (it looks like yet another model vs. implementation confusion); however, I suppose I can't.  Anyway:  Presumably what the authors mean by their use of such terms is that if type S consists of integers that can be represented using 32 bits and if type T consists of integers that can be represented using 64 bits, then S isn't a subtype of T.  But I disagree with this claim, strongly!  Certainly every integer that can be represented using 32 bits can also be represented using 64 bits, so every value of type S is also a value of type T. (To say that "32-bit integers aren't a subset of 64-bit integers" is very like saying that the type "sets of no more than 50 integers" isn't a subtype of the type "sets of no more than 100 integers."  Personally, I find this latter example even weirder than the one involving 32- and 64-bit integers. 

 

So what about that business of "certain method calls succeeding on 64-bit integers but failing on 32-bit ones?"  Presumably what the authors have in mind here is methods such as DOUBLE, where the result of doubling a 32-bit integer might be too large to represent using only 32 bits.  All right then:  Obviously, the result is of type "64-bit integer," not "32-bit integer."  What's the problem? (In our model, the result will actually be of type "64-bit integer" only if it's indeed too large to represent using 32 bits; otherwise it'll be of type "32-bit integer."  See the subsequent discussion of specialization by constraint. )

 

Well ... Actually I need to pursue this example a little further.  In the previous paragraph, I was tacitly assuming that DOUBLE was an "observer," not a "mutator."  In terms of Tutorial D--the language we use to illustrate the ideas of THE THIRD MANIFESTO--DOUBLE might look like this: 

 

OPERATOR DOUBLE ( I INT64 ) RETURNS INT64 ;

 RETURN 2 * I ;

END OPERATOR ;

 

When invoked, this operator--sorry, method--has no effect on the argument corresponding to its sole parameter I.  And note too that, thanks to LSP, that argument can be of type INT32 as well as INT64.  (Of course, I'm assuming here that INT32 and INT64 have the obvious semantics and that, pace Liskov and Wing, INT32 is indeed a subtype of INT64.) 

 

However, suppose we were to make DOUBLE a mutator instead, thus: 

 

OPERATOR DOUBLE ( I INT64 ) UPDATES I ;

 I  :=  2 * I ;

END OPERATOR ;

 

When this revised DOUBLE operator is invoked, it definitely does have an effect on the argument corresponding to its sole parameter I.  And if that argument is of type INT32, not INT64, then the invocation might fail on an overflow error. (I really need to be a bit more precise here.  In our model, the problem under discussion can occur only if the argument is of declared type INT32.  If its current most specific type is INT32 but its declared type is INT64, then the problem under discussion doesn't arise.  See the subsequent discussion of generalization by constraint.)  Just as an aside, therefore, I'd like to point out that examples like this one can be seen as an argument--no pun intended--against the idea of mutators in the first place. 

 

Anyway, it's presumably because of such possibilities (e.g., the possibility that the DOUBLE mutator might give an overflow if invoked on an INT32 "object" when it doesn't do so on a corresponding INT64 "object") that Liskov and Wing claim that INT32 isn't a subtype of INT64.  Instead, they say, in effect, that we need to define a type INT consisting of all possible integers and having two distinct proper subtypes INT32 and INT64, neither of which is a subtype of the other.  Then different versions of DOUBLE--DOUBLE32 and DOUBLE64, say--can be defined, with different preconditions (see Parts 5 for a discussion of preconditions and postconditions), and the problem goes away. 

 

But do you see what's happened?  We've been forced into defining what's surely a rather strange and counterintuitive type hierarchy, basically because the model doesn't allow objects to change their type--as I'll now try to explain. 

 

Note: I don't even want to get into all of the complexities caused by the fact that the two subtypes INT32 and INT64--neither one of which is a subtype of the other, remember--overlap, in the sense that many integers (232 of them, to be precise) are values of both types.  Let me just observe that "overlapping types" is yet another topic that our own model does address (and deals with gracefully) that isn't even discussed in the Liskov/Wing paper. 

 

Let me switch to the simpler (?) example of types INT and EVEN, and let's consider the DOUBLE method again.  In our model, applying DOUBLE to a value of type INT--regardless of whether that value is also a value of type EVEN--will always return a result of type EVEN, automatically.  In other words, our model supports what we call specialization by constraint, which means, in the case at hand, that any INT value will automatically be understood to be an EVEN value if it is in fact even.  By contrast, I strongly suspect that Liskov and Wing would say that the result is merely of type INT--or, perhaps more accurately, they would say that EVEN isn't a subtype of INT, precisely because they don't support specialization by constraint and they don't want to deal with the real-world fact that doubling any integer always returns an even result (I must make it clear that their paper never spells these points out explicitly; so far as I know, however, our own inheritance model is the only one that does support specialization by constraint.) 

 

Following on from the previous point:  Suppose now that we have an "object" O of type INT whose current value happens to be even (so in our model "the current most specific type"--as opposed to "the declared type" INT--of the object O is EVEN).  And suppose we now increment the current value of O by one.  In our model, then, the current most specific type of O is now just INT, not EVEN.  In other words, our model also supports what we call generalization by constraint, which means, in the case at hand, that if the previous value of O was even but the current value is not, then O is automatically understood now to contain just an INT value, not an EVEN value.  (Specialization by constraint and generalization by constraint--hereinafter abbreviated to S by C and G by C, respectively--go hand in hand.  So far as I know, our own inheritance model is the only one that supports either of these concepts.) 

 

Onward.  My next point is this: Liskov and Wing tacitly seem to support unconditional inheritance of mutators, but they never discuss--in fact, they don't even mention--the logical absurdities that are necessary consequences of such unconditional inheritance.  Here are some of those consequences (for further discussion, see THE THIRD MANIFESTO): 

 

·   What seem to be "pure retrieval" operations can have the side-effect of updating the database. 

·   Values of (e.g.) type SQUARE can have sides of different lengths, thereby violating their own "squareness," undermining the database "as a model of reality," and causing programs to produce nonsensical results such as "nonsquare squares." 

·   S by C and G by C aren't supported. 

·   Type constraints aren't supported.  (This one is fundamental!  Type constraints are the mechanism by which legal values of the type in question are specified.  Without type constraints, there can be no check on the correctness of values in the database at all.  See my paper "Constraints and Predicates: A Brief Tutorial," published recently on this website.) 

 

As THE THIRD MANIFESTOmakes clear, the common thread running through all of these problems is a failure to make a clear distinction between values and variables.  

 

The last criticism I want to make concerning the Liskov and Wing model--not the last one I have, but the last one I want to articulate here--is that it fails to prescribe the semantics of equality!  Rather, those semantics appear to be user-defined.  In fact, it's not even clear that every type has to have an equality method, though the idea of not being able to tell whether two values of the same type are equal seems bizarre, to say the least. (By the way, equality methods illustrate very well the point that the idea of a distinguished parameter ("the method's object") sometimes seems artificial in the extreme.) 

Here's a quote: 

 

“If objects of the subtype have additional state, x and y may differ when considered as subtype objects but ought to be considered equal when considered as supertype objects.” 

 

And the authors give an example of two triples <0,0,0> and <0,0,1> that are clearly unequal "but are equal if they're considered just as pairs" (paraphrasing the original considerably). 

 

Well, we've been here before.  The idea that equal might be interpreted to mean equal ignoring certain differences is espoused in SQL today--for example, in the rule that says that the two character strings

 

    'AB' (length two)

 

and

 

    'AB ' (length three)

 

can be considered equal.  And this rule has led to endless complications--for example, over the semantics of DISTINCT, and GROUP BY, and UNION, and many other operators (not to mention additional rules regarding, e.g., what's legal in integrity constraints and the like).  Can't we learn from past mistakes?  If not, why not? 

 

Now, Liskov and Wing do go on to say (in the pairs and triples example) that two different methods are needed, "pair_equal" and "triple_equal," and of course that solves the problem in that particular case.  But the fact remains that their model has no prescribed semantics for equality--not to mention the fact that it was the idea that the subtype could have more values than the supertype that caused the pair/triple problem in the first place. 

 

Here's another quote that relates to the same point: 

 

"The need for several equality methods seems natural for realistic examples.  For example, asking whether e1 and e2 are the same person is different from asking if they are the same employee.  In the case of a person holding two jobs, the answer might be true for the question about person but false for the question about employee." 

 

Well, I suppose they're thinking of employees as <person,job> pairs; then two "employees" can be different and yet involve the same person.  Nothing wrong with that.  But what is wrong is to think of two such "employees" that involve the same person but different jobs as equal!  Rather, the question to which the answer is yes is not "Are these two employees (i.e., <person,job> pairs) equal?" but "Do these two employees (i.e., <person,job> pairs) involve the same person?" 

 

We're supposed to be talking about substitutability.  Now, there's an extremely important and fundamental principle in logic that says that if a and b are equal, then a can be substituted for b--or the other way around--in absolutely any logical statement whatsoever (see, e.g., James D. McCawley, EVERYTHING THAT LINGUISTS HAVE WANTED TO KNOW ABOUT LOGIC (BUT WERE ASHAMED TO ASK), University of Chicago Press, 1981).  Clearly, the same is not true for two "employees" that involve the same person but different jobs.  Thus, the model that Liskov and Wing describe appears to be violating an absolutely crucial logical principle of substitutability. 

 

As an aside, perhaps I should make it clear that Hugh Darwen and I do know how to handle examples like "pairs and triples" or "persons and jobs" in our own model; in particular, we know how to get some code reuse in such examples--code reuse, after all, being one of the benefits usually claimed for subtyping and inheritance.  But we can achieve that reuse without having to pretend that every triple is a pair! 

 

 

(Continued in Part 5)

 

Posted 08/11/02

 

 

 

[ABOUT] [QUOTES] [LINKS]