ON VIEW UPDATING
with C. J. Date and David McGoveran

 

 

 

From: ES

To: Editor

Date: 30 Jun 2004

 

I just (almost) finished reading AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed.  And I have a bit of a problem with view updateability. I would be pleased if I could get some clarifications.  I'll try and explain with an example.

 

- Let R1 be a Relation (Person, Quality), with the (obvious) semantics that every occurring tuple indicates that the "Person" in question has the "Quality" in question.  That is : the predicate is something along the lines of "-Person- has quality -Quality-.". - Let R2 be a similar Relation (Person, Function), with predicate "-Person-has function -Function-".

 

Tuples in R1 could e.g. indicate that "Chris Date is a smart person.", or "George Bush makes entertaining linguistic errors.".  Tuples in R2 could e.g. indicate that "Chris Date is a database lecturer", or "George Bush is the president of the United States.".

 

These two relations can be joined, yielding a view whose predicate would be something along the lines of "-Person- has quality -Quality- and fulfills function -Function-.".  One tuple's proposition in this view might be that (example) "Chris Date is a smart person and fulfills the function of database lecturing.".

 

Now suppose that Chris Date is tired of being misunderstood, and decides to retire.  Implying that Chris Date no longer has a function whatsoever.  A user might now observe that the proposition contained in his database (view), is no longer true.  He might want to correct this and (if the update is done through this join view) he can obviously only do so by deleting the tuple that is responsible for (now falsely) stating that "Chris Date is a smart person and fulfills the function of database lecturing.".  Now if I understand your proposed join-delete algorithm correctly, then the system should react by removing both the assertion regarding Chris Date's function *as well as* the assertion regarding Chris Date's quality???  Frankly, I can hardly believe this is for real ... Stop giving lectures does not imply stop being smart.  Stop being smart does not imply stop giving lectures (as practice sadly shows).  Stop being president does not imply stop making entertaining mistakes.  Stop making entertaining mistakes does not imply stop being president (maybe even quite the contrary).

 

More generally : deleting a tuple from a join (or intersect or difference for that matter) view, means in fact that the user is saying that "it is not true that both P(A) and P(B) are true." (where P(A) and P(B) are both propositions related to the relations (relations' tuples) that constitute the join).  Nothing more and nothing less.  To deduce from this fact that "both P(A) and P(B) are false" (which is indeed what we mean when we delete "the A part from A and the B part from B"), simply looks like "false reasoning" to me (well, at least in my school days it was labeled that way).  I therefore have come to believe that "view updateability" should be allowed to happen IF AND ONLY IF the system has PRECISELY ONE WAY to "resolve" the user's request.  You state that this is "highly desirable", but I believe that's not good enough.  I believe it should be a sine qua non.

 

I further observe that if a user is unaware of whether a relation is "material" ("base") or "virtual" ("view"), then he will presumably expect to be able to do both inserts and deletes (of tuples) in that relation. Rightfully so. But that implies that (under the aforementioned precisely-one-way-assumption), the system must have (or find) precisely one way to satisfy an insert request, and precisely one way as well to satisfy a delete request.  Lacking additional information, achieving this for both insert and delete is impossible.  An "insert" of a proposition which is a boolean "and" allows to deduce that all the individual parts must be true, but a delete of such a proposition has three possible "solutions".  A similar thing applies to a "delete" of a boolean "or" proposition.  An "and" is involved by the minus, intersect, join and restrict operators, an "or" is involved in the union operator.

 

Yet more generally : whatever the operator, it is inherently impossible to (unambiguously and in all cases) derive the pair of truth values (P(A),P(B)) from the sole truth value of the expression "P(A) operator P(B)" (the proposition of (a tuple in) the view).  The pair has four possible values, the expression two.  And the fact alone that the relations A and B have indeed been *individually* defined, implies that there is someone somewhere to whom these *individual* propositions are materially relevant.  Well, not necessarily so, but certainly very likely at least. If these "individual" propositions are then "randomly" decided by the system, then I think that system will be unacceptable to that other someone somewhere.  "Randomly", because, from that other user's point of view, I do not believe that any possibility (as to which pair P(A),P(B) will ultimately be picked by the system to make the database reflect the desired "P(A) operator P(B)" value) can be expected to be objectively preferable to any of the others.  In the example I began with : it can not be excluded that there is another user somewhere who is interested solely in person's qualities, and it can furthermore not be excluded that this other user is still interested in a person's qualities, even if that person no longer has a function, or if the fact that this person indeed has a function has become irrelevant to the "first" user.

 

I also briefly elaborated the example of an insert into a view A XOR B ((A MINUS B) UNION (B MINUS A)).  And hurray, the proposed scheme seems to "correctly" insert my tuple into either A or B, but not both, which indeed complies with the definition of XOR.  But.  If the "left" MINUS is treated first, the tuple ends up in B, otherwise in A.  Now this is obviously implementation-dependent.  Differences in optimization algorithms might lead to one DBMS taking the decision "left first", and another one "right first" Worse, it may even try and do so simultaneously in two distinct threads.  Such a situation is even justifiable due to the symmetric nature of the XOR/UNION/OR operator.  The DBMSs themselves may be consistent in the choice they make, and thus expose "predictable" ("deterministic") behavior, but a user can still end up with programs who suddenly start behaving differently when the organization changes its DBMS (say, from DB/3 to Miracle).  Which seems to me like yet another completely undesirable "side-"effect.

 

Yet another objection that I felt is with the restriction operator.  If a user does an insert (of a tuple) in a relation, he will afterwards certainly expect that that tuple be returned if he queries that relation (with selection criteria that match his specific tuple, of course).  Now first of all, it is explicitly stated that this is not necessarily the case.  If a view is defined on Parts where Color=Red (and the Color attribute is also projected away from the view), then an insert into the view will not result in an insert of a Part whose Color attribute equals Red.  Such behavior "betrays" to the user that his relation is actually not behaving like a base relation, and therefore must be a view.  But this is in fact undesirable because we do not want that distinction (between base relations and views) to show in the first place.  Furthermore: 'deriving' the applicable color from the view definition (i.e. inserting a Part whose Color attribute equals Red), is also a "bad" idea, because such derivation process cannot be guaranteed to be "uniquely resolvable" in all cases (e.g. Parts where Color=Red or Color=Blue - so which color should it be ?).

 

And as for projection : if the color is projected away from relation 'Parts', then a view is produced whose predicate would be a certain statement regarding 'parts', followed by '... and it has SOME color' (ref. one of the introductory chapters).  A user doing an insert in this view, is actually making a proposition which says about this certain part '... and it has SOME color'.  To which can be added (without change of meaning): '... but that color to me is UNKNOWN.'.  Now if we step over to the design guidelines regarding the treatment of 'the unknown', then there we find that if a designer encounters such propositions where some parts (of the proposition, that is) may be 'unknown', then that should be an indication for this designer to split this relation in several pieces, right up to the point where there are no more possible statements of something being unknown.  In other words (my interpretation) : if a requirement is detected for an insert into a projection view, then that should be an indication that the projection should actually be the base relation and not the view, and the "unprojected" relation should be a (join) view and not the base relation.

 

I am therefore more inclined to conclude that "general view updateability" is its own monster of Loch Ness : much sought after, never found – and indeed undefinable. That is to say : it may be perfectly possible to find a scheme that satisfies the conditions of "consistent and predictable behavior", as well as "logically correct behavior (at least as far as just the view itself is concerned)".  But it seems impossible to make such a scheme also "logically justifiable", by which I mean that whatever behavior the scheme exposes, it can be proven to each and every individual other user of that database that this behavior is the *only* logically correct option.

 

OTOH, all of this is in fact so plain to see that I cannot imagine such issues not having been raised before.  And that there must therefore be arguments to the contrary readily available.

 

 

C. J. Date Responds:  First of all, let me say that I no longer regard view updating as a fully solved problem.  A year ago or so I thought it was--but then Hugh Darwen started to ask some hard questions and I realized I was wrong.  (David McGoveran will probably disagree with me here.) 

 

That said, I remain optimistic that the problem is solvable.  I also think the discussion in AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed., is generally along the right lines, though it gets some of the details wrong.  I currently plan to include an extended discussion of the issue in the 3rd edition of our book on THE THIRD MANIFESTO, which is due from Addison-Wesley sometime next spring under the tentative new title DATABASES, TYPES, AND THE RELATIONAL MODEL

 

One important piece of the puzzle that wasn't mentioned in the 8th edition is what we call The Assignment Principle, which can be stated thus: 

 

Following assignment of v to V, the comparison v = V must evaluate to TRUE.

 

Hugh and I articulated this principle in our recent PRACTICAL DATABASE FOUNDATIONS paper #5 Multiple Assignment.  As we said in that paper, the principle is so obvious (even trivial) that it might seem hardly worth stating, let alone dignifying with such a grand name.  Indeed, it's effectively the definition of assignment. But we also observed in that paper that it's violated ubiquitously in SQL in particular, and it's tempting to suggest that such violations occur, at least in part, precisely because the principle hasn't been properly spelled out before. 

 

Now let me address some of the specific points you raise. The first has to do with updating joins.  For reasons that should be obvious I don't want to use your specific example here; let me switch to the well-known suppliers-and-parts example, as in the 8th edition.  Let SSP be the join of relvars S and SP over S# (a one-to-many join; your example was many-to-many, and I'll get to that case in a moment).  Consider an attempt to delete just the tuple for supplier S1 and part P1 from SSP.  The rules say:  Delete the tuple for S1 from relvar S and delete the tuple for S1 and P1 from relvar SP.  So what happens?  There are three possibilities: 

 

1.      If there are no other shipments for supplier S1, the overall operation succeeds. 

 

2.      If there are some other shipments for supplier S1 and no other updates are done (i.e., there are no side effects), the overall operation fails on a referential integrity violation. 

 

3.      If there are some other shipments for supplier S1 and deleting supplier S1 from relvar S "cascades" to delete those shipments from relvar SP, the overall operation fails on a violation of The Assignment Principle.  

 

In other words, the operation

 

DELETE SSP WHERE S# = S#('S1') AND P# = P#('P1') ;

 

is inherently unsafe, since we presumably don't know, in general, whether there are any other shipments for supplier S1.  The following operation, by contrast, is safe:  

 

DELETE SSP WHERE S# = S#('S1') ;

 

(it will delete supplier S1 from S and all shipments for supplier S1 from SP). 

 

Now, you say, correctly (though I paraphrase), that to infer (NOT PA) AND (NOT PB) from NOT (PA AND PB) is "false reasoning."  I agree.  But I never claimed--at least, I don't think I ever claimed!--that the rule for deleting from a join was based purely on logical inference.  However, I do claim that (a) it's a simple rule, (b) it's a reasonable rule, (c) it gives useful and expected results in the majority of cases, and (d) it gives predictable results in all cases.  I also claim that the rule is both (e) symmetric in itself and (f) symmetric with respect to the corresponding insert rule.  Finally, I claim (g) it's desirable and indeed advantageous to have a rule that doesn't depend on whether the join is one-to-one, one-to-many, or many-to-many.  (Which takes care of the many-to-many case as promised.) 

 

You also say, correctly (though I paraphrase again), that there are three possible solutions to the equation NOT (PA AND PB) = TRUE: viz., PA = TRUE or PB = TRUE or both. Again, I agree.  But having to solve an underspecified set of equations is a situation we often face, in scientific contexts as well as nonscientific ones.  Linear programming provides a classic example in the scientific arena; in linear programming, we're given m linear equations in n unknowns (m < n), and we try to find a solution--explicitly recognized as not necessarily unique--to those equations that satisfies some pragmatic condition.  (Of course, the pragma in question needs to be well thought out.)  So I would say in the view updating context that we have some pragmatic conditions that need to be satisfied, and those conditions are, or at least include, those listed in the 8th edition (e.g., symmetry).* 

 

* One reason symmetry is desirable is this:  We want the semantics of A JOIN B to be the same as those of B JOIN A.  Likewise for A UNION B, of course. 

 

Another point of pragma that's relevant is this: Obviously, we can--and, I presume, would--use the system's security mechanism to prohibit updates on views that, if permitted, would have effects we think are undesirable in some way.  In your example, you might be well advised not to give the user delete privileges on the join of "Person has Quality" and "Person has Function," if you're worried in some way about the effect such deletes might have. 

 

And another:  Of course, none of the foregoing imposes any limits on what applications can do.  To revert to suppliers and parts, there's nothing to stop an application program from (a) displaying the join of suppliers and shipments as a single table on the screen, (b) allowing the end user to remove a row from that table somehow, and (c) implementing that removal by deleting a tuple from the shipments relvar (only) under the covers.  But any suggestion that "removing a row" in this fashion is a relational DELETE on the join should be avoided:  It is a different operation (and the user would need to understand that fact, in general), it has different semantics, and it should be given a different name. 

 

 

I think I've now responded to many of your points, directly or indirectly, but there are still some outstanding issues. 

 

You say the 8th edition says it's "highly desirable" for there to be precisely one way to "resolve" the user's request.  Actually I couldn't find any remark to this effect.  Can you cite chapter and verse?  Because I might want to delete the remark, if it's there! 

 

Ø       Paraphrasing again, you say that deleting a tuple from a difference means that tuple satisfies NOT (PA AND PB).  Actually this statement is incorrect.  The predicate for V = A MINUS B is FORALL t e V (PA(t) AND NOT (t e B)).  The 8th edition is wrong on this too (sigh). 

 

Ø       Your implied remark re deleting through union is incorrect as well (DELETE on A UNION B must delete from both).  It's INSERT, not DELETE, that might be argued to be ambiguous in the case of union. 

 

Ø       The symmetric difference example ("A XOR B"):  Once again I fear the 8th edition is incorrect.  Without going into details, let me just say that the insert rules as currently defined lead to a violation of The Assignment Principle in this case--in part because of the very lack of symmetry that (as you point out) might otherwise occur. 

 

Ø       Restriction:  If tuple t is inserted into V, the result of evaluating the expression V ("SELECT * FROM V" if you prefer) will include t.  No arguments!--this state of affairs follows from The Assignment Principle, and it applies to all relvars V, be they restriction views or whatever.  In any case, your objection here really has to do with projection, not restriction ... See next. 

 

Ø       Projection:  I'm sympathetic to your observations in this connection.  Hugh Darwen and I have both felt for some time that all base relvars should be in 6NF.*  Note, however, that such a discipline doesn't solve the problem!--consider the view V = (A JOIN B){attributes of A}.  But default values are certainly not the whole of the solution to that problem, either.  As I wrote in my recent response to Dave Tallman:  "[The] idea that a given attribute has just one default that must be used in all contexts is ... too rigid.  Further investigation is needed, however, and further discussion is beyond the scope of these notes." 

 

* Not a typo. 

 

I'd like to close with the following remark.  One of the epigraphs to THE THIRD MANIFESTO book (all editions) includes the following:  "[Logical] differences are not everything, and nor is logic."  We do not claim, nor would we want to claim, that everything in the MANIFESTO is based on logic or logical inference alone.  For example, the prescription to the effect that update operators not return a value is based, at least in part, on pragmatic considerations; the same goes for the decision to exclude operators for creating and destroying databases; likewise for the prescription--or lack of one, rather--regarding recursively defined types; likewise for the prescriptions concerning tuple and relation type names; likewise for RM Prescription 26; and so on, probably.

 

 

David McGoveran Comments:  Without writing a paper, I do want to make a few comments for publication in response to the objections to view updateability, at least insofar as it addresses the approach proposed by Chris Date and myself. A detailed response is not possible for me at this time. Throughout, I've placed quotes around certain words or phrases in lower case, because they could use some further explanation, might be over-interpreted, or otherwise are imprecise. I apologize in advance for not being able to do better by this important subject here.

 

Let me note up front that Chris and I may not see eye-to-eye regarding view updateability if his published expositions are indicative. I suspect this divergence is largely a matter of some details of the approach which Chris more or less downplays, and which then permit many confusions (and objections) to arise whether by Hugh Darwen, the recent correspondent, or others.

 

Unfortunately I have not been in a position to pursue clarification with Chris, so please do not take this any kind of indictment.

 

View updateability is of crucial importance to the relational model. If there is no solution, then there is likewise no possibility of data independence (both physical and logical!). That would be tantamount to disclaiming the relational model itself insofar as data independence was the stated initial motivation of Dr. Codd's relational model research program.

 

I believe that a solution to the view updateability "problem" is not only possible, but that it is a solved problem.

 

I do not believe that the problem has been formulated well in the literature, nor do I believe that the solution has been well and completely explained. Furthermore, I believe that the consequences of understanding and applying the solution are wide-reaching, so much so that we might have to acknowledge that we never truly understood the relational model heretofore.

 

As much as I would dearly love to have the resources to write a complete explanation, I do not. Instead, I will have to be satisfied with providing a few principles, recommended conceptualizations, or guideposts.

 

1. The "Date-McGoveran view update approach" (or, in more concrete form, the algorithm) provides a set of rules for applying updates to any relation.

 

2. These rules are not to be applied in isolation. They are, in a sense, context dependent, where the context is supplied by constraints accessible to the DBMS and by the current user's intent as expressible through (a certain strictly constructive and strictly finite) first order predicate logic. The simple expression of the rules only tell us what to do in the absence of further contextual information.

 

3. A DBMS that supports the relational model cannot be "magic." It cannot compensate for ambiguity,  intuit withheld information or assumptions, or correct expressions that violate the designer's or user's intent. If the designer fails to capture information in database design, if information is hidden from the user, or if the user incompetently fails to express their intent [Ed. Note: Or if the data language is not sufficiently expressive], it will certainly produce seemingly anomalous or "surprising" results when faced with these problems.

 

4. No logical data independence mechanism can be permitted to hide relation semantics from users (the contrary would be absurd), although it seems that many writers have assumed that to be part of its intent. The semantics of a derived view is inherently distinct from that of a "base relation." For example, the relvar predicate for a join view necessarily involves multiple irreducible predicates, whereas that for a "base relation" should not.

 

5. Logical data independence can hide "base" data and "base" data representation. For example, a projection view can hide the specific data found in the "lost" attributes even though the relvar predicate clearly tells us that those attributes, and data in them, must exist. Additionally, the view might not then contain any evidence of dependencies among those attributes and attributes in other relations.

 

6. For logical data independence to be possible, the semantics of relational operators must be consistent, whether the operands are derived relations or "base" relations. Although we have traditionally thought of this as requiring that we find some algorithm for view updating, this is a poor method of attack, which seems obvious when we consider that derived relations necessarily have the more complex semantics. Thus, we must solve the general case of update semantics for which updating "base" relations is the special case (and not the reverse).

 

7. All relational updates should be understood as set transformations: Therefore, they are atomic, and the final "state" is not dependent on the order of any intermediate steps. Just as is the case when evaluating a logical expression (e.g., a tautology or a relvar predicate), there are no implementation dependencies (as, for example, in treating the left vs. right "MINUS" first) -- only implementation errors. An optimizer incorporating such implementation dependencies is faulty.

 

8. We must avoid the temptation to think of a relational update as anything other than a declarative specification of a resulting relation (a set) incorporating the relvar predicate of source relation and its current value. Nothing else is pertinent to computing the value that is to be "assigned" to the relvar. If a reference to one or more tuples occurs within the update expression, we should not think even of these as anything other than a particular way of expressing a constraint (i.e., as part of an "update predicate").

 

9. Correct database design is crucial for understanding view updateability (Chris Date will possibly demur on this point), and goes well beyond normalization or even our Principle of (Semantically) Orthogonal Design. True, the results will always be predictable and deterministic, but are likely to produce confusing side effects from time-to-time.

 

With these principles in mind, reconsider the "problem" with the example of delete through a join view. The "betrayal" of the user is an illusion and ceases to be a problem. In fact, it is a problem which we accept all the time for "base" relations: We can easily construct a database of "base" relations for which there are "side effects" (remember cascade delete?) or which prevent the user from obtaining update results that behave as if the target relation exists in isolation.

 

The "problem" of finding a unique inverse transformation for a join view in a given context without specifying any context goes away. Clearly, the delete must satisfy 'NOT PA OR NOT PB'. (Aside: We must be very careful what we mean by NOT, avoiding confusion with simple complement. Relations have a relative complement - tuples not asserted to be 'true' - with a scope that is distinct from that of the relvar predicate - tuples that cannot belong to the relation.) This requirement may be further constrained in at least one of two ways. First, if they know, the user should specify their intent as to why the conjunction is 'false'. Second, some of the integrity rules and dependencies the designer imposes may further constrain the general requirement.

 

Of necessity, I leave the detailed analysis of other view updateability "problems" to the reader as an exercise in applying the principles I've stated above. My hope is that they will encourage the reader to reframe both the problem and the solution.

 

 

Posted 09/14/04