From: ES
To: Editor
Date: 30 Jun 2004
I just (almost) finished reading AN INTRODUCTION TO DATABASE
SYSTEMS, 8th Ed. And I have a bit of a problem with
view updateability. I would be pleased if I could get some
clarifications. I'll try and explain with an example.
- Let R1 be a Relation (Person, Quality), with the (obvious)
semantics that every occurring tuple indicates that the "Person" in
question has the "Quality" in question. That is : the predicate is something along the lines of
"-Person- has quality -Quality-.". - Let R2 be a similar Relation
(Person, Function), with predicate "-Person-has function -Function-".
Tuples in R1 could e.g. indicate that "Chris Date is a
smart person.", or "George Bush makes entertaining linguistic
errors.". Tuples in R2 could e.g.
indicate that "Chris Date is a database lecturer", or "George
Bush is the president of the United States.".
These two relations can be joined, yielding a view whose
predicate would be something along the lines of "-Person- has quality
-Quality- and fulfills function -Function-.". One tuple's proposition in this view might be that (example)
"Chris Date is a smart person and fulfills the function of database
lecturing.".
Now suppose that Chris Date is tired of being misunderstood,
and decides to retire. Implying that
Chris Date no longer has a function whatsoever. A user might now observe that the proposition contained in his
database (view), is no longer true. He
might want to correct this and (if the update is done through this join view)
he can obviously only do so by deleting the tuple that is responsible for (now
falsely) stating that "Chris Date is a smart person and fulfills the
function of database lecturing.".
Now if I understand your proposed join-delete algorithm correctly, then
the system should react by removing both the assertion regarding Chris Date's
function *as well as* the assertion regarding Chris Date's quality???
Frankly, I can hardly believe this is for
real ... Stop giving lectures does not imply stop being smart.
Stop being smart does not imply stop giving
lectures (as practice sadly shows).
Stop being president does not imply stop making entertaining
mistakes. Stop making entertaining
mistakes does not imply stop being president (maybe even quite the contrary).
More generally : deleting a tuple from a join (or intersect
or difference for that matter) view, means in fact that the user is saying that
"it is not true that both P(A) and P(B) are true." (where P(A) and
P(B) are both propositions related to the relations (relations' tuples) that
constitute the join). Nothing more and
nothing less. To deduce from this fact
that "both P(A) and P(B) are false" (which is indeed what we mean
when we delete "the A part from A and the B part from B"), simply
looks like "false reasoning" to me (well, at least in my school days
it was labeled that way). I therefore
have come to believe that "view updateability" should be allowed to
happen IF AND ONLY IF the system has PRECISELY ONE WAY to "resolve"
the user's request. You state that this
is "highly desirable", but I believe that's not good enough.
I believe it should be a sine qua non.
I further observe that if a user is unaware of whether a
relation is "material" ("base") or "virtual"
("view"), then he will presumably expect to be able to do both
inserts and deletes (of tuples) in that relation. Rightfully so.
But that implies that (under the aforementioned
precisely-one-way-assumption), the system must have (or find) precisely one way
to satisfy an insert request, and precisely one way as well to satisfy a delete
request. Lacking additional
information, achieving this for both insert and delete is impossible. An "insert" of a proposition which
is a boolean "and" allows to deduce that all the individual parts
must be true, but a delete of such a proposition has three possible
"solutions". A similar thing
applies to a "delete" of a boolean "or" proposition. An "and" is involved by the minus,
intersect, join and restrict operators, an "or" is involved in the
union operator.
Yet more generally : whatever the operator, it is inherently
impossible to (unambiguously and in all cases) derive the pair of truth values
(P(A),P(B)) from the sole truth value of the expression "P(A) operator
P(B)" (the proposition of (a tuple in) the view). The pair has four possible values, the
expression two. And the fact alone that
the relations A and B have indeed been *individually* defined, implies that
there is someone somewhere to whom these *individual* propositions are
materially relevant. Well, not
necessarily so, but certainly very likely at least. If these "individual"
propositions are then "randomly" decided by the system, then I think
that system will be unacceptable to that other someone somewhere. "Randomly", because, from that
other user's point of view, I do not believe that any possibility (as to which
pair P(A),P(B) will ultimately be picked by the system to make the database
reflect the desired "P(A) operator P(B)" value) can be expected to be
objectively preferable to any of the others.
In the example I began with : it can not be excluded that there is another
user somewhere who is interested solely in person's qualities, and it can
furthermore not be excluded that this other user is still interested in a
person's qualities, even if that person no longer has a function, or if the
fact that this person indeed has a function has become irrelevant to the "first"
user.
I also briefly elaborated the example of an insert into a
view A XOR B ((A MINUS B) UNION (B MINUS A)).
And hurray, the proposed scheme seems to "correctly" insert my
tuple into either A or B, but not both, which indeed complies with the definition
of XOR. But. If the "left" MINUS is treated first, the tuple ends up
in B, otherwise in A. Now this is
obviously implementation-dependent.
Differences in optimization algorithms might lead to one DBMS taking the
decision "left first", and another one "right first"
Worse, it may even try and do so
simultaneously in two distinct threads.
Such a situation is even justifiable due to the symmetric nature of the
XOR/UNION/OR operator. The DBMSs
themselves may be consistent in the choice they make, and thus expose
"predictable" ("deterministic") behavior, but a user can
still end up with programs who suddenly start behaving differently when the
organization changes its DBMS (say, from DB/3 to Miracle). Which seems to me like yet another
completely undesirable "side-"effect.
Yet another objection that I felt is with the restriction
operator. If a user does an insert (of
a tuple) in a relation, he will afterwards certainly expect that that tuple be
returned if he queries that relation (with selection criteria that match his
specific tuple, of course). Now first
of all, it is explicitly stated that this is not necessarily the case.
If a view is defined on Parts where Color=Red
(and the Color attribute is also projected away from the view), then an insert
into the view will not result in an insert of a Part whose Color attribute
equals Red. Such behavior
"betrays" to the user that his relation is actually not behaving like
a base relation, and therefore must be a view.
But this is in fact undesirable because we do not want that distinction
(between base relations and views) to show in the first place. Furthermore:
'deriving' the applicable
color from the view definition (i.e. inserting a Part whose Color attribute
equals Red), is also a "bad" idea, because such derivation process
cannot be guaranteed to be "uniquely resolvable" in all cases (e.g.
Parts where Color=Red or Color=Blue - so which color should it be ?).
And as for projection : if the color is projected away from
relation 'Parts', then a view is produced whose predicate would be a certain
statement regarding 'parts', followed by '... and it has SOME color' (ref. one
of the introductory chapters). A user
doing an insert in this view, is actually making a proposition which says about
this certain part '... and it has SOME color'.
To which can be added (without change of meaning): '... but that color
to me is UNKNOWN.'. Now if we step over
to the design guidelines regarding the treatment of 'the unknown', then there
we find that if a designer encounters such propositions where some parts (of
the proposition, that is) may be 'unknown', then that should be an indication
for this designer to split this relation in several pieces, right up to the
point where there are no more possible statements of something being
unknown. In other words (my
interpretation) : if a requirement is detected for an insert into a projection
view, then that should be an indication that the projection should actually be
the base relation and not the view, and the "unprojected" relation
should be a (join) view and not the base relation.
I am therefore more inclined to conclude that "general
view updateability" is its own monster of Loch Ness : much sought after,
never found – and indeed undefinable. That is to say : it may be perfectly
possible to find a scheme that satisfies the conditions of "consistent and
predictable behavior", as well as "logically correct behavior (at
least as far as just the view itself is concerned)". But it seems impossible to make such a
scheme also "logically justifiable", by which I mean that whatever
behavior the scheme exposes, it can be proven to each and every individual
other user of that database that this behavior is the *only* logically correct
option.
OTOH, all of this is in fact so plain to see that I cannot
imagine such issues not having been raised before.
And that there must therefore be arguments to the contrary
readily available.
C. J. Date Responds:
First of all, let me say that I no longer regard view updating as a
fully solved problem. A year ago or so
I thought it was--but then Hugh Darwen started to ask some hard questions and I
realized I was wrong. (David McGoveran will probably
disagree with me here.)
That said, I remain optimistic that the problem is
solvable. I also think the discussion
in AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed., is generally along the
right lines, though it gets some of the details wrong.
I currently plan to include an extended
discussion of the issue in the 3rd edition of our book on THE THIRD MANIFESTO,
which is due from Addison-Wesley sometime next spring under the tentative new
title DATABASES, TYPES, AND THE RELATIONAL MODEL.
One important piece of the puzzle that wasn't mentioned in
the 8th edition is what we call The Assignment Principle, which
can be stated thus:
Following assignment of v to V, the comparison v
= V must evaluate to TRUE.
Hugh and I articulated this principle in our recent PRACTICAL DATABASE FOUNDATIONS paper #5
Multiple Assignment.
As we said in that paper, the principle is
so obvious (even trivial) that it might seem hardly worth stating, let alone
dignifying with such a grand name.
Indeed, it's effectively the definition of assignment.
But we also observed in that paper that it's
violated ubiquitously in SQL in particular, and it's tempting to suggest that
such violations occur, at least in part, precisely because the principle hasn't
been properly spelled out before.
Now let me address some of the specific points you raise. The
first has to do with updating joins.
For reasons that should be obvious I don't want to use your specific
example here; let me switch to the well-known suppliers-and-parts example, as
in the 8th edition. Let SSP be the join
of relvars S and SP over S# (a one-to-many join; your example was many-to-many,
and I'll get to that case in a moment).
Consider an attempt to delete just the tuple for supplier S1 and part P1
from SSP. The rules say: Delete the tuple for S1 from relvar S and
delete the tuple for S1 and P1 from relvar SP.
So what happens? There are three
possibilities:
1.
If there are no other shipments for supplier S1, the overall
operation succeeds.
2.
If there are some other shipments for supplier S1 and no other
updates are done (i.e., there are no side effects), the overall operation fails
on a referential integrity violation.
3.
If there are some other shipments for supplier S1 and deleting
supplier S1 from relvar S "cascades" to delete those shipments from
relvar SP, the overall operation fails on a violation of The Assignment
Principle.
In other words, the operation
DELETE SSP WHERE S# = S#('S1') AND P# = P#('P1') ;
is inherently unsafe, since we presumably don't know, in
general, whether there are any other shipments for supplier S1. The following operation, by contrast, is
safe:
DELETE SSP WHERE S# = S#('S1') ;
(it will delete supplier S1 from S and all shipments for
supplier S1 from SP).
Now, you say, correctly (though I paraphrase), that to infer
(NOT PA) AND (NOT PB) from NOT (PA AND PB) is
"false reasoning." I
agree. But I never claimed--at least, I
don't think I ever claimed!--that the rule for deleting from a join was based
purely on logical inference. However, I
do claim that (a) it's a simple rule, (b) it's a reasonable rule, (c) it gives
useful and expected results in the majority of cases, and (d) it gives
predictable results in all cases. I
also claim that the rule is both (e) symmetric in itself and (f) symmetric with
respect to the corresponding insert rule.
Finally, I claim (g) it's desirable and indeed advantageous to have a
rule that doesn't depend on whether the join is one-to-one, one-to-many, or
many-to-many. (Which takes care of the
many-to-many case as promised.)
You also say, correctly (though I paraphrase again), that
there are three possible solutions to the equation NOT (PA AND PB)
= TRUE: viz., PA = TRUE or PB = TRUE or both.
Again, I agree. But having to solve an underspecified set of equations is a
situation we often face, in scientific contexts as well as nonscientific ones.
Linear programming provides a classic
example in the scientific arena; in linear programming, we're given m
linear equations in n unknowns (m < n), and we try to
find a solution--explicitly recognized as not necessarily unique--to those
equations that satisfies some pragmatic condition. (Of course, the pragma
in question needs to be well thought
out.) So I would say in the view
updating context that we have some pragmatic conditions that need to be
satisfied, and those conditions are, or at least include, those listed in the
8th edition (e.g., symmetry).*
* One reason symmetry is
desirable is this: We want the
semantics of A JOIN B to be the same as those of B JOIN A. Likewise for A UNION B, of
course.
Another point of pragma that's relevant is this:
Obviously, we can--and, I presume,
would--use the system's security mechanism to prohibit updates on views that,
if permitted, would have effects we think are undesirable in some way.
In your example, you might be well advised
not to give the user delete privileges on the join of "Person has
Quality" and "Person has Function," if you're worried in some
way about the effect such deletes might have.
And another: Of
course, none of the foregoing imposes any limits on what applications
can do. To revert to suppliers and
parts, there's nothing to stop an application program from (a) displaying the
join of suppliers and shipments as a single table on the screen, (b) allowing
the end user to remove a row from that table somehow, and (c) implementing that
removal by deleting a tuple from the shipments relvar (only) under the
covers. But any suggestion that
"removing a row" in this fashion is a relational DELETE on the join
should be avoided: It is a different
operation (and the user would need to understand that fact, in general), it has
different semantics, and it should be given a different name.
I think I've now responded to many of your points, directly
or indirectly, but there are still some outstanding issues.
You say the 8th edition says it's "highly desirable"
for there to be precisely one way to "resolve" the user's
request. Actually I couldn't find any
remark to this effect. Can you cite
chapter and verse? Because I might want
to delete the remark, if it's there!
Ø Paraphrasing
again, you say that deleting a tuple from a difference means that tuple
satisfies NOT (PA AND PB).
Actually this statement is incorrect.
The predicate for V = A MINUS B is FORALL t
e V (PA(t) AND NOT (t e B)). The 8th edition is wrong on this too
(sigh).
Ø Your
implied remark re deleting through union is incorrect as well (DELETE on A
UNION B must delete from both).
It's INSERT, not DELETE, that might be argued to be ambiguous in the
case of union.
Ø The
symmetric difference example ("A XOR B"): Once again I fear the 8th edition is
incorrect. Without going into details,
let me just say that the insert rules as currently defined lead to a violation
of The Assignment Principle in this case--in part because of the very
lack of symmetry that (as you point out) might otherwise occur.
Ø Restriction: If tuple t is inserted into V,
the result of evaluating the expression V ("SELECT * FROM V"
if you prefer) will include t.
No arguments!--this state of affairs follows from The Assignment
Principle, and it applies to all relvars V, be they restriction
views or whatever. In any case, your
objection here really has to do with projection, not restriction ... See
next.
Ø Projection: I'm sympathetic to your observations in this
connection. Hugh Darwen and I have both
felt for some time that all base relvars should be in 6NF.* Note,
however, that such a discipline doesn't solve the problem!--consider the view V
= (A JOIN B){attributes of A}.
But default values are certainly not the whole of the solution to
that problem, either. As I wrote in my
recent response to Dave Tallman:
"[The] idea that a given attribute has just one default that must
be used in all contexts is ... too rigid.
Further investigation is needed, however, and further discussion is
beyond the scope of these notes."
* Not a typo.
I'd like to close with the following remark.
One of the epigraphs to THE THIRD
MANIFESTO book (all editions) includes the following: "[Logical]
differences are not
everything, and nor is logic." We
do not claim, nor would we want to claim, that everything in the MANIFESTO
is based on logic or logical inference alone.
For example, the prescription to the effect that update operators not
return a value is based, at least in part, on pragmatic considerations; the
same goes for the decision to exclude operators for creating and destroying
databases; likewise for the prescription--or lack of one, rather--regarding
recursively defined types; likewise for the prescriptions concerning tuple and
relation type names; likewise for RM Prescription 26; and so on, probably.
David McGoveran Comments:
Without writing a paper, I do want to make a few comments for
publication in response to the objections to view updateability, at least
insofar as it addresses the approach proposed by Chris Date and myself. A
detailed response is not possible for me at this time. Throughout, I've placed
quotes around certain words or phrases in lower case, because they could use
some further explanation, might be over-interpreted, or otherwise are
imprecise. I apologize in advance for not being able to do better by this
important subject here.
Let me note up front that Chris and I may not see eye-to-eye
regarding view updateability if his published expositions are indicative. I
suspect this divergence is largely a matter of some details of the approach
which Chris more or less downplays, and which then permit many confusions (and
objections) to arise whether by Hugh Darwen, the recent correspondent, or
others.
Unfortunately I have not been in a position to pursue
clarification with Chris, so please do not take this any kind of indictment.
View updateability is of crucial importance to the relational
model. If there is no solution, then there is likewise no possibility of data
independence (both physical and logical!). That would be tantamount to
disclaiming the relational model itself insofar as data independence was the
stated initial motivation of Dr. Codd's relational model research program.
I believe that a solution to the view updateability
"problem" is not only possible, but that it is a solved problem.
I do not believe that the problem has been formulated well in
the literature, nor do I believe that the solution has been well and completely
explained. Furthermore, I believe that the consequences of understanding and
applying the solution are wide-reaching, so much so that we might have to
acknowledge that we never truly understood the relational model heretofore.
As much as I would dearly love to have the resources to write
a complete explanation, I do not. Instead, I will have to be satisfied with
providing a few principles, recommended conceptualizations, or guideposts.
1. The "Date-McGoveran view update approach" (or,
in more concrete form, the algorithm) provides a set of rules for applying
updates to any relation.
2. These rules are not to be applied in isolation. They are,
in a sense, context dependent, where the context is supplied by constraints
accessible to the DBMS and by the current user's intent as expressible through
(a certain strictly constructive and strictly finite) first order predicate
logic. The simple expression of the rules only tell us what to do in the
absence of further contextual information.
3. A DBMS that supports the relational model cannot be
"magic." It cannot compensate for ambiguity, intuit withheld
information or assumptions, or correct expressions that violate the designer's
or user's intent. If the designer fails to capture information in database
design, if information is hidden from the user, or if the user incompetently
fails to express their intent [Ed. Note:
Or if the data language is not sufficiently expressive], it will certainly
produce seemingly anomalous or "surprising" results when faced with
these problems.
4. No logical data independence mechanism can be permitted to
hide relation semantics from users (the contrary would be absurd), although it
seems that many writers have assumed that to be part of its intent. The
semantics of a derived view is inherently distinct from that of a "base
relation." For example, the relvar predicate for a join view necessarily
involves multiple irreducible predicates, whereas that for a "base
relation" should not.
5. Logical data independence can hide "base" data
and "base" data representation. For example, a projection view can
hide the specific data found in the "lost" attributes even though the
relvar predicate clearly tells us that those attributes, and data in them, must
exist. Additionally, the view might not then contain any evidence of
dependencies among those attributes and attributes in other relations.
6. For logical data independence to be possible, the
semantics of relational operators must be consistent, whether the operands are
derived relations or "base" relations. Although we have traditionally
thought of this as requiring that we find some algorithm for view updating,
this is a poor method of attack, which seems obvious when we consider that
derived relations necessarily have the more complex semantics. Thus, we must
solve the general case of update semantics for which updating "base"
relations is the special case (and not the reverse).
7. All relational updates should be understood as set
transformations: Therefore, they are atomic, and the final "state" is
not dependent on the order of any intermediate steps. Just as is the case when
evaluating a logical expression (e.g., a tautology or a relvar predicate),
there are no implementation dependencies (as, for example, in treating the left
vs. right "MINUS" first) -- only implementation errors. An optimizer
incorporating such implementation dependencies is faulty.
8. We must avoid the temptation to think of a relational
update as anything other than a declarative specification of a resulting
relation (a set) incorporating the relvar predicate of source relation and its
current value. Nothing else is pertinent to computing the value that is to be
"assigned" to the relvar. If a reference to one or more tuples occurs
within the update expression, we should not think even of these as anything
other than a particular way of expressing a constraint (i.e., as part of an
"update predicate").
9. Correct database design is crucial for understanding view
updateability (Chris Date will possibly demur on this point), and goes well
beyond normalization or even our Principle of (Semantically) Orthogonal Design.
True, the results will always be predictable and deterministic, but are likely
to produce confusing side effects from time-to-time.
With these principles in mind, reconsider the
"problem" with the example of delete through a join view. The
"betrayal" of the user is an illusion and ceases to be a problem. In
fact, it is a problem which we accept all the time for "base"
relations: We can easily construct a database of "base" relations for
which there are "side effects" (remember cascade delete?) or which
prevent the user from obtaining update results that behave as if the target
relation exists in isolation.
The "problem" of finding a unique inverse
transformation for a join view in a given context without specifying any context
goes away. Clearly, the delete must satisfy 'NOT PA OR NOT PB'. (Aside: We must
be very careful what we mean by NOT, avoiding confusion with simple complement.
Relations have a relative complement - tuples not asserted to be 'true' - with
a scope that is distinct from that of the relvar predicate - tuples that cannot
belong to the relation.) This requirement may be further constrained in at
least one of two ways. First, if they know, the user should specify their
intent as to why the conjunction is 'false'. Second, some of the integrity
rules and dependencies the designer imposes may further constrain the general
requirement.
Of necessity, I leave the detailed analysis of other view
updateability "problems" to the reader as an exercise in applying the
principles I've stated above. My hope is that they will encourage the reader to
reframe both the problem and the solution.
Posted
09/14/04