UPDATING VIEWS PART 2: UNION, INTERSECTION, AND DIFFERENCE
by Chris Date and David McGoveran

 

 

 

(Continued from Part 1)

 

 

3. Table Predicates

 

Now we can get back to our discussion of what tables mean.  As we stated previously, every table--be it a base table, a view, a query result, or whatever--certainly does have an associated meaning.  And, of course, users must be aware of those meanings if they are to use the database effectively (and correctly).  For example, the meaning of table EMP is something like the following:

 

"The employee with the specified employee number (EMP#) has the specified name (ENAME), works in the specified department (DEPT#), and earns the specified salary (SAL). Furthermore, if the department number is D1, then the salary is less than 44K.  Also, no two employees have the same employee number."

 

(This statement is not very precise, but it will serve for the moment.)

 

Formally, this statement is an example of what is called a predicate, or truth-valued function--a function of four arguments, in this particular case.  Substituting values for the arguments is equivalent to invoking the function (or "instantiating" the predicate), thereby yielding an expression that evaluates to either true or false.  For example, the substitution

 

EMP# = 'E1' ENAME = 'Lopez' DEPT# = 'D1' SAL = 25K

 

yields the value true.  By contrast, the substitution

 

EMP# = 'E1' ENAME = 'Abbey' DEPT# = 'D3' SAL = 45K

 

yields the value false.  And at any given time, of course, the table contains exactly those rows that make the predicate evaluate to true at that time. 

 

It follows from the foregoing that if (for example) a row is presented as a candidate for insertion into some table, the DBMS should accept that row only if it does not cause the corresponding predicate to be violated.  More generally, the predicate for a given table represents the criterion for update acceptability for that table--that is, it constitutes the criterion for deciding whether or not some proposed update is in fact valid (or at least plausible) for the given table. 

 

In order for it to be able to decide whether or not a proposed update is acceptable for a given table, therefore, the DBMS needs to be aware of the predicate for that table.  Now, it is of course not possible for the DBMS to know exactly what the predicate is for a given table.  In the case of table EMP, for example, the DBMS has no way of knowing a priori that the predicate is such that the row <E1,Lopez,D1,25K> makes it true and the row <E1,Abbey,D3,45K> does not; it also has no way of knowing exactly what certain terms appearing in that predicate (such as "works in" or "earns") really mean.  However, the DBMS certainly does know a reasonably close approximation to that predicate.  To be specific, it knows that, if a given row is to be deemed acceptable, all of the following must be true:

 

·   The EMP# value must be a value from the domain of employee numbers

·   The ENAME value must be a value from the domain of names

·   The DEPT# value must be a value from the domain of department numbers

·   The SAL value must be a value from the domain of US currency

·   If the DEPT# value is D1 then the SAL value must be less than 44K

·   The EMP# value is unique with respect to all such values in the table

 

In other words, for a base table such as EMP, the DBMS does at least know all the integrity constraints (column constraints and table constraints) that have been declared for that base table.  Formally, therefore, we can define the (DBMS-understood) "meaning" of a given base table to be the logical AND of all column constraints and table constraints that apply to that base table (and it is this meaning that the DBMS will check whenever an update is attempted on the base table in question).  For example, the formal meaning of base table EMP is the following:

 

e.EMP# IN EMP#_DOM AND

e.ENAME IN NAME_DOM AND

e.DEPT# IN DEPT#_DOM AND

e.SAL IN US_CURRENCY_DOM AND

 (IF e.DEPT# = 'D1' THEN e.SAL < 44K) AND

 (IF e.EMP# = f.EMP# THEN e.ENAME = f.ENAME AND

                           e.DEPT# = f.DEPT# AND

                           e.SAL   = f.SAL) 

 

We will refer to this expression--let us call it PE--as the table predicate for base table EMP. 

 

So much for base tables.  But what about derived tables--in particular, what about views?  What is the table predicate for a derived table?  Clearly, what we need is a set of rules such that if the DBMS knows the table predicate(s) for the input(s) to any relational operation, it can deduce the table predicate for the output from that operation.  Given such a set of rules, the DBMS will then know the table predicate for all possible tables, and will thus be able to decide the acceptability or otherwise of an arbitrary update on an arbitrary table (derived or base).

 

It is in fact very easy to state such a set of rules--they follow immediately from the definitions of the relational operators.  For example, if A and B are any two type-compatible tables (type-compatibility is usually referred to as union-compatibility in the literature.  We prefer our term for reasons that are beyond the scope of the present discussion) and their respective table predicates are PA and PB, then the table predicate PC for table C, where C is defined as A INTERSECT B, is obviously (PA) AND (PB); that is, a row r will appear in C if and only if it appears in both A and B--i.e., if and only if PA(r) and PB(r) are both true.  So if, for example, we define C as a view and try to insert r into that view, r must satisfy both the table predicate for A and the table predicate for B, or the INSERT will fail (see the section Updating Intersections and Differences later for further discussion). 

 

Here is another example:  The table predicate for the table that results from the restriction operation

 

T WHERE condition

 

is (PT) AND (condition), where PT is the table predicate for T.  For example, the table predicate for EMP WHERE DEPT# = 'D1' is

 

(PE) AND (DEPT# = 'D1')

 

where PE is the table predicate for EMP as defined earlier.

 

Stating the table predicates corresponding to the other relational operators is left as an exercise for the reader.

 

 

4. Further Principles

 

There are several further principles that must be satisfied by any systematic view updating mechanism.  Space does not permit much elaboration on these principles here, but most of them are readily understandable on intuitive grounds anyway.

 

  1. All tables must be genuine relations--i.e. duplicate rows are not permitted.

 

  1. The updatability or otherwise of a given view is a semantic issue, not a syntactic one--i.e. it must not depend on the particular form in which the view definition happens to be stated.  For example, the following two view definitions are semantically identical:

 

CREATE VIEW V AS

EMP WHERE DEPT# = 'D1' OR SAL > 33K;

 

CREATE VIEW V AS

(EMP WHERE DEPT# = 'D1') UNION (EMP WHERE SAL > 33K);

 

Obviously, both of these views should be updatable.  The SQL standard, however, and most of today's SQL products, adopt the ad hoc position that the first is updatable and the second is not. 

 

  1. It follows from the previous point that the view updatability rules must work correctly in the special case when the "view" is in fact a base table.  This is because any base table B is semantically indistinguishable from a view V that is defined as B UNION B, or B INTERSECT B, or B MINUS C (if C is another base table that has no rows in common with B), or B WHERE true, or any of several other expressions that are identically equivalent to just B.  Thus, for example, the rules for updating a union view, when applied to the view V = B UNION B, must yield exactly the same result as if the updates had been applied directly to the base table B.

 

  1. The rules must preserve symmetry where applicable.  For example, the delete rule for an intersection view V = A INTERSECT B must not arbitrarily cause a row to be deleted from A and not from B, even though such a one-sided delete would certainly have the effect of deleting the row from the view.  Instead, the row must be deleted from both A and B.

 

  1. The rules must take into account any applicable triggered actions, such as cascade DELETE (for numerous well-documented reasons we would prefer such triggered actions to be specified declaratively, not procedurally.  However, the view updating rules per se do not impose any such requirement.)

 

  1. For reasons of simplicity among others, it is desirable to regard UPDATE as shorthand for a DELETE-then-INSERT sequence (i.e., just as syntactic sugar), and we will so regard it later in this paper.  This shorthand is acceptable provided it is understood that:

 

§   No checking of table predicates is done "in the middle of" any given update; that is, the expansion of UPDATE is DELETE-INSERT-check, not DELETE-check-INSERT-check.  The reason is, of course, that the DELETE portion might temporarily violate the table predicate while the UPDATE overall does not.  For instance, suppose table T contains exactly 10 rows, and consider the effect of "UPDATE row r" on T if T's table predicate says that T must contain at least 10 rows.

 

§   Triggered actions are likewise never performed "in the middle of" any given update (in fact they are done at the end, immediately prior to the table predicate checking).

 

§   The shorthand requires some slight refinement (beyond the scope of the present paper) in the case of projection views. 

 

§   We remark that treating UPDATEs as DELETEs-then-INSERTs implies that we regard UPDATEs as replacing entire rows, not as replacing individual values within such a row.

 

  1. All update operations on views are implemented by the same kind of update operations on the underlying tables.  That is, INSERTs map to INSERTs and DELETEs to DELETEs (we can ignore UPDATEs, thanks to the previous point).  For suppose, contrariwise, that there is some kind of view--say a union view--for which (say) INSERTs map to DELETEs.  Then it must follow that INSERTs on a base table must also sometimes map to DELETEs!  This is because (as already observed under 3 above) the base table B is semantically identical to the union view V = B UNION B.  An analogous argument applies to every other kind of view also (restriction, projection, intersection, etc.).  The idea that an INSERT on a base table might really be a DELETE we take to be self-evidently absurd; hence our position that (to repeat) INSERTs map to INSERTs and DELETEs to DELETEs.

 

  1. In general, the rules when applied to a given view V will specify the operations to be applied to the table(s) on which V is defined.  And those rules must work correctly even when those underlying tables are themselves derived tables in turn.  In other words, the rules must be capable of recursive application.

 

  1. The rules cannot assume that the database is well designed (e.g., fully normalized).  However, they might on occasion produce a slightly surprising result if the database is not well designed--a fact that can be seen in itself as an additional argument in support of good design.  We will give some examples of such "slightly surprising results" later in this paper.

 

 

(The authors would like to thank Nagraj Alur, Hugh Darwen, Fabian Pascal, and Paul Winsberg for their helpful comments on earlier drafts of this paper.)

 

Comments On Republication: Originally published in Database Programming & Design 7, No. 6 (June 1994) and published as a two-part article in RELATIONAL DATABASE WRITINGS1991-94. It is republished here by permission of David McGoveran, Miller Freeman Inc. and Pearson Education, Inc. © All rights reserved by C.J. Date. Research has shown that certain detail level corrections might be needed, which we may undertake in the future. However, we still believe strongly that the overall approach is sound.

 

(Continued in Part 3)

 

 

Posted 11/29/02

 

 

 

[ABOUT] [QUOTES] [LINKS]