UPDATING VIEWS PART 1: UNION, INTERSECTION, AND DIFFERENCE
by Chris Date and David McGoveran

 

 

 

I am not alone in thinking that the treatment of view updating within the overall theory of relational databases has always been rather unsatisfactory for one reason or another. As Nat Goodman puts it:  "... there is no theory behind any of this.  Each [rule] seems intuitively correct, but there is no overall framework.  It would be better to have a general rule that states what a correct view update algorithm has to do, and then derive the special rules for each case from that general rule.  Without this, the [rules] feel like a crazy patchwork of exceptions and special notes.  In some cases, two or more [rules] are equally sensible ..." (and so on).

 

In the paper that follows, we (i.e., David McGoveran and myself) present such a general rule.  We then derive in this and subsequent sections the special rules for unions, intersections, differences, joins and the other relational operators from that general rule.

 

It is perhaps of interest to say that these proposals grew out of my attempts to come up with a view updating scheme that would treat the semantically equivalent views V1 = A INTERSECT B and V2 = A MINUS (A MINUS B) equivalently--in particular, taking into account the fact that the definition of V1 is symmetric in A and B while the definition of V2 is not.  The key observation (obvious, after the fact!) turned out to be that at all times, every table T must satisfy the table predicate that applies to T.  It follows (to spell it out in detail) that when an update is performed on any table T:

 

Before the update, table T cannot include any row(s) that cause the table predicate for T to be violated.

 

After the update, table T cannot include any row(s) that cause the table predicate for T to be violated.

 

Thus, for example, if a row r is presented for insertion into V = A MINUS B, then row r:

 

Ø       Must satisfy the predicate for A (for otherwise it cannot appear in A and hence cannot appear in A MINUS B a fortiori);

Ø       Must not satisfy the predicate for B (for otherwise it might already appear in B, in which case inserting it into A will cause it not to appear in A MINUS B);

Ø       Must not already appear in A (for otherwise we will be attempting to insert a duplicate row);

Ø       Cannot already appear in B (because it does not satisfy the predicate for B).

 

Note:  The foregoing statements are slightly simplified; as the body of the paper explains, it's not really rows that have to satisfy predicates, it's tables.  Also, regarding the second of the bullet items above, the alert reader might suggest that if r does already appear in B, the desired effect (of inserting r into A MINUS B) might be achieved by deleting it from B before (if necessary) inserting it into A.  We reject this possibility for reasons explained in the body of the paper (see in the section "Further Principles," especially Principle No. 7).

 

Portions of this material previously appeared in somewhat different form elsewhere. Also--to repeat from the "Comments on Republication"--I am grateful to David McGoveran for allowing me to republish the material in the present book. 

 

 

Introduction

 

Historically, the question of view updatability has typically been treated in a fairly ad hoc manner.  Certainly this is the case:

 

Ø       In the SQL standard

Ø       In today's commercial SQL products

Ø       Even--to some extent--in the more research-oriented technical literature

 

It is not at all unusual, for example, to find that a given DBMS will

 

Ø       Prohibit updates on a view that is logically updatable, or

Ø       Permit updates on a view that is logically not updatable, or

Ø       Implement view updates in a logically incorrect way,

 

or--most likely in practice!--do all of the above, depending on circumstances.

 

The lack of a systematic approach to the problem is further underscored by the undue emphasis that has historically been laid on restriction, projection, and join views.  Union, intersection, and difference views, which are--or should be--at least as important in practice, have received comparatively little attention.

 

The authors of this article have recently developed a systematic and formal approach to the view updating problem.  In this first part we present an informal introduction (emphasis on "informal"!) to that formal approach.  We also illustrate the approach as it applies to union, intersection, and difference views specifically. The second part will deal with joins and other kinds of views.

 

 

1. Integrity Constraints

 

Before we can get into the specifics of our approach, we need to lay some groundwork.  The first point--and this one is absolutely crucial--is that every table has an interpretation or meaning.  In order to explain this point, we must first digress for a moment to consider the general issue of integrity constraints.  For the purposes of this discussion, it is convenient to classify such constraints into four kinds, namely domain constraints, column constraints, table constraints, and database constraints, as follows (this classification is slightly different (but not dramatically so) from that previously given by Date.

 

A domain constraint is just the definition of the set of values that go to make up the domain in question; in other words, it effectively just enumerates the values in the domain (We mention domain constraints merely for completeness--they play no part in the view updating scheme that is the primary focus of this paper).

 

A column constraint states that the values appearing in a specific column must be drawn from some specific domain.  For example, consider the employees base table (see Fig. 1 for a sample tabulation). 

 

EMP {EMP#,ENAME,DEPT#,SAL}

 

The columns of that table are subject to the following column constraints:

 

+----------------------------+

¦ EMP# ¦ ENAME ¦ DEPT# ¦ SAL ¦

+------+-------+-------+-----¦

¦ E1   ¦ Lopez ¦ D1    ¦ 25K ¦

¦ E2   ¦ Cheng ¦ D1    ¦ 42K ¦

¦ E3   ¦ Finzi ¦ D2    ¦ 30K ¦

¦ E4   ¦ Saito ¦ D2    ¦ 45K ¦

+----------------------------+

Fig. 1: Base table EMP (sample values)

 

 

e.EMP#  IN EMP#_DOM

e.ENAME IN NAME_DOM

e.DEPT# IN DEPT#_DOM

e.SAL IN US_CURRENCY_DOM

 

Here e represents an arbitrary row of the table and EMP#_DOM, NAME_DOM, etc., are the names of the relevant domains (throughout this paper we use a modified form of conventional SQL syntax, for reasons of simplicity and explicitness.)

 

A table constraint states that a specific table must satisfy some specific condition, where the condition in question refers solely to the table under consideration--i.e., it does not refer to any other table, nor to any domain.  For example, here are two table constraints for the base table EMP:

 

   1. IF e.DEPT# = 'D1' THEN e.SAL < 44K

 

   2. IF e.EMP# = f.EMP# THEN e.ENAME = f.ENAME

                        AND e.DEPT# = f.DEPT#

                        AND e.SAL = f.SAL

 

The first of these says that employees in department D1 must have a salary less than 44K.  The second says that if two rows e and f have the same EMP# value, then they must also have the same ENAME value, the same DEPT# value, and the same SAL value--in other words, they must be the same row (this is just a longwinded way of saying that EMP# is a candidate key).

 

Note:  Observe that we talk of table constraints in general, not just base table constraints.  The point is, all tables, base or otherwise, are subject to table constraints, as we will see later.

 

For the purposes of this paper we define a database to be some user-defined--or DBA-defined--collection of named tables (base tables and/or views).  A database constraint, then, states that the database in question must satisfy some specific condition, where the condition in question can refer to as many named tables as desired.  For example, suppose the database containing base table EMP were extended to include a departments base table DEPT.  Then the referential constraint from EMP to DEPT would be a database constraint (referring, as it happens, to exactly two base tables).

 

Here is another example of a database constraint (also referring to the same two base tables):

 

d.BUDGET > 2 * SUM (e WHERE e.DEPT# = d.DEPT#,SAL)

 

Here d and e represent an arbitrary DEPT row and an arbitrary EMP row, respectively.  The constraint says that every department has a budget that is at least twice the sum of all salaries for employees in that department.

 

Note:  Unlike column constraints and base table constraints, which can always be checked immediately (i.e., after each individual update operation), database constraints must--at least conceptually--be deferred (i.e., checked at end-of-transaction).  In practice there will be many cases where database constraints too can be checked immediately, but such "early" checking should be regarded as nothing more than an optimization. [Ed. Note: Chris Date has changed his mind and now believes that all checking should be immediate. See an allusion to this in the series on Constraints and Predicates Parts 1, 2, and 3.

 

(The authors would like to thank Nagraj Alur, Hugh Darwen, Fabian Pascal, and Paul Winsberg for their helpful comments on earlier drafts of this article.)

 

Comments On Republication: Originally published in Database Programming & Design 7, No. 6 (June 1994) and published as a two-part article in RELATIONAL DATABASE WRITINGS1991-94. It is republished here by permission of David McGoveran, Miller Freeman Inc. and Pearson Education, Inc. © All rights reserved by C.J. Date. Research has shown that certain detail level corrections might be needed, which we may undertake in the future. However, we still believe strongly that the overall approach is sound.

 

(Continued in Part 2)

 

 

Posted 11/15/02

 

 

 

[ABOUT] [QUOTES] [LINKS]