(Continued from Part 1)
3. Table Predicates
Now we can get back to our discussion of what tables mean. As we stated
previously, every table--be it
a base table, a view, a query result, or whatever--certainly does have an
associated meaning. And, of course,
users must be aware of those meanings if they are to use the database
effectively (and correctly). For
example, the meaning of table EMP is something like the following:
"The employee with the specified employee number (EMP#)
has the specified name (ENAME), works in the specified department (DEPT#), and
earns the specified salary (SAL). Furthermore, if the department number is D1,
then the salary is less than 44K. Also,
no two employees have the same employee number."
(This statement is not very precise, but it will serve for
the moment.)
Formally, this statement is an example of what is called a predicate,
or truth-valued function--a function of four arguments, in this particular
case. Substituting values for the
arguments is equivalent to invoking the function (or
"instantiating" the predicate), thereby yielding an expression that evaluates
to either true or false.
For example, the substitution
EMP# = 'E1' ENAME = 'Lopez' DEPT# = 'D1' SAL = 25K
yields the value true. By contrast, the substitution
EMP# = 'E1' ENAME = 'Abbey' DEPT# = 'D3' SAL = 45K
yields the value false. And at any given time, of course, the table
contains exactly
those rows that make the predicate evaluate to true at that time.
It follows from the foregoing that if (for example) a row is
presented as a candidate for insertion into some table, the DBMS should accept
that row only if it does not cause the corresponding predicate to be
violated. More generally, the
predicate for a given table represents the criterion for update
acceptability for that table--that is, it constitutes the criterion for
deciding whether or not some proposed update is in fact valid (or at least
plausible) for the given table.
In order for it to be able to decide whether or not a
proposed update is acceptable for a given table, therefore, the DBMS needs to
be aware of the predicate for that table.
Now, it is of course not possible for the DBMS to know exactly
what the predicate is for a given table.
In the case of table EMP, for example, the DBMS has no way of knowing a
priori that the predicate is such that the row <E1,Lopez,D1,25K>
makes it true and the row <E1,Abbey,D3,45K> does not; it also has
no way of knowing exactly what certain terms appearing in that predicate (such
as "works in" or "earns") really mean. However, the DBMS certainly does
know
a reasonably close approximation to that predicate. To be specific, it knows that, if a given row
is to be deemed
acceptable, all of the following must be true:
·
The EMP# value must be a value from the domain of
employee numbers
·
The ENAME value must be a value from the domain of names
·
The DEPT# value must be a value from the domain of
department numbers
·
The SAL value must be a value from the domain of US
currency
·
If the DEPT# value is D1 then the SAL value must be
less than 44K
·
The EMP# value is unique with respect to all such
values in the table
In other words, for a base table such as EMP, the DBMS does
at least know all the integrity constraints (column constraints and
table constraints) that have been declared for that base table. Formally, therefore, we can
define
the (DBMS-understood) "meaning" of a given base table to be the
logical AND of all column constraints and table constraints that apply to that
base table (and it is this meaning that the DBMS will check whenever an
update is attempted on the base table in question). For example, the formal meaning of base table
EMP is the
following:
e.EMP# IN EMP#_DOM AND
e.ENAME IN NAME_DOM AND
e.DEPT# IN DEPT#_DOM AND
e.SAL IN US_CURRENCY_DOM AND
(IF e.DEPT# = 'D1'
THEN e.SAL < 44K) AND
(IF e.EMP# = f.EMP#
THEN e.ENAME = f.ENAME AND
e.DEPT# = f.DEPT# AND
e.SAL = f.SAL)
We will refer to this expression--let us call it PE--as the
table predicate for base table EMP.
So much for base tables.
But what about derived tables--in particular, what about
views? What is the table predicate for
a derived table? Clearly, what we need
is a set of rules such that if the DBMS knows the table predicate(s) for the
input(s) to any relational operation, it can deduce the table predicate for the
output from that operation. Given such
a set of rules, the DBMS will then know the table predicate for all possible
tables, and will thus be able to decide the acceptability or otherwise of an
arbitrary update on an arbitrary table (derived or base).
It is in fact very easy to state such a set of rules--they
follow immediately from the definitions of the relational operators. For example, if
A and B are
any two type-compatible tables (type-compatibility is usually referred to as
union-compatibility
in the literature. We prefer our term
for reasons that are beyond the scope of the present discussion) and their
respective table predicates are PA and PB, then the table
predicate PC for table C, where C is defined as A
INTERSECT B, is obviously (PA) AND (PB); that is, a row r
will appear in C if and only if it appears in both A and B--i.e.,
if and only if PA(r) and PB(r) are both true. So if, for
example, we define C as a
view and try to insert r into that view, r must satisfy both the
table predicate for A and the table predicate for B, or the
INSERT will fail (see the section Updating Intersections and Differences
later for further discussion).
Here is another example:
The table predicate for the table that results from the restriction
operation
T WHERE
condition
is (PT) AND (condition), where PT is the
table predicate for T. For
example, the table predicate for EMP WHERE DEPT# = 'D1' is
(PE) AND (DEPT# = 'D1')
where PE is the table predicate for EMP as defined earlier.
Stating the table predicates corresponding to the other
relational operators is left as an exercise for the reader.
4. Further Principles
There are several further principles that must be satisfied
by any systematic view updating mechanism.
Space does not permit much elaboration on these principles here, but
most of them are readily understandable on intuitive grounds anyway.
- All tables must be genuine relations--i.e. duplicate rows are
not permitted.
- The updatability or otherwise of a given view is a semantic
issue, not a syntactic one--i.e. it must not depend on the particular form
in which the view definition happens to be stated. For example, the following two view
definitions are semantically identical:
CREATE VIEW V AS
EMP WHERE DEPT# = 'D1' OR
SAL > 33K;
CREATE VIEW V AS
(EMP WHERE DEPT# = 'D1') UNION (EMP WHERE SAL > 33K);
Obviously, both of these views should be updatable. The SQL standard, however,
and most of
today's SQL products, adopt the ad hoc position that the first is
updatable and the second is not.
- It follows from the previous point that the view updatability rules
must work correctly in the special case when the "view" is in
fact a base table. This is because
any base table B is semantically indistinguishable from a view V
that is defined as B UNION B, or B INTERSECT B,
or B MINUS C (if C is another base table that has no
rows in common with B), or B WHERE true, or any of
several other expressions that are identically equivalent to just B. Thus, for example,
the rules for
updating a union view, when applied to the view V = B UNION B,
must yield exactly the same result as if the updates had been applied
directly to the base table B.
- The rules must preserve symmetry where applicable. For example, the delete rule for an
intersection view V = A INTERSECT B must not
arbitrarily cause a row to be deleted from A and not from B,
even though such a one-sided delete would certainly have the effect of
deleting the row from the view.
Instead, the row must be deleted from both A and B.
- The rules must take into account any applicable triggered actions,
such as cascade DELETE (for numerous well-documented reasons we would
prefer such triggered actions to be specified declaratively, not
procedurally. However, the view
updating rules per se do not impose any such requirement.)
- For reasons of simplicity among others, it is desirable to regard
UPDATE as shorthand for a DELETE-then-INSERT sequence (i.e., just as
syntactic sugar), and we will so regard it later in this paper. This shorthand is acceptable
provided
it is understood that:
§
No checking of table predicates is done "in the
middle of" any given update; that is, the expansion of UPDATE is
DELETE-INSERT-check, not DELETE-check-INSERT-check. The reason is, of course, that the DELETE
portion might temporarily
violate the table predicate while the UPDATE overall does not. For instance, suppose table T
contains exactly 10 rows, and consider the effect of "UPDATE row r"
on T if T's table predicate says that T must contain at
least 10 rows.
§
Triggered actions are likewise never performed "in
the middle of" any given update (in fact they are done at the end,
immediately prior to the table predicate checking).
§
The shorthand requires some slight refinement (beyond
the scope of the present paper) in the case of projection views.
§ We remark that treating UPDATEs as
DELETEs-then-INSERTs implies that we regard UPDATEs as replacing entire rows,
not as replacing individual values within such a row.
- All update operations on views are implemented by the same kind of
update operations on the underlying tables. That is, INSERTs map to INSERTs and DELETEs to
DELETEs (we
can ignore UPDATEs, thanks to the previous point). For suppose, contrariwise, that there
is some kind of view--say a union view--for which (say) INSERTs map to
DELETEs. Then it must follow that
INSERTs on a base table must also sometimes map to DELETEs! This is because (as already
observed
under 3 above) the base table B is semantically identical to the
union view V = B UNION B. An analogous argument applies to every other
kind of view
also (restriction, projection, intersection, etc.). The idea that an INSERT on a base table
might really be a DELETE we take to be self-evidently absurd; hence our
position that (to repeat) INSERTs map to INSERTs and DELETEs to DELETEs.
- In general, the rules when applied to a given view V will
specify the operations to be applied to the table(s) on which V is
defined. And those rules must work
correctly even when those underlying tables are themselves derived tables
in turn. In other words, the rules
must be capable of recursive application.
- The rules cannot assume that the database is well designed (e.g.,
fully normalized). However, they
might on occasion produce a slightly surprising result if the database is not
well designed--a fact that can be seen in itself as an additional argument
in support of good design. We will
give some examples of such "slightly surprising results" later
in this paper.
(The authors would like to thank Nagraj Alur, Hugh Darwen,
Fabian Pascal, and Paul Winsberg for their helpful comments on earlier drafts
of this paper.)
Comments On Republication: Originally published in Database
Programming & Design 7, No. 6 (June 1994) and published as a two-part
article in RELATIONAL
DATABASE WRITINGS1991-94. It is republished here by permission of David
McGoveran, Miller Freeman Inc. and Pearson
Education, Inc. © All rights reserved by C.J. Date. Research has shown
that certain detail level corrections might be needed, which we may undertake
in the future. However, we still believe strongly that the overall approach is
sound.
(Continued in Part 3)
Posted
11/29/02
[ABOUT]
[QUOTES]
[LINKS]