On View Updating (C. J. Date and D. McGoveran)

Tuesday, December 20, 2016

On View Updating (C. J. Date and D. McGoveran)

Follow @DBDebunk Follow @ThePostWest

My recent posts on denormalization [1], identical relations [2] and the POOD [3,4,5] based on D. McGoveran (DMG) interpretation of Codd's RDM, triggered online reactions [6,7] and some comments in place that reflect the current understanding of the RDM. One of my readers referred me back to a 2004 exchange triggered by an exchange @old dbdebunk.com with both CJD and DMG on view updating--an important aspect discussed in my posts--on which the two perspectives differ. Last week I asked what's wrong with CJD's position in the exchange. Here is the original exchange, albeit in abbreviated form (I made minor revisions for clarity and added references to more recent sources the reader may want to consult, such as the 2013 CJD book on the subject [8], which also purports to describe some of DMG's more recent thinking), in which DMG counters with his position and adds a new comment. Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.

"I just (almost) finished reading AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed. and I have a bit of a problem with view updatability. I'll try and explain with an example and I would be pleased if I could get some clarifications.

Let R1 be a base relation (NAME,QUALITY), with the semantics that every occurring tuple indicates that the person with the name in question has the quality in question. That is, the entity rule in the corresponding relation predicate (RP) is something like "Person identified by name (NAME) has quality (QUALITY)."

Let R2 be a similar base relation (NAME, FUNCTION), with the corresponding rule "Person identified by name (NAME) has function (FUNCTION)."

Tuples in R1 could indicate that "Chris Date has smartness.", or "George Bush has linguistic difficulties." Tuples in R2 could indicate that "Chris Date has the function database lecturer", or "George Bush has the function of President of the United States.".

These two relations can be joined, yielding a view whose rule would be something along the lines of "Person identified by name (NAME) has quality (QUALITY) and has function (FUNCTION)." One tuple's proposition in this view might be that "Chris Date has smartness and the function of database lecturer."

Now suppose that Chris Date is tired of being misunderstood, and decides to retire. Implying that Chris Date no longer has a function whatsoever. A user might now observe that the proposition contained in his database (view), is no longer true. He might want to correct this and (if the update is done through this join view) he can obviously only do so by deleting the join tuple that is now false. Now, if I understand your proposed join-delete algorithm correctly, then the system should react by removing both the assertion regarding Chris Date's function *as well as* the assertion regarding Chris Date's quality? Frankly, I can hardly believe this is for real ... Stop giving lectures does not imply stop being smart and vice-versa.

More generally: deleting a tuple from a join (or intersect or difference for that matter) view, means in fact that the user is saying that "it is not true that both P(A) and P(B) are true."--where P(A) and P(B) are both propositions related to the tuples of the base relations that are joined in the view. Nothing more and nothing less. To deduce from this fact that "both P(A) and P(B) are false" (which is indeed what we mean when we delete "the A part from A and the B part from B"), simply looks like "false reasoning" to me (well, at least in my school days it was labeled that way). I therefore have come to believe that "view updatability" should be allowed to happen IF AND ONLY IF the system has PRECISELY ONE WAY to "resolve" the user's request. You state that this is "highly desirable", but I believe that's not good enough. I believe it should be a sine qua non."

C. J. Date

First of all, let me say that I no longer regard view updating as a fully solved problem. A year ago or so I thought it was--but then Hugh Darwen started to ask some hard questions and I realized I was wrong. (David McGoveran will probably disagree with me here.) That said, I remain optimistic that the problem is solvable.

That said, I remain optimistic that the problem is solvable. The discussion in my 8th Ed. is generally along the right lines, though it gets some of the details wrong. One important piece of the puzzle that wasn't mentioned in the 8th edition is what I and Darwen introduced as The Assignment Principle (Ed. Note: For CJD's current position see [8]).

Now let me address some of the specific points you raise. The first has to do with updating joins. For reasons that should be obvious I don't want to use your specific example here; let me switch to the well-known suppliers-and-parts example. Let SSP be the join of relvars S and SP over S# (a one-to-many join; your example was many-to-many, and I'll get to that case in a moment). Consider an attempt to delete just the tuple for supplier S1 and part P1 from SSP. The rules say: Delete the tuple for S1 from relvar S and delete the tuple for S1 and P1 from relvar SP. So what happens? There are three possibilities:

If there are no other shipments for supplier S1, the overall operation succeeds.

If there are some other shipments for supplier S1 and no other updates are done (i.e., there are no side effects), the overall operation fails on a referential integrity violation.

If there are some other shipments for supplier S1 and deleting supplier S1 from relvar S "cascades" to delete those shipments from relvar SP, the overall operation fails on a violation of The Assignment Principle.

In other words, the operation

DELETE ssp WHERE s# = s#('s1') AND p# = p#('p1');

is inherently unsafe, since we presumably don't know, in general, whether there are any other shipments for supplier S1. The following operation, by contrast, is safe:

DELETE ssp WHERE s# = s#('s1');

(it will delete supplier S1 from S and all shipments for supplier S1 from SP).

Now, you say, correctly (though I paraphrase), that to infer (NOT PA) AND (NOT PB) from NOT (PA AND PB) is "false reasoning." I agree. But I never claimed--at least, I don't think I ever claimed!--that the rule for deleting from a join was based purely on logical inference. However, I do claim that (a) it's a simple rule, (b) it's a reasonable rule, (c) it gives useful and expected results in the majority of cases, and (d) it gives predictable results in all cases. I also claim that the rule is both (e) symmetric in itself and (f) symmetric with respect to the corresponding insert rule. Finally, I claim (g) it's desirable and indeed advantageous to have a rule that doesn't depend on whether the join is one-to-one, one-to-many, or many-to-many. (Which takes care of the many-to-many case as promised.)

David McGoveran

Without writing a paper, I do want to make a few comments in response to the objections to view updatability, at least insofar as it addresses the approach proposed by Chris Date and myself.

Let me note up front that Chris and I may not see eye-to-eye regarding view updatability if his published expositions are indicative. I suspect this divergence is largely a matter of some details of the approach which Chris more or less downplays, and which then permit many confusions (and objections) to arise whether by Hugh Darwen, the recent correspondent, or others. Unfortunately I have not been in a position to pursue clarification with Chris, so please do not take this any kind of indictment.

View updatability is of crucial importance to the relational model. If there is no solution, then there is likewise no possibility of data independence (both physical and logical!). That would be tantamount to disclaiming the relational model itself, insofar as data independence was the stated initial motivation of Dr. Codd's relational model research program.

I believe that a solution to the view updatability "problem" is not only possible, but that it is a solved problem. I do not believe that the problem has been formulated well in the literature, nor do I believe that the solution has been well and completely explained. (I'm committed to getting this done before my "function" is defunct, hence much focused work on the forthcoming book. Furthermore, I believe that the consequences of understanding and applying the solution are wide-reaching, so much so that we might have to acknowledge that we never truly understood the relational model heretofore.

A complete explanation of the solution will be offered in my forthcoming book [9]. Here I will have to be satisfied with providing a few principles, recommended conceptualizations, or guideposts. Throughout, I've placed quotes around certain words or phrases in lower case, because they could use some further explanation, might be over-interpreted, or otherwise are imprecise. I apologize in advance for not being able to do better by this important subject here. A detour at this time into a partial explanation of view updating would mean a significant delay in writing the more complete explanation in the book, a delay I am not willing to entertain at this juncture.

The "Date-McGoveran view update approach" (or, in more concrete form, the algorithm) provides a set of rules for applying updates to any relation.

These rules are not to be applied in isolation. They are, in a sense, context dependent, where the context is supplied by (1) constraints accessible to the DBMS and (2) the current user's intent as expressible through (a certain strictly constructive and strictly finite) first order predicate logic. The simple expression of the rules only tell us what to do in the absence of further contextual information.

A DBMS that supports the relational model cannot be "magic". It cannot compensate for ambiguity, intuit withheld information or assumptions, or correct expressions that violate the designer's or user's intent. If the designer fails to capture information in database design, if information is hidden from the user, or if the users incompetently fail to express their intent [Ed. Note: Or the data language is not sufficiently expressive to allow it], it will certainly produce seemingly anomalous or "surprising" results when faced with these problems [Ed. Note: Semantic correctness is not guaranteed].

No logical independence mechanism can be permitted to hide relation semantics from users (the contrary would be absurd), although it seems that many writers have assumed that to be part of its intent. The derived semantics of a view is inherently distinct from that of a base relation. For example, the relation predicate for a join view necessarily involves multiple irreducible predicates, whereas that for a base relation should not [Ed. Note: Which is another way of saying the design of base relations must adhere to the Principle of Ortogonal Design).

Logical independence can hide base data and base data representation. For example, a projection view can hide the specific data found in the excluded attributes even though the predicate clearly tells us that those attributes, and data in them, must exist. Additionally, the view might not then contain any evidence of dependencies among those attributes and attributes in other relations.

For logical independence to be possible, the semantics of relational operators must be consistent, whether the operands are derived or base relations. Although we have traditionally thought of this as requiring that we find some algorithm for view updating, this is a poor method of attack, which seems obvious when we consider that derived relations necessarily have the more complex semantics. Thus, we must solve the general case of update semantics for which updating base relations is the special case (and not the reverse).

All relational updates should be understood as set transformations: Therefore, they are atomic, and the final "state" is not dependent on the order of any intermediate steps. Just as is the case when evaluating a logical expression (e.g., a tautology, or a relation predicate), there are no implementation dependencies (as, for example, in treating the left vs. right "MINUS" first)--only implementation errors. An optimizer incorporating such implementation dependencies is faulty.

We must avoid the temptation to think of a relational update as anything other than a declarative specification of a resulting relation (a set) incorporating the relation predicate of source relation and its current value. Nothing else is pertinent to computing the value that is to be "assigned" to the relation. If a reference to one or more tuples occurs within the update expression, we should not think even of these as anything other than a particular way of expressing a constraint (i.e., as part of an "update predicate").

Correct database design is crucial for understanding view updatability (Chris Date will possibly demur on this point) and goes well beyond full normalization or even our POOD. [Ed. Note: There are three design principles, see [2]]. True, the results will always be predictable and deterministic, but are likely to produce confusing side effects from time-to-time.

David adds:

I disagree with Chris' apparent conclusion that view updating need not be based solely on logical inference; I insist that it can and should be. Many of the problems that have been raised regarding our proposal have to do with how predicates are to be interpreted, including the predicates that the user conveys to the DBMS in terms of insert, update, and delete. Put another way, our "updating language" is not sufficiently expressive to convey the user's intent accurately. As discussed in[8] (although the reader is cautioned that I did not write or edit the proposal that is attributed to me and so it is not completely accurate), this deficiency can and should be remedied.

With these principles in mind, reconsider the "problem" with the example of delete through a join view. The "betrayal" of the user is an illusion and ceases to be a problem. In fact, it is a problem which we accept all the time for base relations: We can easily construct a database of base relations for which there are "side effects" (remember cascade delete?), or which prevent the user from obtaining update results that behave as if the target relation exists in isolation.

The "problem" of finding a unique inverse transformation for a join view in a given context without specifying any context goes away. Clearly, the delete must satisfy 'NOT PA OR NOT PB'. (Aside: We must be very careful what we mean by NOT, avoiding confusion with simple complement. Relations have a relative complement--tuples not asserted to be 'true'--with a scope that is distinct from that of the relation predicate--tuples that cannot belong to the relation [Ed. Note: Representing false instantiations of the predicate]). This requirement may be further constrained in at least one of two ways.

If they know, users should specify their intent as to why the conjunction is 'false'.
Some of the integrity constraints and dependencies the designer imposes may further constrain the general requirement.

Of necessity, I leave the detailed analysis of other view updatability "problems" to the reader as an exercise in applying the principles I've stated above. My hope is that they will encourage the reader to reframe both the problem and the solution.

References

[1] Pascal, F., Denormalization for Performance: Don't Blame the Relational Model

[2] Pascal, F., Relation Predicates and Identical Relations

[3] Pascal, F., The Principle of Orthogonal Database Design Part I

[4] Pascal, F., Principle of Orthogonal Database Design Part II

[5] Pascal, F., The Principle of Orthogonal Database Design Part III

[6] Comments on "Database Design Relation Predicates and “Identical Relations”"

[7] Comments on "Denormalization for Performance: Don't Blame the Relational Model"

[8] Date, C. J., VIEW UPDATING AND RELATIONAL THEORY (O'Reilly Media, 2013)

[9] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming.

34 comments:

toledobytheseaDecember 20, 2016 at 9:18 PM
(Based on the join examples in his 2012 updating book, I think CJ Date's comments would be different today, even if that book persists with the 'reasonable' motive, which is unnecessary as D. Mcgoveran points out when he says that only logical inference is necessary.)

The OP's comments reflect a procedural interpretation of relational algebra that most or all dbms implementers are prey to, for example saying that join view deletions are deletes 'through' views but base table deletes are deletes 'from' base tables, which is a foible that has file system heritage. They approach both examples as if the problem is to decide what base table updates must have produced the desired view's value, whereas the question is what base values are implied by the desired view value. This is the result of interchanging a relation having a predicate that is not the predicate of the join's projections on each of the join operands' headings, for example pretending that PA is the predicate of ( A JOIN B ) { set of A's attributes}. Instead it is only a subset of A that has the predicate of such a projection on the join and the only question is what projections of the join result from the absence of the 'deleted' row. It's not the values of the A and B tables that need to be inferred, but the values of the respective subsets of those tables that are the effective operands of the join. For any particular join those subsets have only one solution. The subsets that DON'T affect the join's value (subsets that are NOT in the respective projections) are necessarily unchanged by a join deletion.

Of course (NOT PA) OR (NOT PB) follows from the solution but it's a mistake to think of the expression as a recipe rather than one side of a (declarative) tautology. People got away with the mistake in the file system era but when views are allowed the more exact expression is (NOT PAJ) OR (NOT PBJ) where PAJ and PBJ project the join on A and B attributes respectively.

For the two examples in this exchange a Venn diagram makes it graphic that ( PA AND NOT PB ) AND ( PB AND NOT PA) must be empty and that each of these two conjuncts is disjoint compared to the predicate of the mixed set of tuples in the conjunction given by the join's predicate, PA AND PB. Join deletion doesn't require either logical conjunct to be replaced but arguments are often made to the effect that they could be replaced. These arguments use such an amorphous notion of deletion by implementers that a DELETE command sometimes means INSERT! D. McGoveran's updating chapter gives precise definitions that prevent this mistake.

Another mistake is thinking that correct implementation of join deletions implies that at least one tuple from each operand must always be deleted or that this implies that all insertions to unions have only one solution.

DBMS implementers need to understand that support of such subsets is equivalent to support of relations and vice-versa. Neither results from support of only base relations aka base tables/relvars. Furthermore, in current systems the effect is that only base relations have predicates!

Join deletions are so important not just because they have only one solution if implemented correctly, ie., implemented on a purely logical basis aka declarative basis but because their support would change drastically the data design landscape and thereby go a long way toward the ultimate benefit of data independence, preserving the value of a data investment solely by means of logical structure. Of course this is contrary to many vested interests such as those of the big commercial products and the career investments of multitudes of developers.
ReplyDelete
Replies
UnknownDecember 21, 2016 at 2:16 AM
Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.

Date and McGoveran's first joint treatment of view updating [1994, published on dbdebunk at the time] was indeed confused in its usage of "predicate". I suspect this can be attributed to the authors failing to understand each other. Both authors have since disavowed that work; and quite understandably: it is dreadfully confused.

CJD has since clarified, and uses "relvar predicate" to mean the 'external predicate' of a relvar.

DMG [and Pascal usually follows] uses 'relation predicate' to mean the 'internal predicate'/constraint on allowable content -- at least that's my best understanding.

So for [Ed] to gloss DMG's "tuples that cannot belong to the relation" as "Representing false instantiations of the predicate" is reintroducing the confusion so evident in the 1994 material.

I think that by "tuples that cannot belong to the relation", DMG means tuples (taken as a set presented for update) for which the relation constraints do not hold; not tuples whose corresponding propositions do not hold in the world.
ReplyDelete
Replies
Erwin SmoutDecember 22, 2016 at 6:28 AM
"for example saying that join view deletions are deletes 'through' views but base table deletes are deletes 'from' base tables, which is a foible that has file system heritage"

That's funny. McG used the word "through" himself in the "David adds" section.
ReplyDelete
Replies
Fabian PascalDecember 22, 2016 at 9:45 AM
What's funny is that out of the long comment that's the only detail that caught your eye.
ReplyDelete
Replies
Fabian PascalDecember 22, 2016 at 10:31 AM
The fact is that it was the only one you commented on.

Age is not a serious argument -- validity is. The problem is that you see everything through the CJD interpretation and, unfortunately, it is not properly, if at all, grounded in FOFL and set theory and one must be grounded in them to see the problems. That's what EFC's RDM was about and it is the ignorance of the dual theoretical foundation that is responsible for all the misconceptions and misunderstanding of the RDM.
ReplyDelete
Replies
toledobytheseaDecember 22, 2016 at 4:36 PM
Fans of base relvars as well as any other language device that can be equated to physical structures or SQL implementers or file system coders usually are not able to understand data independence at all. Experience suggests that years of unlearning may be needed, the more coding experience the more years, which in turn might mean that only generational change will make any difference. McGoveran's 2004 statement that the RM is not well understood appears now to be more true than ever thanks to the accumulated obfuscations.

Codd would have had no difficulty accepting the specific deletion CJ Date describes above (see his book). While my own argument focuses only on join deletions rather than overall database consistency and is also less theoretical and more empirical than David McGoveran's, I've yet to see that it counters anything he's written and most of it was prompted by his explanations.

When I first used SQL-type products (in very basic ways even though they included every popular mainframe and Unix version) I had to wonder why a relational dbms didn't allow all user commands to address relations, instead these dbms'es addressed certain tables which were distinguished from other tables called views and none of the views could be explictly replaced by a user. Surely the material of the user interface must be relations, right? From what I'd seen of Codd's first and second papers, I couldn't understand why a dbms with the word 'relational' in its moniker didn't allow me to replace relations. After all, even a read-only database needs to replace empty declared relations before there is anything to query. As I saw more and more customer SQL databases I started to realize that all these products expected the user to program the database, the dbms didn't do that. So to replace some relations, for each one you had to find out what relations they were derived from and achieve the replacement you wanted via side-effects. There is now a whole sub-industry cluttered with experts to help users do that. Not only are applications more complicated than they need to be but there is no certain way to guarantee that every piece of application updating code is not making the intended data design inconsistent.

At present the so-called db schema and user-supplied data do not amount to the program they could amount to. Instead multiple app developers fill in the db program's missing code, repetitively and instead of executing the incomplete program much of what the dbms does is no more than regurgitating its input, the partial program as it were. .
ReplyDelete
Replies
toledobytheseaDecember 22, 2016 at 5:17 PM
... (continuation of earlier comment)

The problem with base relvars as a user device for join deletions is that the logical set of tuples addressed by the name A is not the set addressed by the name "( A JOIN B ) { A attribute set })" even though they have the same heading. These two sets are disjoint and Codd's difference operator can calculate the value of only one of them (he called this relational closure). So if a relvar is used as a target device for replacing a relation either it must have a set membership function and relational expression and predicate that correspond to exactly one of the subsets or the dbms must be capable of addressing a subset of the relvar's tuples for which those three kinds of logical addresses agree. It might be that a certain implementation language allows users to express set differences such as "( A JOIN B) minus D" that don't preserve (A JOIN B) tuples that are not in D. As far as I know this was the only kind of join deletion Codd objected to and McGoveran's update definition appears to prevent such a language.

The arguments against join deletion are strange because nearly all assume a replacement relation target such as A or B instead of one that has the set membership function and relational value expression and predicate of the join. It's why I say that in current relvar systems only base relvars actually have predicates - the dbms can replace only the relations that have predicate PA or PB, not any of the other five possible relations (at least one of which could be updated)!

Another argument that claims join deletions cause Codd's 'deletion anomaly' and therefore join deletions should be outlawed. This is off-base because Codd's solution to that anomaly had nothing to do with set operations or updates but rather with data design. When join deletion is understood it will be seen that a much bigger door opens that demands much more sophisticated data designs (and probably more automation to support them).
ReplyDelete
Replies
Fabian PascalDecember 22, 2016 at 6:25 PM
Anthony and Toledo,

I seem to have lost the last comments you submitted. If you have the text, pls re-submit. If not, pls contact me by email.

Thanks.
ReplyDelete
Replies
toledobytheseaDecember 22, 2016 at 7:19 PM
(I had to chop this into three parts to make it fit.)

Fans of base relvars as well as any other language device that can be equated to physical structures or SQL implementers or file system coders are frequetly unable to understand data independence at all. Experience suggests that years of unlearning may be needed, the more coding experience the more years, which in turn might mean that only generational change will make any difference. McGoveran's 2004 statement that the RM is not well understood appears now to be more true than ever thanks to the accumulated obfuscations.

Codd would have had no difficulty accepting the specific deletion CJ Date describes above (see his book). While my own argument focuses only on join deletions rather than overall database consistency and is also less theoretical and more empirical than David McGoveran's, I've yet to see that it counters anything he's written and most of it was prompted by his explanations.

When I first used SQL-type products (in very basic ways even though they included every popular mainframe and Unix version) I had to wonder why a relational dbms didn't allow all user commands to address relations, instead these dbms'es addressed certain tables which were distinguished from other tables called views and none of the views could be explictly replaced by a user. Surely the material of the user interface must be relations, right? From what I'd seen of Codd's first and second papers, I couldn't understand why a dbms with the word 'relational' in its moniker didn't allow me to replace relations. After all, even a read-only database needs to replace empty declared relations before there is anything to query. As I saw more and more customer SQL databases I started to realize that all these products expected the user not only to program the database in addition to the application but to specify parts of it on the fly, the dbms didn't do that. So to replace some relations, for each one you had to find out what relations they were derived from and try to achieve the replacement you wanted via side-effects. There is now a whole sub-industry cluttered with experts to help users do that. Not only are applications more complicated than they need to be but there is no certain way to guarantee that every piece of application updating code is not making the intended data design inconsistent.

It's also important for fulfilling Codd's idea to see a schema and data as a logical program which the dbms then executes. At present the so-called db schema and user-supplied data do not amount to the logical program they could amount to. Instead multiple app developers in effect fill in the db program's missing code, repetitively and instead of executing the incomplete program much of what the dbms does is no more than regurgitating its input, the partial program as it were.

(end of part 1)
ReplyDelete
Replies
toledobytheseaDecember 22, 2016 at 7:37 PM
(part 2 of 3)

Regarding relvars, following McGoveran's Venn diagram suggestion, suppose a database has two declared non-empty relations and abstract it as two partially overlapping circles. The diagram has seven non-empty subsets each having homogeneous tuples, representing seven disjoint relations with logically distinct predicates, in other words five more predicates than the two of the declared relations. (See the ironically titled web page athttps://blog.jooq.org/2016/07/05/say-no-to-venn-diagrams-when-explaining-joins/
for a graphic.)

Now assume that the seven subsets are disjoint in the sense that each is used in a different deletion situation corresponding to one of the seven different combinations of red and white circle fragments/shapes. Assume any deletion operation is independent of any other so that the deletion context always includes only one of the combinations. In this argument I'm only talking the join of the middle layout and the two shapes that comprise the A circle which happen to be disjoint as well in the usual set sense.

For example, you could let the values of the declared relations be named A and B and name their respective predicates PA and PB. Name the value of their set intersection (A JOIN B) and name its predicate P(A JOIN B). Now name the A members that are not sub-tuples of (A JOIN B), which in relational algebra will require projection, as A MINUS ( ( A JOIN B ) { A attribute set }). This name can be thought of as the logical address of the A-tuples that are not present in the joined relation and equally so could its predicate P(A MINUS ( ( A JOIN B ) { A attribute set })) and it could also signify the set membership function of this homogeneous tupleset. A logical address has no particular unique physical counterpart, for example the logical address named (A JOIN B) might have several because the joined tuples are pairs of sub-tuples or it might have no physical 'row' counterpart in terms of a single tuple. Viewed as a logical construction with no particular dbms implementation in mind each of the seven subsets exists as a set in its own right. Each could have a physical counterpart or each might not, that's immaterial to the logical structure.

A given sub-tuple might appear many times in the ( A JOIN B) subset. If it appears, it's guaranteed to not appear in the subset addressed by A MINUS ( ( A JOIN B ) { A attribute set }). (Also, a sub-tuple that fails to appear in a tuple-pair replacement of A join B might continue to appear as an A-tuple.) When I first saw online objections to join deletions, one of them was to the effect that no single solution was possible because relational logic allowed three possibilities, one of which removed a pair of sub-tuples from ( A JOIN B ) and inserted one of those sub-tuples into A MINUS ( ( A JOIN B ) { A attribute set }. I asked how could deletion mean insertion too, how does set difference include set union but I have never received an answer.
ReplyDelete
Replies
toledobytheseaDecember 22, 2016 at 7:46 PM
( part 3 of 3)

The problem with base relvars as a user device for join deletions is that the logical set of tuples addressed by the name A is not the set addressed by the name "( A JOIN B ) { A attribute set })" even though they have the same heading. These two sets are disjoint in the customary sense as well as my assumed sense and Codd's difference operator can calculate the value of only one of them (he called this relational closure). So if a relvar is used as a target device for replacing a relation either it must have a set membership function and relational expression and predicate that correspond to exactly one of the subsets or the dbms must be capable of addressing a subset of the relvar's tuples for which those three kinds of logical addresses agree. It might be that a certain implementation language allows users to express set differences such as "( A JOIN B) minus D" that don't preserve (A JOIN B) tuples that are not in D. As far as I know this was the only kind of join deletion Codd objected to and McGoveran's update definition appears to prevent such a language.

The arguments against join deletion are strange because nearly all assume a replacement relation target such as A or B instead of one that has the set membership function and relational value expression and predicate of the join. It's why I say that in current relvar systems only base relvars actually have predicates - the dbms can replace only the relations that have predicate PA or PB, not any of the other five possible relations (at least one of which could be updated)!

Another argument claims join deletions cause Codd's 'deletion anomaly' and therefore join deletions should be outlawed. This is off-base because Codd's solution to that anomaly had nothing to do with set operations or updates but rather with data design. When join deletion is understood it will be seen that a much bigger door opens that demands much more sophisticated data designs (and probably more automation to support them).
ReplyDelete
Replies
UnknownDecember 25, 2016 at 4:18 AM
Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.

Date and McGoveran's first joint treatment of view updating [1994, published on dbdebunk at the time] was indeed confused in its usage of "predicate". I suspect this can be attributed to the authors failing to understand each other. Both authors have since disavowed that work; and quite understandably: it is dreadfully confused.

CJD has since clarified, and uses "relvar predicate" to mean the 'external predicate' of a relvar. DMG [and Pascal usually follows] uses 'relation predicate' to mean the 'internal predicate'/constraint on allowable content -- at least that's my best understanding.

So for [Ed] to gloss DMG's "tuples that cannot belong to the relation" as "Representing false instantiations of the predicate" is reintroducing the confusion so evident in the 1994 material. I think that by "tuples that cannot belong to the relation", DMG means tuples (taken as a set presented for update) for which the relation constraints do not hold (or will not hold when the update is applied); not tuples whose corresponding propositions do not hold in the world.
ReplyDelete
Replies
Fabian PascalDecember 25, 2016 at 10:15 AM
David McGoveran replies (Part 2):

Chris’ use of internal and external predicate have long been a subject of my dissatisfaction with his terminology. They convey--to my mind--a horribly misleading and disastrous way of thinking: namely, that there is some “meaning” in the minds of users to be given formal recognition and that there is some formal “meaning” that the DBMS “understands” which is an approximation of the former. Nonsense. It is the responsibility of the data modeler to capture the intended subject domain in a conceptual model and then to formalize that in a logical model. What is in some users head regarding what is or should be meant by some relation is completely irrelevant. The data modeler is to document each relation he has created and the RP is the formal meaning of each relation in the logical model. It is up to users of that logical model as exemplified in the database to interact with the database consistently with that meaning. That implies they should have access to the RP for each relation they access (whether for retrieval, deletions, updates, or insertions) in a form they can comprehend and be trained to interact consistently. This is what we mean by “authorized users”: users that have the knowledge and skill not to use the tool incorrectly or blindly. While we can use constraints to make certain that the database is always in a consistent state, we can’t do anything to make sure an authorized user puts the database in a state that is inconsistent with some aspect of the physical world that it allegedly represents and it is a fallacy to think we should even try. That fallacy is one Chris has unfortunately entertained by introducing internal vs. external predicates and confusing “meaning” as formal semantics with “meaning” as subjective perception and intent.

Regarding tuples belonging to a relation, this is a complex, subtle subject for which I don’t have time, but a few quick comments that may help. In the relational model, ANY n-ary tuple can be understood as an instantiation of a relation predicate corresponding to an n-ary relation. In the untyped FOPL, that instantiation results in a sentence which can be evaluated as having either a true or a false truth value. When we talk about tuple we have to be clear as to whether we are talking about
tuples belonging to the current extension of some specific relation (what CJD would call the value);
tuples that cause the specific relation’s constraints to evaluate to a specific truth value (true or false); tuples that would cause that constraints to evaluate to true, but are not part of the current extension.
As I have said many times, the RP is necessarily the conjunction of all the relation’s constraints (those that can be expressed without reference to other relations), but this is not sufficient--it is a little incomplete. To differentiate the current extension of the relation from other sets of tuples that could be an extension of the relation, we have to know what instantiations have been asserted. I capture this idea in a special, primitive predicate I call the “assertion predicate”. We therefore have to be very careful when we talk about a tuple being true or false and which tuples do or do not “belong to the relation”. (That said as background, I agree with what Fabian says insofar as it goes.)

ReplyDelete
Replies

Add comment

POSTS

Tuesday, December 20, 2016