Sunday, January 12, 2014

Data Fundamentals and Education


Conveying the theoretical foundation of database management to practitioners without losing either rigor or the audience is a non-trivial endeavor, and very few possess the skill for, or interest in, it. More often than not one or the other is lost.

In Relations and Relationships I referred to the following LinkedIn exchange:
AT: ... William Kent and his book "Data and Reality" [1978] ... is devoted to "attributes", and (in my words) William confesses that he can not distinct [sic] between "relationships" and "attributes". Thus, the later might be completely redundant. 
GE: This confusion of entities vs. attributes has been with us a long time. Several years ago [a student] ... discovered a paper ... that discussed this dilemma. He proposed calling the thing, which we could not determine whether it was an entity or an attribute, an "entribute." Beautiful, eh? That was in 1939.


The same thing can be modeled either as an entity or an attribute, depending on the informational context and purpose e.g., color can be an attribute for a clothing merchant and an entity with attributes of its own for a paint manufacturer. This is a conceptual (business) model decision that must precede logical database design, which is impossible without it.

But note that Kent's problem is not with entities and attributes, but with relationships and attributes. Many view FK's as relationships, often because they think, erroneously, that relational comes from "relationships between tables" rather than from mathematical relations. This is what a theoretically minded reader tried to dispel in a comment:
P: Relations/tables denote relationships. A tuple/row is in a relation/table R if and only if some user-understood statement R(X,..) holds.
A FK constraint is the representation of a particular kind of constant relationship between relationships. For variables S meaning S(X,...) and T meaning T(Y,...) the declaration of a FK from S {X} to T {Y} means that FORALL X [EXISTS X1,... S(X,X1,...) IMPLIES EXISTS Y1,... T(X,Y1,...)]. Eg EMPLOYEE(p,...) IMPLIES PERSON(p,...).

A FK constraint allows the dbms to reject updates that violate it as erroneous. But to read or update the database the users know what S(X,..) and T(X,...) mean in the world so for any situation that arises they know that FORALL X [S(X,...) IMPLIES T(X,...)]. Eg if they observe EMPLOYEE(p,..) then they will observe PERSON(p,...). The database does not need to represent this redundant and constant relationship; the dbms does, conceptually in the metadata, via the constraint declaration.
FKs are NOT the relations/relationships of a database, its relations/tables are. Sadly this is widely misunderstood.
He is, of course, right, but my guess is many practitioners will fail to appreciate his explanation. Let me see if I can make it more digestible.
  • The important relationship behind relational databases is the R-table--a relationship between a collection of N pools of values--domains: columns are subsets of the domains and rows are collections of N column values, one from each domain. Domains and columns represent in the database properties and entity attributes, respectively. Rows represent propositions that are true (facts) about those entities in the real world (see my paper, Business Modeling for Database Design).
Note: To those formally inclined, the relationship is, specifically, a subset of a mathematical relation, itself a relationship--the Cartesian product of the domains--the set of all possible unique combinations of the values of the N domains.
  • A foreign key (FK) is a column of one R-table whose values reference (point to) values of the primary key (PK) column in another R-table. As columns, both PK's and FK's represent neither entities, nor relationships, but attributes.
  • The referential integrity (RI) constraint represents in the database a business rule mandating that the FK and PK values are consistent. Both R-tables are subject to it and, since they are relationships, the constraint is a "relationship between relationships".
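The bullets above can be made concrete with a minimal sketch in Python (all domain, table, and column values are made up for illustration): an R-table as a subset of the Cartesian product of its domains, and the RI constraint as the FORALL implication between the two R-tables.

```python
from itertools import product

# Domains: pools of values the columns draw from (hypothetical example data).
person_ids = {1, 2, 3}
names = {"Ann", "Bob", "Cal"}

# An R-table is a relationship: a subset of the Cartesian product of its domains.
cartesian = set(product(person_ids, names))
PERSON = {(1, "Ann"), (2, "Bob"), (3, "Cal")}   # rows = true propositions (facts)
assert PERSON <= cartesian

# EMPLOYEE's first column is an FK referencing PERSON's PK (first column).
EMPLOYEE = {(1, "Sales"), (3, "R&D")}

# The RI constraint: every FK value must appear among the PK values,
# i.e. FORALL X [EMPLOYEE(X,...) IMPLIES PERSON(X,...)] -- a constraint on
# two relationships, hence a "relationship between relationships".
fk_values = {row[0] for row in EMPLOYEE}
pk_values = {row[0] for row in PERSON}
assert fk_values <= pk_values
```

Note that the FK and PK here are just columns of the two relations; the relations themselves are the relationships, and the constraint merely keeps them consistent.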
Consider now the second example, from another LinkedIn exchange on normalization, in which I participated.
LH: Art, you have obviously been lucky to work with fast disks and customers not demanding super fast responses. Today it's not so much an issue but it has been. At the end of the day all I have ever done is get the job done. I don't care if it's not right according to 'the theory' and neither does the customer, all they want is a performing system and now so does the consumer.
AK: And neither do I care whether "theory" is violated. What I care about is my clients' applications persisting and retrieving consistent and safe data ... over hundreds of systems at dozens of organizations over three decades, I haven't ever needed to denormalize below 3NF, which I consider the minimum to maintain data consistency and integrity, in order to get acceptable performance.
PP: ... Your attitude may deliver short term gains but at long term major costs. The best example is Germany, which is an industrial powerhouse as it takes a long term approach even though it may have a greater short term cost.
Here is an edited version of my response.

"I don't care about theory either" misleads the uninformed rather than enlightens. The theory is there for the very purpose of "consistent and safe data". Violate it and the loss is precisely data integrity and the guarantee of provably logically correct answers to queries. It is a bad idea to agree that it's not theory that counts--it will be misinterpreted to reinforce the common confusion of 'sound theoretical foundation' with "just a theory" and the notion that theory is somehow at odds with practice. The gap between theory and practice is greater in theory than it is in practice.

While his warning is correct, PP too fails to address the core misconception. It is not just a matter of "short- and long-term concerns"--denormalization trades data integrity (AK's very objective!) for (uncertain) performance gains and biases the database for some applications and against others (see my paper The Costly Illusion: Normalization, Integrity and Performance).
Both AK and PP replied.
AK: I think we agree at base. But my point stands. LH stated that she doesn't care about theory. I said that I don't either. It is the "purpose" of consistent, quality data that I care about. A minimum normalization level of 3NF gives me the minimum level of consistency and quality of data that is required for a sane application design.
No, we don't agree. Understanding that theory exists for practical reasons is "at the base." And misconceptions ought to be dispelled, not reinforced. A major reason for lack of true RDBMS's is precisely that neither DBMS designers, nor users understand and appreciate the practical value of theory. Proprietary ad-hoc products such as those referred to as NoSQL are rooted in disregard for theory.

The comment on 3NF is also misleading. While, fortunately, in most cases 3NF designs are also fully normalized (in 5NF), if and when they are not, consistency is not guaranteed; what is more, most practitioners don't know when that is the case. Suggesting that 3NF is enough will likely also be misinterpreted.
PP: There are many ways to achieve integrity and most databases exist to support a business which survives based on profit. Implicitly my statement supports the contention that more normalisation is better for long term profitability. It makes no suggestion as to what level. To suggest a business should make decisions solely based on theory is naive and leads to the destruction of the business. Prime example was "Betamax" versus "VHS". Even though "Betamax" was theoretically much superior technically, "VHS" won the day basically due to price.
From my own experiences I have found many people don't understand the cost-benefit of normalisation, or don't understand how to tune to support normalised structures. Total normalisation doesn't provide the optimal cost with today's technology. I have not seen an RDBMS that provides maximum normalisation as understood in today's terms.
  • I do not understand how a DBMS can "provide maximum normalization". It must be provided by users: it is essentially optimal logical database design. And the problem is that it is not well understood today.
  • I know of only one way to guarantee integrity of denormalized databases: controlling redundancy, either via additional integrity constraints, or application code. Both defeat the whole purpose of denormalization and are even likely to hurt performance rather than enhance it (again, see my paper).
  • I interpret "decisions solely based on theory are naive" to mean that trading off integrity for performance is sometimes necessary and justified in practice. Even assuming that all implementation factors affecting performance are at optimum--they rarely are--it is one thing to make such compromises consciously, being fully aware of the risk, and quite another to be unaware that they are compromises and oblivious to the risk, as so many practitioners are.
I am in agreement with PP's argument in his second paragraph, but the questions begged are: Why is the state of current technology he deplores the way it is, and how does he expect it to improve in the context of ignorance of and disregard for theory?




30 comments:

  1. Re: "Why is the state of current technology he deplores the way it is?"

    Part of the reason is a number of wrong turns, taken en masse by the industry, forty or more years ago. When the deeper thinkers of the field are confused it's no wonder many of the rest of us are too.

    One wrong turn is tied up with normalization and constraints (not just FK constraints). For example, very few people realize that when they delete from, say, a 3NF R-table, logically they are deleting from a join. Some very clever people also believe that join deletes are logically ambiguous and therefore can't be implemented, but this is only because they choose to ignore the constraints of a join, for example, the fact that when the compound proposition ( ps OR pt ) is true of a join ( S JOIN T ) it logically implies that ( ps AND pt ) is true, and vice-versa. Those implications are effectively constraints, and it's a pretty small truth table to show that there is only one way to delete from a join, so it's not ambiguous at all when all constraints are considered.

    Another one is FK's and the SQL keyword CASCADE - many (most?) people don't realize that a delete from the T mentioned in the LinkedIn exchange when CASCADE is "on" is logically a join delete. And when it's "off", it's logically not a "delete from one side", rather it's a delete of both sides followed by an insert of one side.
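The cascade behavior is easy to observe directly; a minimal sketch using SQLite from Python (table and column names are made up, and SQLite spells the option ON DELETE CASCADE and requires a PRAGMA to enforce FKs):

```python
import sqlite3

# S has an FK to T with ON DELETE CASCADE (hypothetical example schema).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when on
conn.execute("CREATE TABLE T (y INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE S (x INTEGER PRIMARY KEY,
                y INTEGER REFERENCES T(y) ON DELETE CASCADE)""")
conn.execute("INSERT INTO T VALUES (1), (2)")
conn.execute("INSERT INTO S VALUES (10, 1), (11, 1), (12, 2)")

# With cascade "on", deleting from T also deletes the matching S rows --
# logically a delete from both sides of the join.
conn.execute("DELETE FROM T WHERE y = 1")
remaining = conn.execute("SELECT x FROM S ORDER BY x").fetchall()
print(remaining)  # -> [(12,)]
```

Without the CASCADE clause, SQLite's default (NO ACTION) would instead reject the delete while the referencing S rows exist.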

    Another one is that many people stop thinking once they are told normalization means avoiding redundancy and don't bother asking why redundancy is bad. Some even think the purpose of avoiding redundancy is to minimize disk space! But the logical purpose is actually to prevent (with the help of suitable table design) contradictions.

    Codd didn't do all he could to dispel all the wrong turns, maybe he was overwhelmed by how many there were in the first attempts to implement RT.

    When I see people posting such as "my clients' applications persisting and retrieving consistent and safe data ... over hundreds of systems at dozens of organizations over three decades", it reminds me of a programmer who claimed he'd written a million lines of assembler code in five years. I guess with modern machinery he could claim to have cut-and-pasted a million lines, but he certainly couldn't have originated that many. So I'd think the quote really means small parts of hundreds of systems, or the same system many times, for sure he or she couldn't have designed hundreds of disparate systems of any size in their entirety in a single career. The same poster also claims that theory doesn't matter. Well, what does he think the word "consistent" in his post means? (Try "no database contradictions, a huge selling point for db customers".) Like lots of people he throws in the word "safe" which is pretty much a porridge word, ie., meaningless in context. When people can't state their own basic experience accurately, when they apparently can't even count, is it any surprise they miss the basics of the theory too?

    But the logical basics are so few (really only three or four basic data operations in algebraic terms) that it's straightforward to use them to generalize hundreds of cases into a handful. Besides ensuring against database contradictions this is the great advantage of a logical approach. Most posters seem to ignore both advantages preferring to wallow in arcane or irrelevant details of the typical SQL products with almost complete lack of focus (eg., talking about the cost-benefit of normalization!).

    ReplyDelete
    Replies
    1. Bob:

      On my "hundreds of systems" comment, yes that was hyperbole to make a point. I have not designed hundreds of database schemas. I have, however, reviewed a sufficient number of other folks' schemas and recommended improvements to enough of those in my work performance tuning database environments that, added to my own designs, gets me sufficiently well over 100 that I might actually have surpassed 200.

      As to my comment that "theory doesn't matter", actually I didn't say that. I said that I don't "care" about theory as Fabian correctly quoted. That's a different thing than saying that the theory doesn't matter. What I said, to complete the thought, and I think that Fabian understood this but ignored it to make a valid point to readers who might misunderstand and so be misled, is that what I "care" about is the consistency of the data. The consistency does, of course, result from the proper application of the theoretical work that Codd and others have done. The theory does and must underlie any schema design that wants to be consistent. Of necessity that means eliminating redundancy and all of the other details of normalizing the schema.

      Of practicality, however, it doesn't matter that, when a user/application deletes a row from a detail table, in theory what's happening is that the join that contained both the parent row and all of its detail rows was completely deleted and replaced by a new join that contained one less detail row. So, on that level, I don't care at all about the theory because I am certainly not going to implement either the application code or any database code that I might write (I've written more than a few) to actually perform the detail delete in this way. I care that the database will enforce a rule that prevents the parent from being deleted while the detail rows still exist and a rule that prevents a duplicate parent or detail row. That's what I care about.

      Theory doesn't prevent a schema that includes a 30,000 character "comment" column to hold ad hoc comments from an application user. I've encountered those in schemas I've reviewed in the field that claimed to be normalized. Practical performance concerns, however, prompted me to pull that column out of the table its designer had put it in and replace it with a child table with a foreign key to the parent, an ordinal sequence, and an 80 character comment column. Add eight or ten lines of code to the module that was fetching the original data to reassemble the comment and you have a more efficient system that still supports theory.
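The redesign described above can be sketched as follows (all table, column, and data names are hypothetical), including the few lines of reassembly code:

```python
import sqlite3

# A wide ad hoc comment column pulled out into a child table keyed by the
# parent's id plus an ordinal sequence, holding 80-character lines.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE comment_line (
    parent_id INTEGER REFERENCES parent(id),
    seq       INTEGER,
    line      VARCHAR(80),
    PRIMARY KEY (parent_id, seq))""")

conn.execute("INSERT INTO parent VALUES (1)")
comment = "a long ad hoc comment entered by an application user " * 5
for i in range(0, len(comment), 80):   # split into 80-character lines
    conn.execute("INSERT INTO comment_line VALUES (1, ?, ?)",
                 (i // 80, comment[i:i + 80]))

# The few lines of reassembly code mentioned above:
rows = conn.execute("""SELECT line FROM comment_line
                       WHERE parent_id = 1 ORDER BY seq""").fetchall()
reassembled = "".join(r[0] for r in rows)
assert reassembled == comment
```

The fetch module reassembles the lines in seq order, so applications still see a single comment string.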

      Delete
    2. I'm pretty sure I'm one of Bob's "people"; but I have also stated that constraints play a role in updating join operands, he's wrong that they always determine a solution, I have many times debunked examples by a writer with identical style, and his comments about OR/AND are unintelligible. So I invite him to repost that paragraph and an example in some other forum where I will debunk.

      Delete
    3. Feel free to conduct any debunking here.

      Delete
    4. The initial problem isn't updating join operands, it's updating joins. Operand values follow from that.

      No point arguing when the starting point guarantees ambiguity. Not join deletes, not anything. Ironically the same dbms'es that prohibit join deletes apparently allow the query "what are the propositions that make ( ps OR pt) true?" against the join even though the general result is often a proper subset of that of separate queries against the join operands. Both are permitted because there is no contradiction.

      (It's tiresome when the usual argument so often switches mid-stream to general union insert which isn't the same problem - the delete problem narrows the scope but insert expands it. Hence the wrong turn.)

      Delete
    5. It's a waste of time. I don't know why some join delete opponents want to spend so much time on arbitrary debunkings.

      Usually the arbitrary bit is to jump directly from some language's DELETE statement to possible base values and then state there are three ways to update two base r-tables so such deletions must be ambiguous.

      It's more productive to recognize the meaning of a join's tuples and the form of the join's logical complement.

      If you join an r-table S of supplier id's with an r-table SN of supplier id's and names, giving a join with a predicate of S AND SN, a possible tuple is { s1, smith }. The logical meaning is s1 is a supplier AND s1 is a supplier named smith.

      Most people seem to agree that the negation/deletion of a join tuple should be logically equivalent to its assertion/insertion into the complement. The logical complement of a join can be written as a disjoint union, eg., ( S AND SN ) OR ( S AND NOT SN ) OR ( NOT S AND SN ). Only the first disjunct has the same predicate as the join ( S AND SN ), so the logical effect is that the first disjunct must be 'inserted to'. From the perspective of the complement, the second two disjuncts must be empty. It's not possible to delete "only from one side" unless the meaning of the original proposition is changed arbitrarily.

      This is the step that is usually skipped when talking about join deletes. Opponents to this approach wish to change the meaning of the target tuple on the fly, so that it can mean any of ( S AND SN ), ( S AND NOT SN ), ( NOT S AND SN ). This seems silly to me when the dbms user is obviously dealing with a tuple whose proposition as above is s1 is a supplier AND s1 is a supplier named smith, not, for example, s1 is a supplier AND NOT s1 is a supplier named smith. I wouldn't want to have to tell a user that his tuple could mean any one of three propositions.

      I believe this is part ( probably not all ) of what McGoveran and Date advocate for certain join delete problems, for example McGoveran has used the term 'relative complement' when writing about join deletes. But they have written in such detail (eg., Date's book and McGoveran's patent) that I often lose their rationale.

      The above is a very blunt example. It doesn't mean that a language's delete statement must always "delete from both sides". There's nothing to prevent the deletion of a partial tuple, ie., a logical projection which could easily result in a delete "only from one side".

      Opponents often bring union insertions into the picture, claiming that it's the flip side of the same problem, but obviously it's not the same problem when the union isn't disjoint.

      I'm not sure whether the opposition is also based on a misperception that the above disjoint union complement is somehow incomplete for certain queries, eg., if a query returns false for a tuple, the tuple must be in the above disjoint union complement. But obviously there are false propositions that can't be recorded by the abbreviated form of a join's tuples, eg., the join tuple's form doesn't allow a logical meaning that involves two supplier id's, such as s1 is a supplier AND s2 is a supplier named smith. So if that's part of the opposition, it seems unfounded.

      There are some opponents who fairly reasonably avoid the logical argument and simply say "who needs it?" because the user can always get the effect he wants by updating only base tables, albeit with more verbiage. Personally, I'd like more leverage.

      Date and McGoveran often mention "logical independence" as their motivation. Personally I wonder if that motivation becomes irrelevant for a dbms that doesn't permit any base r-table access and requires all unions to be disjoint. But that's just my own speculation.

      Delete
    6. @Bob Loblaw February 1, 2014 at 4:52 AM

      > It's a waste of time.

      Only if you are oblivious to being corrected when you manage to be precise.

      > so much time on arbitrary debunkings.

      You commented. Your few intelligible claims are unfounded. Like relationally uninformed comments, although you are pro-relational. And the topic is "Data Fundamentals and Education". And https://xkcd.com/386/.

      > The logical complement of a join can be written as a disjoint union, eg., ( S AND SN ) OR ( S AND NOT SN ) OR ( NOT S AND SN ).

      Not so. The tuples not in S JOIN SN, ie the tuples for which NOT(S(...) AND SN(...)), ie the tuples for which NOT S(S) OR NOT SN(S,N), are the tuples satisfying the disjoint disjunction (NOT S(S) AND NOT SN(S,N)) OR (S(S) AND NOT SN(S,N)) OR (NOT S(S) AND SN(S,N)).

      Let's say relations L and R hold those tuples for which (respectively) L(...) and R(...). Then L JOIN R holds those tuples for which L(...) AND R(...). A tuple excluded from that join satisfies (as above) (NOT L(...) AND NOT R(...)) OR (L(...) AND NOT R(...)) OR (NOT L(...) AND R(...)). Ie it makes exactly one of those three disjoint disjuncts true. BUT KNOWING THE TUPLE ISN'T IN THE JOIN DOESN'T TELL US WHICH ONE OF THE DISJOINT DISJUNCTS IT SATISFIES. Let's call the tuple's subtuples on L-columns and R-columns l and r (respectively). Then the three disjuncts are the cases of l not in L and r not in R, l in L and r not in R, and l not in L and r in R. Without extra information like constraints we only know that they're not both in their respective relations. There /might/ be no ambiguity for /particular/ tuples or joins, and (generic and special) constraints sometimes disambiguate.

      [That's all one needs to know. But let's look at your S/SN example. From knowing that a row is not in S JOIN SN, ie that it's a row for which NOT(S(S) AND SN(S,N)), one cannot know which disjunct is the case: its absence from the join could be because NOT S(S) AND NOT SN(S,N) or because S(S) AND NOT SN(S,N). Eg knowing that it is false that "S is a supplier AND supplier S has name N" leaves us still not knowing whether "S is a supplier". (In this example it can't be the third case NOT S(S) AND SN(S,N), because SN(S,N) implies S(S). A dbms /can/ be made to efficiently use key constraints to figure that out, ie that it must delete a tuple from SN when it deletes it from the join. But a delete from the join just /cannot determine/ whether there should be a delete from S.)]

      If you could rephrase the rest of your message to be clear(er) by referring only to the things at hand and with standard terminology for relations and expressions I could debunk your related misconceptions.

      Delete
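For readers following along, the three-disjunct analysis in the preceding reply can be checked mechanically; a sketch in Python with made-up supplier data:

```python
# S holds supplier ids; SN holds (id, name) pairs (hypothetical data).
# The join's predicate is S(s) AND SN(s,n).
S = {("s1",), ("s2",)}
SN = {("s1", "smith")}

# For a tuple (s, n) NOT in the join, exactly one of three disjoint cases holds:
#   1: NOT S(s) AND NOT SN(s,n)
#   2:     S(s) AND NOT SN(s,n)
#   3: NOT S(s) AND     SN(s,n)
def case(s, n):
    a, b = (s,) in S, (s, n) in SN
    assert not (a and b), "tuple is in the join"
    if not a and not b:
        return 1
    return 2 if a else 3

# The inclusion dependency SN{id} subset-of S{id} rules out case 3 ...
assert all((s,) in S for (s, n) in SN)
# ... but cases 1 and 2 both remain possible, so absence from the join alone
# does not determine whether S(s) holds:
assert case("s3", "jones") == 1   # neither relation holds
assert case("s2", "smith") == 2   # a supplier, but not one named smith
```

This is exactly the claimed residual ambiguity: the constraint disambiguates SN, but not S.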
    9. Regarding "Not so", that expression will seem bizarre to anybody who starts with the question of what are the base relation/r-table/relvar values that result from a join delete, or wrong to anybody who entertains all of the base tuples, negated or not negated, being in the scope of an update. It's a clue to a requirement of the result, not a starting point. It's strange to me why a join tuple standing for a conjunction A AND B could in effect be replaced (by an update manipulation) by one standing for A AND NOT B, representing a tuple that can never be in that join. Surely the tuples involved in the manipulation should be tuples of the join, just as the propositional interpretation of the join involves only the conjunctions of propositions that have matching values. The essence of join is to exclude the other propositions.

      I realize there are many who have a fixed position opposite to the above, probably too many for my opinion to make any difference, which is why I say it's a waste of time. But even though there are much better writers than I am about this, I'll go on, one last time, in the spirit of Fabian's attitude that there are probably lurkers who haven't made up their minds yet.

      When considering a join delete, rather than starting with possible base values, the more immediate question is what should be the values of the projections that, when joined, produce the target value that omits the specified tuple(s) but retains the original tuples that aren't affected or whose non-existence isn't implied by the specified negation. The obvious way to know this is to first apply (algebraically, not physically) the delete expression to the projection values (ignoring base values) and then join those resulting projections with the base values. The resulting projections determine the base values.

      This is not my idea, it just seems to follow from what David McGoveran hinted at many years ago on the original dbdebunk site (sometime in 2005 I think). It's my take on a rationale for the kinds of results CJ Date and McGoveran have described, though I must say that I've not seen them put it just so. (For example, it doesn't necessarily "delete from both sides" and ignores implementation tactics such as what Date calls "compensatory actions". I'm also not always sure whether what they've written since is about implementation rather than pure update theory.) It appears to apply equally to base deletes too, even when those aren't explicit joins. (I think that join delete is easily the most important aspect of what they call logical independence because it seems the most frequent requirement for a database where the headings/attributes grow.)

      (In that same old post, there was another useful hint by Fabien himself, alluding to language semantics, which I took to mean that, for example, the text in a WHERE clause needn't necessarily refer to a tuple, rather it could be viewed as specifying a constraint.  Perhaps he didn't mean that and was only pointing out that there's no reason a language can't express NOT ( A OR B ) rather than the usual NOT (A AND B) in a WHERE clause, which I have no objection to except that it might be too heavy-handed in most instances. But his comment all by itself shows why it's a wrong turn to deny join deletes.)

      (I have the impression that criticisms of the above put the cart before the horse in various ways or they are based on a desire for a relational model that implements first order logic which I find bizarre given that Codd's algebra and others such as Darwen's are specialized applications, not implementations. That logic doesn't have programming notions such as assignment and allows conjunctive propositions that can never be expressed by a relational join. But if I'm wrong in this aside, it doesn't change the main point.)

    10. Regarding "KNOWING THE TUPLE ISN'T IN THE JOIN ...", I couldn't count the number of times I've read that mantra. Here is an attempt at a more concrete rationale than I gave above.

      1. Whatever algebra they use, I think most people expect  deletion in terms of that algebra to mean that some language's DELETE statement such as DELETE FROM J WHERE d, with the desired replacement value of J named J', can be expressed as:

      J' = J AND NOT d,

      where "AND" and "NOT" have the same effect as ⟨AND⟩ and ⟨NOT⟩ in Darwen's A-Algebra and d names a relation that has the same predicate as J.

      I think they also expect that when the DELETE statement is valid, the result will be such that QUERY J' WHERE d gives an empty relation value.

      ( When the d-heading happens to be the same as the J-heading, the above could be written as J' = J MINUS d.)

      2. When J is a join, as in J = A JOIN B, aka J = A AND B, it seems that many people want to make an immediate substitution for J:

      J' = ( A AND B ) AND NOT d,

      for which there are three solutions (three possible combinations of A' and B' values). I think that substitution is premature evaluation, and it's more accurate to write:

      J' = ( J { A heading } AND J { B heading } ) AND NOT d,

      (where the curly braces indicate projection/quantification)

      which avoids referring to all of the tuples of A and B, some of which might be irrelevant because they aren't sub-tuples of J.  (In other words, their negation is determined by J's value just as the negation of tuples not in A or B is determined.)

      3. The tuples of A that aren't sub-tuples of J are the same as the tuples of A' that aren't sub-tuples of J', i.e.,

      A MINUS J { A heading } = A' MINUS J' { A heading }.

      4. Since J' = J' { A heading } AND J' { B heading },  the A' tuples that are sub-tuples of J' can be written as J' { A heading },

      So A' = J' { A heading } UNION ( A MINUS J { A heading } ).

      B' has a similar expression.

      When the d heading is a superset of the A heading (ie., there is no possibility of A-sub-tuples that match d),

      A' =  ( A AND NOT d ) { A heading}, similarly for B'.

      Possibly the above can already be expressed without Darwen's algebra, eg. with operators that have already been implemented in dbms'es, such as ones that reflect his and Date's Tutorial D, but this is something I haven't checked through.

      (Personally I'd rather not have an operation called DELETE because its tradition is deleting rows/tuples/records, not sub-tuples, so I'd prefer something like DENY or even ANTIQUERY!  Also, I'd rather call INSERT something like ASSERT for the same reason, as well as to allow the automatic denial of sub-tuples that might prevent the insertion/assertion of tuples or sub-tuples to a join.  That wouldn't prevent more restrictive implementations where d has the same heading as J. )
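      For the otherwise undecided, here is a minimal Python sketch of steps 1-4 above, modelling relations as lists of attribute-name-to-value dicts. All helper names are mine and purely illustrative; this sketches the algebra being discussed, not any DBMS's API:

```python
# Minimal sketch of the relational operators used above, with relations
# modelled as lists of dicts (attribute name -> value). Illustrative only.

def project(r, attrs):
    """Projection: keep only the named attributes, removing duplicates."""
    out = []
    for t in r:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(r, s):
    """Natural join: combine tuples that agree on all common attributes."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

def minus(r, s):
    """Relational difference (same heading assumed)."""
    return [t for t in r if t not in s]

# A JOIN B with one matching pair and one dangling A-tuple
A = [{"S": "s1"}, {"S": "s2"}]
B = [{"S": "s1", "N": "n1"}]
J = natural_join(A, B)            # [{'S': 's1', 'N': 'n1'}]

# Step 1: J' = J MINUS d (here d happens to have J's heading)
d = [{"S": "s1", "N": "n1"}]
J_new = minus(J, d)               # []

# Step 4: A' = J'{A heading} UNION (A MINUS J{A heading})
A_heading = ["S"]
A_new = project(J_new, A_heading) + minus(A, project(J, A_heading))
print(A_new)                      # [{'S': 's2'}]
```

      On this toy value the dangling tuple < s2 > survives in A' while < s1 > disappears, matching step 4; whether that is the right semantics is exactly what this thread debates.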

    11. 1.
      My previous message assumes /only/ what you do in 1 above, that the given tuple is not in the new join. It correctly explains that which operand(s) to remove a d subtuple from is ambiguous. It is not a matter of "want", it is a mathematical consequence. (For simplicity I didn't go into ambiguities regarding other A and B tuples. See below.)

      2.
      I described an example. You could have tried some numbers:
      Old S = {< S s>} "S is a supplier". s is a supplier; there are no others.
      Old SN = {< S s, N n>} "S is a supplier named N". s is named n; there are no other named suppliers.
      Old J = S JOIN SN = {< S s, N n>} "S is a supplier AND S is a supplier named N". s is named n; there are no other named suppliers.
      d = < S s, N n>.
      New J = old J MINUS {d} = {}. So it's not the case that "s is a supplier AND s is a supplier named n". But we do not know whether (new S) "s is a supplier". (And we do not know whether there are now a bunch of other (nameless) suppliers either.)

      Now, if S being a supplier means S has a name in SN, that is another constraint you have not given that /in this particular case/ /could/ be used by a dbms to infer that there are no other suppliers. (Ie S=SN{S}.) But in general, we cannot determine new S from new J.
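      That ambiguity can be checked mechanically. In this sketch (helper and variable names are mine), two different candidate base states reproduce the same new join value, so the join value alone cannot pick one:

```python
def natural_join(r, s):
    """Natural join of two relations modelled as lists of dicts."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

# After deleting d = <s, n> from J = S JOIN SN, new J must be {}.
# Both of these base states are consistent with that:
candidate1 = ([{"S": "s"}], [])   # s is still a supplier, just nameless
candidate2 = ([], [])             # s is no longer a supplier at all

results = [natural_join(S, SN) for S, SN in (candidate1, candidate2)]
print(results)                    # [[], []]
```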

      3.
      Here are two errors in your reasoning.

      > (In other words, their negation is determined by J's value just as the negation of tuples not in A or B is determined.)

      No, it is not determined. Once d is not in the new J, it is consistent that one of the new A or B not have d while the other have any (constraint-satisfying) combination of tuples matching d for common attributes. Because the lack of d in the first means that those tuples will not be in the new J.

      > 3. The A-tuples of A that aren't sub-tuples of J are the same sub-tuples that aren't in J', ie.,

      If that were so, then yes d wouldn't be in either of the new A and B, since d is one of the tuples not in the new J. However, it is not so; it doesn't follow from the other assumptions. (Basically, you are assuming d isn't in either and then deriving that it isn't in either.)

      4.
      Almost every technical statement of your posts (to the extent they can be made out) is wrong. Please do not offer another argument. Find an assumption or step in mine that is wrong, or bite the bullet.

    12. Regarding:
      ----------------------------<
      2.
      I described an example. You could have tried some numbers:
      Old S = {< S s>} "S is a supplier". s is a supplier; there are no others.
      Old SN = {< S s, N n>} "S is a supplier named N". s is named n; there are no other named suppliers.
      Old J = S JOIN SN = {< S s, N n>} "S is a supplier AND S is a supplier named N". s is named n; there are no other named suppliers.
      d = < S s, N n>.
      New J = old J MINUS {d} = {}. So it's not the case that "s is a supplier AND s is a supplier named n". But we do not know whether (new S) "s is a supplier". (And we do not know whether there are now a bunch of other (nameless) suppliers either.)
      -------------------------->

      So what?  As far as the original join is concerned, we don't know how many nameless suppliers there were in the first place either.  In any situation there will be things we don't know; we don't even know if S and SN are base.

      S JOIN SN doesn't mean S NOT MATCHING SN nor SN NOT MATCHING S, so neither the join nor its complement can express those meanings.  Their relations aren't operands of the join, so why should they be operands of the join's deletion? They may be affected by a deletion result, just as other relations can be affected by any delete or insert. I want a deletion that deletes tuples that can be inserted to the same table/R-table/relvar.  What is the point of entertaining a deletion operation that deletes tuples that can never be inserted and then claiming that it involves unknowns?

      All a join deletion can mean is that a subset of S might be replaced with a smaller subset of S, similarly SN, and I'm pretty sure that a precise definition is possible that would apply to non-joins as well (that S-subset might not be replaced, eg. if s is allowed to have two names then only the tuples mentioning n are deleted).  That's all it can mean as far as the target table/R-table/relvar is concerned. We can only delete from S MATCHING SN or SN MATCHING S.  In other words define delete in terms of what we know, not what we don't know.

      This "unknowns" objection reminds me of many applications where various similarly expansive what-ifs were used to multiply requirements giving all kinds of impossible feature-creep problems.

      I expect you might still ask, "well regardless of all that, what about the unknowns?"! But I wrote the above for the otherwise undecided.

    13. >> rephrase the rest of your message to be clear(er) by referring only to the things at hand (relations and expressions) and with standard terminology
      >> Find an assumption or step in mine that is wrong, or bite the bullet.
      You appeal unintelligibly to untruths left unjustified.

      >> Almost every technical statement of your posts (to the extent they can be made out) is wrong.
      > All a join deletion can mean is that a subset of S might be replaced with a smaller subset of S, similarly SN,
      >> Once d is not in the new J, it is consistent that one of the new A or B not have [a tuple matching] d while the other have any (constraint-satisfying) combination of tuples matching d for common attributes.
      (Etc.)

      > I want a deletion that deletes tuples that can be inserted to the same table/R-table/relvar.
      Unintelligible. IF DELETE only says that "J' = J MINUS {d}" then A and B cannot always be determined. If you intend DELETE to say more than that then you have failed to express your intentions. Nevertheless either J is publicly defined as A JOIN B and DELETE says something about J, A and B, or else only J is public and A and B cannot always be determined.

    14. Regarding:

      > I want a deletion that deletes tuples that can be inserted to the same table/R-table/relvar.
      Unintelligible. IF DELETE only says that "J' = J MINUS {d}" then A and B cannot always be determined. If you intend DELETE to say more than that then you have failed to express your intentions. Nevertheless either J is publicly defined as A JOIN B and DELETE says something about J, A and B, or else only J is public and A and B cannot always be determined.
      ----------------------------------------
      Regarding the last point, as far as the user is concerned, DELETE need only say something about J and {d} and nothing else, eg., A and B don't need to be public.

      Since new J is known by the user (at least the user knows the new J value the user expects), the expected value of ( new A MATCHING new B ) is known by the user, even if the user doesn't know about A and B or their values. The possibly unexpected value of new J is also available to the dbms in the form of ( A JOIN B ) MINUS d.  Clearly new A includes the original A-tuples of old A that continue as sub-tuples of new J:

      i)  ( old A MATCHING ( new J ) ).

      The user doesn't know the original A-tuples that aren't sub-tuples of old J and new J ( if they are in old A NOT MATCHING old B, they must be in new A NOT MATCHING new B ), but the dbms can calculate them, so the user doesn't need to know them:

      ii)  ( old A NOT MATCHING old B ).

      I can't think of any other A-tuples that could be in new A, so I think this is a complete expression ( keeping it straightforward without any optimization) of the new A value where the value of each term on the right-hand-side is known either by the user ( new J ) or the dbms ( old A, old B):

      iii)  new A = ( old A MATCHING ( new J ) )
                       UNION
                       ( old A NOT MATCHING old B ).
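      Expression iii can be sketched directly, reading MATCHING as semijoin and NOT MATCHING as antijoin (the helpers and sample values are mine, a sketch under this comment's assumptions, not any DBMS's operators):

```python
def matching(r, s):
    """Semijoin: r-tuples that agree with some s-tuple on common attributes."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [t for t in r
            if any(all(t[a] == u[a] for a in common) for u in s)]

def not_matching(r, s):
    """Antijoin: r-tuples that agree with no s-tuple on common attributes."""
    kept = matching(r, s)
    return [t for t in r if t not in kept]

old_A = [{"S": "s1"}, {"S": "s2"}]
old_B = [{"S": "s1", "N": "n1"}, {"S": "s2", "N": "n2"}]
# new J after deleting <s1, n1> from old J = old_A JOIN old_B:
new_J = [{"S": "s2", "N": "n2"}]

# iii) new A = (old A MATCHING new J) UNION (old A NOT MATCHING old B)
kept = matching(old_A, new_J)
dangling = not_matching(old_A, old_B)      # A-tuples never in the join
new_A = kept + [t for t in dangling if t not in kept]
print(new_A)                               # [{'S': 's2'}]
```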

      If ( A MATCHING {d} ) isn't a key of the old J value (I'm talking only about a key, or superkey if you like, of a particular relation value, not a table/r-table/relvar key), so that tuples described by old A MATCHING {d} continue in new J, then either ( B MATCHING {d} ) is such a key or it isn't.  If it is, then new A = old A and new B is calculated with B replacing A. If it isn't, then new J = old J and new J might not be the user's expected value (eg., some cross-products).
       
      If there are such keys "on both sides", that's the only time I'd expect "delete from both sides".

      Any time one side has such a key and the other doesn't then it's possible more J-tuples will be deleted than some user expected, but the result really should be expected if the user knows the J predicate.

      When I substitute your { < S s, N n > } deletion example into this expression, I get new S = new A = ( old A MATCHING {} ), in other words new S is empty, so there must be no nameless suppliers.  Since the keys are "on both sides", I'd expect new SN to also be empty.

      I gather you think your example could just as well result in three different arbitrary delete combinations.  One is new S = old S = { < S s > }, even though that one tuple is a key of old S.  I remember reading in Codd's book about a union insert that he thought wasn't possible, and maybe he was right about ordinary union inserts (which I don't think are important), but I've never seen a reason given for this kind of join deletion choice, only that it's "logically possible".  Well of course it's possible if the deletion is specified as DELETE d from S, but that wouldn't be a join delete unless S happens to be a join too, in which case the above expression would apply to S as well, with S replacing J.  I'm pretty sure the above expression applies just as well to a variation on DELETE d from S, as in "DELETE d from ( S{S} JOIN S{} )".

      No doubt this isn't the only way to understand join deletion, eg., deletion of projections might be more general or maybe using GROUP to explain it would be better.

    15. Sorry, I mangled that last comment; blame it on partial memory and maybe on having to chop and re-arrange a bunch of text to fit in the 4k message limit.

      In particular I was wrong to say about the example that a tuple would be deleted from the S table/r-table/relvar, at least from that single attribute version of S.  I must have been thinking of a different S.  In fact, it can't be deleted from through this join nor most joins involving it ( exceptions might be cross-products and joins involving empty headings ).
       
      Usually, deletion of d from R has a definition based on the expression " R MINUS d ", which defeats the purpose of unambiguous deletion from joins.  A slightly different definition could be used instead, for example involving " R NOT MATCHING d ".  With such a definition, d's heading need not be R's heading and it is possible to delete unambiguously, non-arbitrarily, from exactly one operand of a join provided that the heading of d is a subset of the heading of exactly one of the join operands. 
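      A minimal sketch of that NOT MATCHING reading of delete (the antijoin helper and sample values are mine), with d's heading a proper subset of R's:

```python
def not_matching(r, s):
    """Antijoin: r-tuples agreeing with no s-tuple on their common attributes."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [t for t in r
            if not any(all(t[a] == u[a] for a in common) for u in s)]

SN = [{"S": "s1", "N": "n1"}, {"S": "s2", "N": "n2"}]
d  = [{"N": "n1"}]                # d's heading {N} is a proper subset of SN's

new_SN = not_matching(SN, d)      # DELETE d FROM SN, per the definition above
print(new_SN)                     # [{'S': 's2', 'N': 'n2'}]
```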

      In addition to that change in definition, it's also necessary to recognize that the negation of a tuple doesn't imply the negation of proper projections of that tuple.  In the S JOIN SN example, this means that the tuple { s, n } in NOT new J doesn't imply that the tuple { s } is in NOT new S { s }.  So when d and SN include the tuple { s, n }, no change to S is implied.  The tuple { s } is in the relative complement NOT new J { S } but not in the absolute complement of new S.  In this case there is only one other operand, SN, so when { s, n } is not in new J, the tuple { s, n } must necessarily be in the absolute complement of the only other operand, NOT new SN.     

      Since only one join operand variable can be deleted from in this way, the ability to delete from several join operands in a single operation depends on a further re-definition of delete.  For example, deletion from J = S JOIN SN JOIN T, re-written as ( S JOIN SN ) JOIN ( SN JOIN T ), could use a definition based on the expression " ( ( S JOIN SN ) NOT MATCHING d ) JOIN ( ( SN JOIN T ) NOT MATCHING d2 ) ", among others, to define the result of the calculation of new SN and new T (the result, rather than the calculation per se).  Here d's heading needs to be a subset only of SN's and not of S's or T's, and likewise d2's heading needs to be a subset only of T's heading.  I think that variations on the re-write would all produce the same effect, given this requirement for the d and d2 headings.

      An implementation could apply a conventional delete statement to give the effect which amounts to asserting NOT d AND NOT d2, as opposed to NOT d OR NOT d2.  If S, SN and T have no common attribute, the effect might surprise but no more so than Codd's "connection trap", like when the common attribute(s) aren't all in the keys of the original operands.  Schema design matters and personally it doesn't surprise me that deleting from a join that has operands with the same heading is ambiguous.  I'd say it's the schema itself that's ambiguous.

    16. > bob April 4, 2014 at 8:53 AM
      > it can't be deleted from through this join nor most joins involving it

      As I said: not possible in the general case; possible depending on constraints. (Including the cases you just mentioned and including foreign keys like that disambiguating SP earlier.)

      > " R NOT MATCHING d ". With such a definition, d's heading need not be R's heading and it is possible to delete unambiguously, non-arbitrarily, from exactly one operand of a join provided that the heading of d is a subset of the heading of exactly one of the join operands.

      It is not clear what your operator is. You seem to be _defining_ a delete to only delete from join operands that have d's heading, which contradicts that the user only knows about R and not its operands. For the following I'll assume you meant that the new value of R is such that it looks like it used to, less tuples matching d.

      > So when d and SN include the tuple { s, n }, no change to S is implied.

      Given that < s,n > is not in the new J there is no implication of change to S and there is no implication of non-change to S.

      > The tuple { s } is in the relative complement NOT new J { S } but not in the absolute complement of new S.

      No. Given that < s,n > is not in SN, < s > could be in or not in S; the new S JOIN SN would in either case be the new J.

      The same example from earlier contradicts your new claims. It shows that the new S is not determinable in general from just the value of S JOIN SN. Ie for a given new J, < s > can be in S and it can be not in S. It doesn't matter what else you claim; if it contradicts this you made a mistake.

      You are not even clear about your command let alone your (unfounded and unsound where not unintelligible) justification. Given a new value for J (equal to S JOIN SN) and whatever other operands what exactly does your command give for new values of S and SN solely in terms of the operands and the old values of S and SN? Please make it absolutely clear; you should be able to give an expression for each using only the operands. Please don't reply until you work through that command for when new J(s,n) is false and new S(s) is true and for when new J(s,n) is false and new S(s) is false. Which is impossible unless your command actually includes other info about S and SN than just the new value of J.

    17. 1) Regarding "As I said: not possible in the general case; possible depending on constraints.":

      This example purposely doesn't have any conventional constraints ( unless you consider the join itself to be a constraint of a kind ). 

      2) Regarding "You seem to be _defining_ a delete to only delete from join operands that have d's heading, which contradicts that the user only knows about R and not its operands.":

      I'd say that's a spurious contradiction.  The user ( one who is not insulated from such operations by an application ) will have some interpretation for R, whatever he's been told, and possibly that's not even a join in this case.  There is no assumption that the user knows the join operands, here J could even be base.  What matters is the d-value the user supplies, no matter how the user decided or knew to use it, and the view definition recorded in the db. 

      Secondly, the deletion I'm talking about doesn't delete from multiple "operands", it deletes from one join operand, or equivalently, inserts to exactly one of the complements NOT S or NOT SN.  Depending on the d-value, deletion may not always result.  The dbms is able to determine which operand to delete from and calculate its value even when the user is not aware of operands.  When J is base, join or not, it's still the case that only one operand is deleted from.

      3) Regarding " Given that < s,n > is not in the new J there is no implication of change to S and there is no implication of non-change to S.":

      Non-change of S is implied.  Simply put, there is no tuple < s, n > in S, so no such tuple can be deleted from S.  To put it another way, I think it's a generally expected requirement that a query's value such as S JOIN { < s, n2 > } should be unchanged ( when that's logically possible) after a deletion that doesn't mention < s, n2 >.  ( Perhaps you will consider the requirement to be another constraint of a kind, but that wouldn't change my point. )

      Distributing d over the join operands with an unrestricted algebra, then projecting, calculates the same result, eg., in A-algebra, new S = ( S ⟨AND⟩ ⟨NOT⟩ d ) { heading of S }.  This can nearly always be optimized by skipping the calculation when the S-heading is a proper subset of the d-heading.

      ( In my previous message I screwed up when I tried to express new S with NOT MATCHING, possibly I could have expressed it with a much longer expression including UNION, but I won't distract with that here.  It's been a few years since I thought about this stuff and obviously I got some details wrong earlier but I hope this latest won't mislead so much. )

      By contrast, the query value SN NOT MATCHING d  ( here d and SN have the same heading, SN's is not a proper subset of d's ) doesn't preserve the truth of  < s, n >.  Given that < s > remains in new S, < s, n > must not be in new SN.

      4) Regarding " No. Given that < s,n > is not in SN, < s > could be in or not in S; the new S JOIN SN would in either case be the new J.
      The same example from earlier contradicts your new claims. It shows that the new S is not determinable in general from just the value of S JOIN SN. Ie for a given new J, < s > can be in S and it can be not in S. It doesn't matter what else you claim; if it contradicts this you made a mistake.":

      The actual contradiction is the contradictory assumption that < s, n > could remain in new J when < s > remains in S ( per point 3 above ). 

      Logic doesn't tell us what to do, it only tells us what has been done ( or attempted ).  Obviously any desired operation  expressed ambiguously needs to be replaced with a non-ambiguous operation.

    18. You do not seem to understand that the example values I gave for old S, SN and J and new SN and J can happen when newly S(s) (ie < s > is in new S) and when newly NOT S(s) (ie < s> is not in new S) and that you are only addressing a command appropriate for one although either can arise.

      The command you describe [sic] can only be used when the user knows about S, knows that certain constraints on new S must be communicated to the dbms in addition to the new J in order to update the database correctly, and communicates them to the dbms.

      > people also believe that join deletes are logically ambiguous, therefore can't be implemented but this is only because they choose to ignore the constraints of a join

      > Obviously any desired operation expressed ambiguously needs to be replaced with a non-ambiguous operation.

      Your claim that one can update through join is contradicted by your own confirmation that update through join is ambiguous and by your proposing commands with sufficient additional constraints to disambiguate.

      >> Given a new value for J (equal to S JOIN SN) and whatever other operands what exactly does your command give for new values of S and SN solely in terms of the operands and the old values of S and SN? Please don't reply until you work through that command for when new J(s,n) is false and new S(s) is true and for when new J(s,n) is false and new S(s) is false.

      Your writing continues to consist almost entirely of unintelligible vagueness.

    19. I doubt I'll ever understand what is so difficult/unintelligible about a definition for DELETE d FROM R wherein d is a subset of a projection of R.  When R is a join and the projection has a heading unique to one operand, it's rows of the projection that are deleted, thereby rows of only that operand, and rows of the other operands are irrelevant. So DELETE d from J where d = { < s, n > } specifies the deletion of a row meaning " supplier s is named n ", rather than one meaning " s is a supplier AND supplier s is named n ", though the second meaning obviously follows.

      This approach works for non-joins too, except that projection uniqueness doesn't matter.  

      Beats me why so many smart people want to stick with a definition that has only the second, hopeless meaning. 

      If the above basic idea can't be grasped there's no point going into related questions, such as how to combine such deletes, let alone more editorial.

    20. 1.
      > I doubt I'll ever understand what is so difficult/unintelligible about a definition for DELETE d FROM R wherein d is a subset of a projection of R.

      The "difficulty" is not per se that your poor wording obscures your (now clear?) command description but that you have wrong and fuzzy beliefs and reasoning, the wrongness and fuzziness of which is hidden by your poor wording.

      Please work through the counterexample:

      The user sees that J holds the tuples for which J(S,N) "S is a supplier AND S is a supplier named N" ie "S is a supplier named N". The dbms sees that J is the view S JOIN SN. The dba sees what the user sees, what the dbms sees and that S holds the tuples for which S(S) "S is a supplier" and SN holds the tuples for which SN(S,N) "S is a supplier named N".

      Situation b: s1 is a supplier; there are no other suppliers; s1 is named n1; s1 has no other names: J is {< s1,n1 >}, S is {< s1 >}, SN is {< s1,n1 >}.
      Situation a1: s1 is a supplier; there are no other suppliers; s1 has no name: J is {}, S is {< s1 >}, SN is {}
      Situation a2: there are no suppliers: J is {}, S is {}, SN is {}.

      Give a command expression for each case below by which the user can properly set J, S and SN.
      1. moving from situation b to situation a1
      2. moving from situation b to situation a2
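      The crux of the counterexample can be verified mechanically (a sketch; the helper and variable names are mine): situations a1 and a2 produce the same J value, so no command phrased only in terms of J can distinguish them:

```python
def natural_join(r, s):
    """Natural join of relations modelled as lists of dicts."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

# Situation b: s1 is a supplier named n1
S_b,  SN_b  = [{"S": "s1"}], [{"S": "s1", "N": "n1"}]
# Situation a1: s1 is a supplier with no name
S_a1, SN_a1 = [{"S": "s1"}], []
# Situation a2: there are no suppliers
S_a2, SN_a2 = [], []

J_b  = natural_join(S_b, SN_b)    # [{'S': 's1', 'N': 'n1'}]
J_a1 = natural_join(S_a1, SN_a1)  # []
J_a2 = natural_join(S_a2, SN_a2)  # []
```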

      2.
      Summary: Without sufficient constraints the value of a join alone does not determine the values of its operands. First, a deleted tuple could be missing from one, the other or (mutually exclusively) both, the user must additionally say which case it is. Hence they must know it's a join, what join, and what the operands' predicates are. Second, if a tuple is absent from one operand then any number matching it could be in the other so the user must additionally say which ones. Combined these mean that the user could have just given the operands separately. Ie the user disambiguates what the dbms can't.

      There certainly are cases where that disambiguation is by less than all the operands' values. But that's update through join PLUS SUFFICIENT CONSTRAINTS. (Including constraints embodied by update commands). That's the KIND of thing you wish the user could do with d and without knowing J = S JOIN SN, yes. But it turns out that particular claim is wrong.

    21. Regarding < quote > The user sees that J holds the tuples for which J(S,N) "S is a supplier AND S is a supplier named N" ie "S is a supplier named N".

      So the user knows that "S is a supplier named N". Let him retract that if he wishes, don't extrapolate the situation when you don't need to, at least if you want progress. The abstract rdbms needs only to act on what is known.

  2. Bob,

    You cannot expect people to know and appreciate logic and constraints if they are never exposed to them during their schooling (which is not the same as education).

    There is one thing I want to clarify: Most R-tables in 3NF are also in 5NF and, if so, they do not represent joins. Only 3NF R-tables that are NOT in 5NF represent bundles of entity classes. Indeed, what full normalization does is ensure that there is one entity class per R-table.

  3. Nevertheless, the decomposed 5NF base tables follow from the logical structure, not the other way around. Just because people like rules-of-thumb, such as 5NF, doesn't change that.

    Early on, people realized that they could enforce certain constraints with a decomposed table design and a possibly simpler dbms: more tables, but simpler ones for users to remember. But the join of the different tables AND their constraints remains a join. Partly because of the clever early work on normalization we now have the tradition where updating, which is all about obeying constraints, is understood almost always in terms of base tables (R-tables if you prefer), and the logical structure follows from the base table design. But the possible values of the base tables are determined by constraints, implemented in some way or other but rarely (never?) as a view. Logically, when a table is decomposed to several 5NF base tables and one of those is updated, it's a join that's updated. As Date and McGoveran say, ignoring that risks losing the advantage of logical independence, which may be why SQL has so many ad hoc gewgaws such as triggers.

  4. Ah, yes, sure, updates on tables that are subject to a referential constraint must obey it, which involves a join. It is the very join that must be added if one "composes" the tables, to control the ensuing redundancy.

    This is something that "denormalizers for performance" miss: whatever gains they get are due to not adding the constraint, or, in other words, trading integrity for performance.

    I just explained this in a live exchange:

    http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=5824669175070613504&gid=2357895&commentID=5825358016429449216&trk=view_disc&fromEmail=&ut=28u3_oMxnXg641

  5. I gave up after a few minutes. What a lot of smoke the originator and a number of the respondents created (not you) about something very simple. Logically speaking, all he wants is a disjoint union. Disjoint unions always being expressible in part by joins, that means all he wants is for MALES JOIN FEMALES to be empty. Instead of starting with that and figuring out how a particular product can deal with it, he starts with the physical problem. Makes me think the "modelling" was probably never logical; after all, he can't even remember how long he thinks he's been doing whatever kind of modelling he thinks he's been doing!

    I get very tired of people blaming their ignorance on their schooling. There's never been any lack of resources for people who are interested in the subject. If they spent more time understanding the resources they might be able to challenge the many ignorant teachers of the subject, many of whom encourage only dogma and rote in a relatively young field that is still formative and finding its way.

    1. It's important to reach those lurkers that can be misled by those ignorants; I don't do it to convince them.

      There is no point in commenting here. You should take your comment and post it there. You don't have to respond to replies, just speak your mind.

  6. As I posted before (April 16, 2013 at 10:08 AM)
    FORALL X,X1,...,Y1,... [S(X,X1,...) IMPLIES T(X,Y1,...)]
    should be
    FORALL X [EXISTS X1,... S(X,X1,...) IMPLIES EXISTS Y1,... T(X,Y1,...)]
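    The corrected formula can be spot-checked on sample relations (a minimal sketch; the relation and attribute names are illustrative, following the EMPLOYEE/PERSON example quoted in the post):

```python
# FORALL X [EXISTS X1,... S(X,X1,...) IMPLIES EXISTS Y1,... T(X,Y1,...)]
# checked for S = EMPLOYEE, T = PERSON, with X the shared attribute P:
EMPLOYEE = [{"P": "p1", "DEPT": "d1"}]
PERSON   = [{"P": "p1", "NAME": "ann"}, {"P": "p2", "NAME": "bo"}]

fk_holds = all(any(e["P"] == p["P"] for p in PERSON) for e in EMPLOYEE)
print(fk_holds)                   # True
```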
