Sunday, November 20, 2016

The Principle of Orthogonal Database Design Part III




Note: This is a 11/24/17 re-write of Part III of a three-part series that replaced several previous posts (the pages of which redirect here), to to bring it in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
 
(Continued from Part II)

POOD and SQL


As we have seen, if relations are uniquely constrained, with a true RDBMS supporting logical independence (LI) and constraint inheritance, database design can adhere to the POOD and enable DBMS-enforced consistency. A RDBMS can also support ESS explicitly.

Industry misconceptions notwithstanding, SQL DBMSs are, of course, not relational. They have weak declarative integrity support, which, coupled with bad database designs, makes adherence to the POOD (as well as the other design principles) difficult. While even its weak relational fidelity was sufficient to render SQL superior to what preceded it, this is but one example of the many advantages of the RDM that SQL has failed to concretize.


Sadly, due to poor foundation knowledge in the industry, SQL DBMSs are considered RDBMSs. The many limitations, deficiencies and complexities due to failure to implement the RDM are absurdly blamed on the RDM, leading the industry to regress to technologies inferior to even SQL that were effectively dropped decades ago, instead of developing true RDBMSs.


A Note on Identical Relations


Identical (i.e., identically structured and constrained) relations are a trivial POOD violation. My post on the subject triggered an online discussion  that, except for an exchange between my colleague Erwin Smout and another reader who gave him a run for the money (which is recommended reading),
demonstrates, yet again, the intellectual poverty in the industry. Here I would like to respond to Erwin's comments.

"The real meaning of a relation is the intended interpretation it was given when it was designed (and which is also the criterion by which the database user decides whether a tuple should be present or not, thus by which the database user decides which updates to carry through)."
Well, yes and no.
  • Yes, the meaning of a relation is indeed that assigned to it by its designer based on a conceptual model -- a set of informal business rules of several types;
  • No, the criterion by which the DBMS -- not users/applications -- decides tuple insertions into the database, based on uniqueness of RPs;
"The DBMS doesn't read your mind before interpreting relations, which is why the real interpretation happens outside the DBMS. It interprets relations according to its own data integrity enforcement rules. The only thing a DBMS does (and can ever be expected to do) is algebraic computation (e.g. to verify constraint satisfaction of a given update to a given existing db state), completely devoid of any form of "interpretation" in the sense of "human interpretation". And which is also why the DBMS needs "help" from its users when deciding whether TUPLE{} represents "The shop is closed" and should go in THE_SHOP_IS_CLOSED rather than "The alarm is set" and should go in THE_ALARM_IS_SET. Hell, it's why those things were called "computers". They compute. Nothing more."
Precisely. And the only way in which this can work is for all relations to be to be uniquely constrained. This is why, together with the fact that not all relations are named, names are only shorthands for the corresponding RPs, that must be substituted for the names whenever the DBMS encounters them while checking tuples for insertion in a relation.

I don't worry about Erwin. But,

"What happens when users are not sophisticated enough to "read the mind" of the designer and have either incompatible interpretations or -- just as common -- interpretations that vary by day, mood, focus, etc.? What if two applications have incompatible interpretations? There has to be an objective way to determine -- to the best of our ability -- that the interpretation (semantics that relates to the real world) upon which the database schema depends -- which is what determines logical design -- is the same interpretation used by all users who are authorized (perhaps via applications) to update the database and then enforce that semantic consistency." --David McGoveran
In this context there is irony in Erwin's remark "Maybe that's just semantics and I'm not very good at those. My DBMS is sufficiently expressive, thank you."


References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.




 

5 comments:

  1. Why do you agree with the quote that TUPLE{} "should go in THE_ALARM_IS_SET" is a user decision?

    According to the Information Principle aka IP, a projection without attributes can only be derived which means the DBMS can decide if it 'goes in' a relation.

    The point of following the IP is that ignoring it can be used to disprove any logical conclusion made by the dbms which means you might as well not bother with the dbms.

    ReplyDelete
  2. I did not agree with every detail in the comment, but with the general idea that the DBMS does not understand semantically like users, but rather follows statements algorithmically, which is why letting users do that defeats the purpose.

    The IP is there for multiple reasons, not just that.

    ReplyDelete
  3. Regardless of whether Codd watered-down the IP (I believe he didn't even mention it until some years after his first papers), this has been an excellent series. I'd hoped for a fourth part about employees having neither salaries nor commissions. That might help show how traditional normalization practice introduces ambiguous RP's when the motive is only to eliminate update anomalies.

    It also needs to be said that perhaps the most important multi-table updates, join updates, are possible even without POOD. (In his RM V2 book. Codd recognized that join deletes are generally unambigous. At the same time he correctly said to the effect that union insertions are generally ambiguous.)

    Why this is not generally recognized I can only guess is due to a Hans Christian Andersen kind of world where people see things that aren't there, eg., thinking that tuples addressed by a join expression aka predicate include operand tuples that are not sub-tuples of the join, in other words relations with predicates that aren't the predicates of the operand subsets that don't determine the join relation's value.

    (I've finally managed to locate the dbdebunk discussion dating from Sept 2004 that has the comments by David McGoveran which eventually cleared up my own confusion about join deletes http://web.archive.org/web/20041013150837/http://www.dbdebunk.com/page/page/1396086
    )

    ReplyDelete
    Replies
    1. Neither salaries nor commissions loses the definition of employee.

      The advantages of the three design principles are not limited to view updating.

      The link does not get me to the discussion

      Delete
  4. A better link:

    http://web.archive.org/web/20041013150837/http://www.dbdebunk.com/page/page/1396086.htm

    ReplyDelete

View My Stats