Sunday, November 20, 2016

The Principle of Orthogonal Database Design Part III

(Continued from Part II)


As we have seen, a true RDBMS that supports constraint inheritance can--if database design adheres to the POOD--support directly and transparently entity supertype-subtype (ESS) relationships by
  • Generating the appropriate views;
  • Generating and enforcing disjunctive constraints on independent base relations;
  • Properly propagating view updates to the underlying base relations.

SQL and ESS Support

Common misconceptions notwithstanding, for many reasons SQL DBMSs are not truly relational.
There is little integrity support beyond shorthands for key and referential (foreign key) constraints, let alone constraint inheritance. In fact, even though disjunctive constraints are syntactically similar to referential constraints, users cannot define them (at least not declaratively) because SQL DBMSs have not implemented the ANSI/ISO standard SQL ASSERTION [1,2]. Consequently, and because designs are not guaranteed to adhere to the POOD, multi-relation views  are not updatable. So while the POOD is desirable, even if database design adheres to it, ESS support by SQL DBMSs is not possible.


Note: Whether disjunctive constraints can be enforced by stored procedures is implementer-specific. Although better than application code, SPs tend to be complex, use proprietary syntax, vary considerably across products and, depending on how procedural they are, are difficult to optimize. All products have the same non-trivial issues to solve, summarized by an expert as follows (see chapter 11 in [3] for a more in-depth discussion):

"It is very difficult to
(a) come up with an efficient DBMS implementation--one that does not 'activate' validation of a trigger when, given a transaction, it is not necessary, and one that, in the case it is necessary, to validate only what is minimally necessary and
(b) cater to multiple concurrent transactions.
The vendors address these issues, particularly isolation levels, in fundamentally different ways. In practice and assuming a 'last insert wins' protocol, a disjunctive constraint requires to delete from one table whenever inserted into the other. This is relatively easy in SQL Server, Sybase, and DB2, much harder in Oracle." --Toon Koppelaars
As Toon points out, both correctness and efficiency are paramount. Without them, a solution will not be accepted in the real world, where large and high transaction rate databases are mainstream. This burden is exacerbated by SQL being a highly redundant language -- a query can be expressed in multiple ways -- which makes optimization much more difficult [4,5].
 

Declarative integrity constraints would relieve developers to focus on their logic (rather than on implementation details for which DBMS vendors are in a much better position). Here, for example, is how the disjunctive constraint is expressed in D4, the native data language of the true RDBMS Dataphor [6]:
CREATE CONSTRAINT no_living_dead
 NOT EXISTS (live_people {id} JOIN (dead_people {id}));
Of course, declarative constraints are only as good as how efficient CPU-wise and concurrency-wise the underlying implementation code that actually validates the constraint during transactions is, for which task vendors are better positioned than application developers. This is why the real solution are shorthands similar to those for key and referential constraints--they are more reliable and relieve developers from drudgery. They would also help vendors: they can develop shorthand-specific validation code, which is much easier to do than accept developer formulated ASSERTIONs, parse and deep-analyze them.

Conclusion

Given a true RDBMS that supports constraint inheritance, adherence to the POOD would ensure that all base relations are uniquely constrained, such that each tuple satisfies only one one RP, in which case the DBMS can update all theoretically updatable views necessary for transparent ESS support.

It's because SQL DBMSs are not relational that they have weak declarative integrity support and that, coupled with database designs that do not adhere to the POOD, make view updatability and ESS support impossible. While the poor relational fidelity of SQL was sufficient to render it superior to what preceded it, this is but one example of the many practical benefits of the RDM that failed to materialize. Sadly, data professionals are unfamiliar with relational fundamentals, confuse RDBMSs with SQL DBMSs, do not adhere to sound database design principles and, therefore, are unaware of and unable to appreciate the practical benefits they are missing and to demand them from vendors. Worse, the limitations, deficiencies and complexities due to failure to implement the RDM are blamed, absurdly on the RDM, leading the industry to regress to technologies inferior to even SQL that were effectively displaced by it decades ago.

A Note on Identical Relations

Identical relations are a trivial POOD violation. My post on the subject http://www.dbdebunk.com/2016/09/database-design-relation-predicates-and.html triggered a thread that demonstrated yet again the intellectual poverty in the field, except for an exchange between my colleague Erwin Smout and another reader who gave him a run for the money (recommended reading). Here I would like to respond to Erwin's comments.
"The real meaning of a relation is the intended interpretation it was given when it was designed (and which is also the criterion by which the database user decides whether a tuple should be present or not, thus by which the database user decides which updates to carry through)."
Well, yes and no.
  • Yes: the meaning of a relation is indeed that assigned to it by its designer based on a conceptual model -- a set of informal business rules of several types;
  • No: the criterion by which the DBMS -- not the user/application decides whether a tuple should be inserted in which relation is the relation predicate (RP) -- the conjunction of integrity constraints that formally represents the business rules in the database.
"The DBMS doesn't read your mind before interpreting relations, which is why the real interpretation happens outside the DBMS. It interprets relations according to its own data integrity enforcement rules. The only thing a DBMS does (and can ever be expected to do) is algebraic computation (e.g. to verify constraint satisfaction of a given update to a given existing db state), completely devoid of any form of "interpretation" in the sense of "human interpretation". And which is also why the DBMS needs "help" from its users when deciding whether TUPLE{} represents "The shop is closed" and should go in THE_SHOP_IS_CLOSED rather than "The alarm is set" and should go in THE_ALARM_IS_SET. Hell, it's why those things were called "computers". They compute. Nothing more."
Precisely. So the only way in which this can work is for the RPs to be unique: the relations should be uniquely constrained such that every tuple satisfies exactly one RP, which is another way to say POOD adherence. This is why, together with the fact that not all relations are named, names are only shorthand references to corresponding constraints, that must be substituted for them whenever the DBMS checks tuples for insertion in a relation.

I don't worry about Erwin. But

"What happens when users are not sophisticated enough to "read the mind" of the designer and have either incompatible interpretations or -- just as common -- interpretations that vary by day, mood, focus, etc.? What if two applications have incompatible interpretations?There has to be an objective way to determine -- to the best of our ability -- that the interpretation (semantics that relates to the real world) upon which the database schema depends -- which is what determines logical design -- is the same interpretation used by all users who are authorized (perhaps via applications) to update the database and then enforce that semantic consistency." --David McGoveran
In this context there is irony in Erwin's remark: "Maybe that's just semantics and I'm not very good at those. My DBMS is sufficiently expressive, thank you."


References

[1] Koppelaars, T., CREATE ASSERTION: Impossible Dream? 


[2] Smout, E., CREATE ASSERTION: Neither Impossible Nor a Dream

[3] Koppelaars, T., and de Haan, L., APPLIED MATHEMATICS FOR DATABASE PROFESSIONALS.

[4] Pascal, F., Language Redundancy and DBMS Performance


[5] Fernandez, I., The way you write your query matters.

[6] Dataphor D4.


Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

5 comments:

  1. Why do you agree with the quote that TUPLE{} "should go in THE_ALARM_IS_SET" is a user decision?

    According to the Information Principle aka IP, a projection without attributes can only be derived which means the DBMS can decide if it 'goes in' a relation.

    The point of following the IP is that ignoring it can be used to disprove any logical conclusion made by the dbms which means you might as well not bother with the dbms.

    ReplyDelete
  2. I did not agree with every detail in the comment, but with the general idea that the DBMS does not understand semantically like users, but rather follows statements algorithmically, which is why letting users do that defeats the purpose.

    The IP is there for multiple reasons, not just that.

    ReplyDelete
  3. Regardless of whether Codd watered-down the IP (I believe he didn't even mention it until some years after his first papers), this has been an excellent series. I'd hoped for a fourth part about employees having neither salaries nor commissions. That might help show how traditional normalization practice introduces ambiguous RP's when the motive is only to eliminate update anomalies.

    It also needs to be said that perhaps the most important multi-table updates, join updates, are possible even without POOD. (In his RM V2 book. Codd recognized that join deletes are generally unambigous. At the same time he correctly said to the effect that union insertions are generally ambiguous.)

    Why this is not generally recognized I can only guess is due to a Hans Christian Andersen kind of world where people see things that aren't there, eg., thinking that tuples addressed by a join expression aka predicate include operand tuples that are not sub-tuples of the join, in other words relations with predicates that aren't the predicates of the operand subsets that don't determine the join relation's value.

    (I've finally managed to locate the dbdebunk discussion dating from Sept 2004 that has the comments by David McGoveran which eventually cleared up my own confusion about join deletes http://web.archive.org/web/20041013150837/http://www.dbdebunk.com/page/page/1396086
    )

    ReplyDelete
    Replies
    1. Neither salaries nor commissions loses the definition of employee.

      The advantages of the three design principles are not limited to view updating.

      The link does not get me to the discussion

      Delete
  4. A better link:

    http://web.archive.org/web/20041013150837/http://www.dbdebunk.com/page/page/1396086.htm

    ReplyDelete