Sunday, July 30, 2017

Integrity Is Not Only Referential: DBMS vs Application Enforced Constraints

Note: This is 07/30/17 rewrite of a 11/11/12 post to bring it line with McGoveran formal exposition of Codd's real RDM [1] and its interpretation.


There is nothing wrong per se with the question in last week's picture, namely:
"Can I ask whether people make use of the functionality provided by the database to ensure adequate data quality. Secondly do people apply this retrospectively as seems quite reasonable to me when a data problem is identified and the source data cleaned up--to do so could prevent future errors. There appears to be a tension between this sort of implementation and at least a perception of flexibility as database changes would be required should additional allowable values be required."
except that it's about time such questions are no longer asked. Unfortunately, they are evidence of the persistent lack of foundation knowledge in the industry for more than five decades. Such knowledge would have obviated such questions.

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so please take advantage of his generosity. Thanks.

While one response was sensible, principle-wise:
"Why use an RDBMS if you are not going to use its strengths? DB level constraints come out of the box. Why code application logic that involves shifting data out of the db into a middle tier in order to decide whether it needs to accept or reject a given piece of data because of an RI rule, only to have to shift data in the db at the end of it? DB constraints are highly tuned in the database, a fundamental part of the architecture, tried and tested, robust and not likely to fail. Also, your data store has the rules built into it so if you change your application or allow access from more applications then the same RI rules always apply regardless of the route taken to access the data. Coded RI in applications cannot be easily demonstrated through models or diagrams. ER models nicely show the presence of relationships (keys) in their context. You don't need to reinvent wheels that are squarer, more wobbly and less useful that the wheels that are already available."
it underestimated the damaging limitations caused by existing products and practices. Knowledge of relational constraints is poor and there are no genuine RDBMSs and relationally complete data languages that support them. These two deficiencies reinforce each other and inhibit progress.

Integrity Is Not Only Referential

Many years ago I wrote an article with this title that carried a dual meaning. The first alluded to vendors claiming relational features--in that case Borland's Paradox referred to a form--based (i.e., application-based) feature as referential integrity. Vendors get away with it because of the second, literal meaning: data professionals may be familiar with uniqueness (PK) and referential (FK) constraints only because they are the only two types supported with out of the box shorthands by SQL DBMSs, but are unaware of the several other types of relational constraints[2] and their lack of support in SQL. Some are  supported via stored procedures (rather than declaratively), or not at all, with loss of relational advantages. Even if they were supported, they require proficiency in formalization of conceptual business rules as logical constraints and validation[3] -- that most practitioners don't possess. Here's but one consequence:
"A while back we were looking at doing some restructuring of our design we looked at the possibility of making the database cross platform ... The real stumbling block was the translation of constraints that are expressed in triggers. It is here that the different dialects differ very widely ... it might be possible to settle on some kind of "standard" syntax for SQL, but constraints (other than those supported directly by SQL) are a difficulty." --William Sisson
The problem, of course, is hardly just syntactic. The data sub-language should be neither less expressively powerful than the relational algebra (RA), nor more powerful than first order predicate logic (FOPL):
  • If less, it won't express all constraints;
  • If more, the higher logic will defeat declarativity, decidability and physical independence (PI)--the core advantages of the RDM.
i.e., it should be relationally, rather than computationally complete. No SQL -- let alone NoSQL -- DBMS comes with such a language and none is a true RDBMS. Which answers the question:
"I was surprised to discover that the ASSERT statement has been in the SQL standard since SQL-92. The question is why, more than 20 years after the standard do we still have no implementation of ASSERTIONs in any SQL product (or at least none that I know of)?" --Erwin Smout
As Toon Koppelaars points out, correct and efficient support of constraints is hard even for true RDBMSs, but is a lost cause if the DBMS is not relational, and he should know. That is why
"It doesn't even seem as if anybody is working on the issue "the problem areas [about ASSERTION], haven't been fully researched yet by the scientific community: for one, I'm unable to find this research." --Toon Koppelaars,Triggers Considered Harmful, Considered Harmful
The authors of data languages -- not necessarily data sub-languages -- have long flouted relational and sound language design principles, which is why any post-hoc attempts to improve relational fidelity either require drastic--and disruptive--surgery, or prove unfeasible(just ask IBM about its failed effort to provide a SQL interface to IMS -- if you find anybody who remembers, that is -- the industry does not learn from experience and keeps reinventing square wheels). It took a long time for vendors to add post-hoc even the few basic constraints they do support and their implementations are often complex, inefficient, or limited.

Constraint Inheritance 

A genuine RDBMS is capable of inferring the constraints on any derived relation -- view, snapshot, or query result -- from (1) the constraints on the base relations from which they all ultimately derive -- possibly via other derived relations -- that are recorded in the system catalog and (2) the RA operations deriving them. In other words, RDBMS support of constraint inheritance[4] that guarantees full database consistency, something which no non-relational -- including SQL -- DBMS does. Non-SQL products gave up on integrity enforcement and consistency altogether -- it's just unfeasible.

Under these circumstances, application-enforced integrity is a lost cause and a fool's errand.


[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F, To Really Understand Integrity, Don't Start with SQL

[3] Pascal, F., Appendix A, Business Modeling for Database Design.

[4] Pascal, F., The Principle of Orthogonal Database Design Parts I, II, III

No comments:

Post a Comment

View My Stats