Sunday, November 11, 2012

DBMS- vs. Application-Enforced Integrity

This is 07/30/17 rewrite of a 01/01/13 post to bring it line with McGoveran formal exposition of Codd's real RDM[1] and its interpretation.

There is nothing wrong per se with the question in last week's picture, namely:

"Can I ask whether people make use of the functionality provided by the database to ensure adequate data quality. Secondly do people apply this retrospectively as seems quite reasonable to me when a data problem is identified and the source data cleaned up--to do so could prevent future errors. There appears to be a tension between this sort of implementation and at least a perception of flexibility as database changes would be required should additional allowable values be required." --LinkedIn.com
except that it's about time such questions are no longer asked. Unfortunately, they are evidence of the persistent lack of foundation knowledge in the industry for more than five decades. Such knowledge would have obviated such questions.
While one response was sensible, principle-wise:
"Why use an RDBMS if you are not going to use its strengths? DB level constraints come out of the box. Why code application logic that involves shifting data out of the db into a middle tier in order to decide whether it needs to accept or reject a given piece of data because of an RI rule, only to have to shift data in the db at the end of it? DB constraints are highly tuned in the database, a fundamental part of the architecture, tried and tested, robust and not likely to fail. Also, your data store has the rules built into it so if you change your application or allow access from more applications then the same RI rules always apply regardless of the route taken to access the data. Coded RI in applications cannot be easily demonstrated through models or diagrams. ER models nicely show the presence of relationships (keys) in their context. You don't need to reinvent wheels that are squarer, more wobbly and less useful that the wheels that are already available."
it underestimated the damaging limitations caused by existing products and practices. Knowledge of relational constraints is poor and there are no genuine RDBMSs and relationally complete data languages that support them. These two deficiencies reinforce each other.


Integrity Is Not Only Referential


Many years ago I wrote an article with this title that carried a dual meaning. The first alludes to vendors claiming relational features--in that case, application- rather than DBMS-enforced FK constraints. They get away with it because of the second, literal meaning: few data professionals are aware of the several categories of relational constraints[2] and, therefore, of their lack of support in SQL and the consequences thereof. Out of those, only two can be considered "out of the box" in SQL: shorthands for uniqueness (PK) and referential (FK) constraints--with which there is some familiarity. The rest are either unsupported declaratively, or supported via stored procedures outside the RDM, or not at all, with loss of relational advantages. Even if they were supported, they require proficiency in formalization of conceptual business rules as logical constraints and validation[3]--that most practitioners don't possess. Here's but one consequence:
"A while back we were looking at doing some restructuring of our design we looked at the possibility of making the database cross platform ... The real stumbling block was the translation of constraints that are expressed in triggers. It is here that the different dialects differ very widely ... it might be possible to settle on some kind of "standard" syntax for SQL, but constraints (other than those supported directly by SQL) are a difficulty." --William Sisson
The problem, of course, is hardly just syntactic. The data language should be neither less expressively powerful than the relational algebra (RA), nor more powerful than first order predicate logic (FOPL):
  • If less, it won't express all constraints;
  • If more, the higher logic will defeat declarativity, decidability and physical independence (PI)--the core advantages of the RDM.
i.e., it should be relationally, rather than computationally complete. No SQL--let alone NoSQL--DBMS comes with such a language and none is a true RDBMS. Which answers the question:
"I was surprised to discover that the ASSERT statement has been in the SQL standard since SQL-92. The question is why, more than 20 years after the standard do we still have no implementation of ASSERTIONs in any SQL product (or at least none that I know of)?" --Erwin Smout
As Toon Koppelaars points out, correct and efficient support of constraints is hard even for true RDBMSs, but is a lost cause if the DBMS is not relational, and he should know.  That is why
"It doesn't even seem as if anybody is working on the issue "the problem areas [about ASSERTION], haven't been fully researched yet by the scientific community: for one, I'm unable to find this research." --Toon Koppelaars,Triggers Considered Harmful, Considered Harmful
The authors of data languages have long flouted relational and sound language design principles, which is why any post-hoc attempts to improve relational fidelity either require drastic--and disruptive--surgery, or prove unfeasible. It took a long time for vendors to add post-hoc even the few basic constraints they do support and their implementation is often complex, inefficient, or limited.


Constraint Inheritance 


A genuine RDBMS is capable of inferring the constraints on any derived relation--view, snapshot, or query result--from (1) the constraints on the base relations from which they all ultimately derive--possibly via other derived relations--that are recorded in the system catalog and (2) the RA operations deriving them. In other words, RDBMS support of constraint inheritance[4] that guarantees full database consistency, something which no non-relational--including SQL--DBMS does. Non-SQL products gave up on integrity enforcement and consistency altogether--it's just unfeasible (just ask IBM about its attempt to provide a SQL interface to IMS--if you find anybody who remembers, that is--the industry does not learn from experience and keeps reinventing square wheels).

Under these circumstances, application-enforced integrity is a lost cause and a fool's errand.

References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[2] Pascal, F, To Really Understand Integrity, Don't Start with SQL

[3] Pascal, F., Appendix XX, Business Modeling for Database Design.

[4] Pascal, F., The Principle of Orthogonal Database Design Parts I, II, III



Do you like this post? Please link back to this article by copying one of the codes below.

URL: HTML link code: BB (forum) link code:

No comments:

Post a Comment