One of the online exchanges I participated in is Use of constraints on Db to improve data quality - good idea or better in the applications? initiated by the following question:
Can I ask whether people make use of the functionality provided by the database to ensure adequate data quality. Secondly do people apply this retrospectively as seems quite reasonable to me when a data problem is identified and the source data cleaned up - to do so could prevent future errors. There appears to be a tension between this sort of implementation and at least a perception of flexibility as database changes would be required should additional allowable values be required.
This is, of course, a question that foundation knowledge would obviate the need for and the same goes for some of the answers. I wrote frequently about this subject (most recently: Database vs Application Enforced Integrity).
Here is one sensible answers:
Why use an RDBMS if you are not going to use its strengths? DB level constraints come out of the box. Why code application logic that involves shifting data out of the db into a middle tier in order to decide whether it needs to accept or reject a given piece of data because of an RI rule, only to have to shift data I the db at the end of it? DB constraints are highly tuned in the database, a fundamental part of the architecture, tried and tested, robust and not likely to fail. Also, your data store has the rules built into it so if you change your application or allow access from more applications then the same RI rules always apply regardless of the route taken to access the data.
Coded RI in applications cannot be easily demonstrated through models or diagrams. ER models nicely show the presence of relationships (keys) in their context.
You don't need to reinvent wheels that are squarer, more wobbly and less useful that the wheels that are already available.But there are two prerequisites for this:
- The DBMS's data language supports fully and correctly integrity constraints of arbitrary complexity;
- Database professionals are sufficiently proficient in logic and the data language to formulate them and verify their correctness.
RI and keys, PK and FK, are general constraints and are the easier part. The real burden are what we refer to as "arbitrary constraints", (for lack of a better term), which are business- and database-specific. Their support by SQL [products] is weak, usually proprietary and quite procedural [via triggers, see next post], which encourages application constraints.My guess is that not many database professionals are aware of the classification of integrity constraints into general--common to all databases--and, for lack of a better term, "arbitrary". Domain, R-table (key, column) and database (referential or foreign key) constraints fall into the general category and, except for domains, SQL DBMSs usually provide shorthands for them. But most of the integrity burden is due to constraints of arbitrary complexity that represent business rules specific to each business and which are, therefore, database-specific. e.g. "Department X can hire only N employees of skills Y and Z and pay salaries not greater than that of its manager".
In an Appendix to my Business Modeling for Database Design paper I show that the formulation and verification of even the simpler general constraints would not be trivial were it not for SQL implementation of shorthands. Constraints of arbitrary complexity are much more demanding and I would trust neither data language designers nor users lacking proficiency in logic with their correct formulation (agian, see next post). Consider, in this context, an email I recently received from William Sisson:
A while back when we were looking at doing some restructuring of our design we looked at the possibility of making the database cross platform.
The data description language for creating the tables and most of the views proved to be quite straightforward, likewise the primary key, foreign key and check constraints. The main difficulty was of course with the stored procedures and though we didn't go ahead with the cross platform plan it prompted us to remove a lot of code from stored procedures and put the logic in views instead. Advances in recent versions of SQL have made many things that might previously have required SPs possible in SQL expressions - but there were also some things in SPs because we hadn't thought them through clearly enough.
The real stumbling block was the translation of constraints that are expressed in triggers. It is here that the different dialects differ very widely. So from the point of view of the book, I think it might be possible to settle on some kind of "standard" syntax for SQL, but constraints (other than those supported directly by SQL) are a difficulty.
The real "cross platform" layer is of course RM not SQL (I think this ties in with Erwin Smout's comment about people trying to teach converting SQL to relational algebra, rather than the other way round).
I have been reading Richard Snodgrass's book "Developing Time-Oriented Database Applications in SQL"+. The book is based on the SQL-92 standard and I was surprised to discover that the assert statement has been in SQL since then. Snodgrass uses the assert statement to express temporal constraints which I am sure would be very clumsy to express by any other means.
The question is why, more than 20 years after the standard, do we still have no implementation of assertions in any SQL product (or at least none that I know of)? It doesn't even seem as if anybody is working on the issue, Toon Koppelaars comments on his Triggers Considered Harmful, Considered Harmful blog* "As far as I know the problem areas described above [about assert], haven't been fully researched yet by the scientific community: for one I'm unable to find this research,"
An implementation of the assert statement is one of the very few things that might make me consider changing to a different SQL-DBMS.The answer is, of course, that the importance of fundamentals is not appreciated by neither vendors nor data professionals, nor management. If the design of a data language flouts them, it becomes much more difficult to correct or extend it later. So much so that vendors prefer to leave the prohibitive implementation burden to database and application developers and they tolerate it.
More on this in the next (four part) post.
Do you like this post? Please link back to this article by copying one of the codes below.
URL: HTML link code: BB (forum) link code:
To be fair to at least one senior DBMS designer (ex-Oracle), when I pressed him about adding support for SQL ASSERT to a new product he was then designing, he said the bean-counters would never fund it. He claimed that the (limited) support they'd built into Oracle had achieved no significant take-up by customers and that no one in the industry would spend scarce development funds on it ever again. Nor, I gathered, on anything in a similar vein either.
ReplyDeleteI place the blame almost entirely on the customers' programmers. There are far too many of them and they know far too little. Which is why there is far too many of them. It is a vicious circle.
There is plenty of blame to go around.
ReplyDeleteThat's the reason I argue that the core problem is systemic and cultural, not this or that company, or this or that individual.
Programmers are the product of the culture and education system and the system neither educates, demands of, or rewards them for foundation knowledge and actually punishes them for it.
Be that as it may, it is also the case that SQL's initial design makes it very difficult to fix many of its deficiencies and flaws. Otherwise it would not have been that costly to do it, the bean counters would have funded it and in time professionals would have learned to take advantage of it.
Just to clarify, the cross platform issue is not my prime concern.
ReplyDeleteThe main point is that however you decide to implement constraints, be it in the application, in stored procedures or in triggers, you are still going to have to solve many of the problems that a DBMS designer would have to solve to implement assertions if you want to get the constraints right.
Furthermore, because you are having to solve these problems on a case by case basis, rather than generically, there is a far greater chance of making mistakes, especially as your attention is diverted away from the task of solving the logical problem towards the technical details of the solution.
Of course -- check the 4-part next post (in which you are quoted) that argues this very thing.
DeleteImposing the implementation details on developers is not a cost-effective approach for user orgs, but convenient for vendors.
I would point out, however, that the general constraints can be easier to shorthand and the real burden are the arbitrary ones.
It is disheartening as a developer to continually be bombarded by ignorance masquerading as wisdom. If I had a nickel for every time I've heard a developer (or designer, or manager) say, "you have to denormalize for performance" then I would never write another line of code (except for pleasure.) I've been actively coding for almost 30 years and the overwhelming motivation of the whole industry the whole time has been how to avoid the hard work of actually designing something. Because of that successful vendors know they face an unending gravy train. They can keep promising magical solutions that never pan out and the failures that occur can be mined for more magical solutions that never pan out either. Griftology appears to be the only respected science and this is true in Government and Industry.
ReplyDeleteYou cannot expect developers to be wise if they never undergo proper education and provided with no incentives to acquire it.
ReplyDeleteHere's evidence of the acuteness of the problem: when I teach or write I emphasize that I focus on KNOWLEDGE and UNDERSTANDING, as distinct from tool training and experience. Editors usually reject that with "forget understanding, stress success in delivering results and career advancement".
At a conference in Auckland, Jesper Johanssen (then a security evangalist for Microsoft) said (paraphrased) that any integrity constraint implemented in the client is pointless because you can't force the bad guys to use your client.
ReplyDeleteAfter his session I took him to task over this because I disagreed. After a bit of discussion, we agreed that we were both right. :-)
Why implement integrity on the client? Computational complexity. Reducing network load. Timeliness of error message return to the person using the application.
Why implement integrity on the server? Security (which was the focus of Jesper's presentation, thus his comment). Leveraging the features of the backend product. Coping with multiple client applications.
In practice, for every integrity rule the conceptual model identities, there is a decision: on server only, on client only, on both server and client, or (of course) ignore.
I'm afraid I am not convinced.
DeleteIf the practical implications of redundancy and consistency risks are considered, constraints belong in the server.
If it is so common to trade off integrity for performance via denormalization, what makes you think that it will be enforced not only on the server, but on the client too, and consistently? Not to mention what happens when constraints need to change?
I suspect that many of these decisions would go away in the advent of in-memory databases, TRDBMS products and implementations a la transrelational. But I am not naive to believe this will happen.