Sunday, January 27, 2013

Database Design, Relational Fidelity and SQL Part III: Constraints, Triggers and View Updating

In Part II I discussed two possible database designs for a business model consisting of supertype and subtype entity classes: one proposed by WB, consisting of two base tables, two views and a referential (foreign key) constraint; and an alternative, consisting of two different base tables, a disjunctive (mutually exclusive) constraint and one view. Even though, in my subjective perception, the second design seems more faithful to the business model than the first, in my book PRACTICAL ISSUES IN DATABASE MANAGEMENT I recommended the first. Why?

As you probably know, referential constraints are supported by practically all SQL products with a shorthand. You don't have to bother formulating even the relatively simple SQL standard ASSERTION I specified in Part II. All you have to do is add a REFERENCES clause to the CREATE TABLE statement and the DBMS takes care of the rest.
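As a sketch of the shorthand (table and column names follow Part II's first design, but the exact DDL is my assumption, not WB's):

```sql
-- First design: two base tables; the disjunction between living and dead
-- people is captured by the REFERENCES shorthand alone.
CREATE TABLE people (
  person_id INTEGER     NOT NULL PRIMARY KEY,
  name      VARCHAR(50) NOT NULL,
  birth     DATE        NOT NULL
);

CREATE TABLE deaths (
  person_id INTEGER NOT NULL PRIMARY KEY
            REFERENCES people (person_id),  -- the shorthand: no ASSERTION, no triggers
  death     DATE    NOT NULL
);
```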

In contrast, even though in standard SQL the disjunctive ASSERTION is quite similar to the referential one, SQL products enforce the former via triggered procedures. These are better than application-enforced constraints, but unlike shorthands they leave the burden of implementation details to developers--details which tend to be complex, use proprietary syntax, vary considerably across products and, depending on how procedural they are, are difficult to optimize.
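For reference, the disjunctive constraint of the second design would look something like the following in standard SQL form (table names assumed from Part II; almost no SQL product actually supports CREATE ASSERTION):

```sql
-- Standard SQL ASSERTION form of the disjunctive constraint:
-- no person may appear in both live_people and dead_people.
CREATE ASSERTION no_living_dead
CHECK (NOT EXISTS (SELECT person_id
                   FROM live_people
                   WHERE person_id IN (SELECT person_id
                                       FROM dead_people)));
```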

In Chapter 11 of their book, APPLIED MATHEMATICS FOR DATABASE PROFESSIONALS (recommended and available via this site), Toon Koppelaars and Lex de Haan detail what it takes to implement constraints via triggers. All products have exactly the same non-trivial issues to solve, which Toon summarizes as follows:
It is very difficult to (a) come up with an efficient DBMS implementation--one that does not 'activate' validation of a trigger when, given a transaction, it is not necessary, and one that, when validation is necessary, validates only what is minimally necessary--and (b) cater to multiple concurrent transactions. The vendors address these issues, particularly isolation levels, in fundamentally different ways.
In practice, and assuming a 'last insert wins' protocol, the disjunctive constraint requires deleting a row from one table whenever it is inserted into the other. This is relatively easy in SQL Server, Sybase and DB2, and much harder in Oracle.

I asked Will Sisson, Toon Koppelaars, Craig Mullins and Nathan Allan, specialists in SQL Server, Oracle, DB2 and Dataphor, respectively, to comment on the trigger implementation of the disjunctive constraint and they kindly offered their thoughts and even some sample code.
  • SQL Server (2 triggers): 
You need to have a lot of knowledge about how a particular DBMS works to implement them correctly. If I wanted to be really sure about my trigger implementation of the constraint from a concurrency point of view, I would have to work out every possible sequence of histories that could occur and whether any of them could violate the integrity of the database. I agree absolutely that DBMS vendors are better positioned to have the DBMS relieve developers of this error-prone burden.
The use of the INSERTED and DELETED system-generated temporary tables allows the constraint to be restricted only to the set of rows being inserted, updated or deleted - after that it is just another SQL expression and is evaluated by the optimizer just like any other expression.
  • Oracle (4 very long and complex compound triggers): 
Since Oracle does not provide an "inserted" table that holds just the rows to be inserted (or the new versions of rows in case they are updated), we need row-level triggers and a temporary table that we must maintain ourselves. We also need a separate procedure to cater to the acquisition of predicate locks. Given Oracle's default 'snapshot isolation', one has to introduce predicate locks and acquire them at the appropriate time inside the transaction, to ensure that the queries executed as part of constraint validation can safely 'conclude' the constraint holds all the way until the end of the transaction (when it commits).
  • DB2 (4 triggers): Judging from the sample code, it falls somewhere between SQL Server and Oracle, but closer to the former.
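To make the mechanism concrete, here is a deliberately simplified 'last insert wins' sketch for the second design in SQL Server (T-SQL) syntax. It is an illustration of the INSERTED pseudo-table mechanism only--not the contributors' actual sample code--and it ignores the concurrency subtleties just discussed:

```sql
-- If a person is inserted as living, remove any conflicting death row,
-- and vice versa. INSERTED restricts the work to the affected rows only.
CREATE TRIGGER live_people_wins ON live_people
AFTER INSERT
AS
  DELETE FROM dead_people
  WHERE person_id IN (SELECT person_id FROM INSERTED);
GO
CREATE TRIGGER dead_people_wins ON dead_people
AFTER INSERT
AS
  DELETE FROM live_people
  WHERE person_id IN (SELECT person_id FROM INSERTED);
```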
Note that, as Toon points out, both correctness and efficiency are paramount. Without them, a solution will not be accepted in the real world, where large, high-transaction-rate databases are mainstream. This burden is exacerbated by SQL being a highly redundant language--the same query can be expressed in multiple ways--which makes optimization much more difficult (as I demonstrated in a 1980s article, Language Redundancy and DBMS Performance; see The way you write your query matters; that's one reason Oracle does not optimize triggers).

Declarative ASSERTIONs would allow developers to express constraint logic declaratively and leave the implementation details to the DBMS. Here, for example, is how the disjunctive constraint is expressed in Dataphor, a truly relational DBMS whose native data language is not SQL:
CREATE CONSTRAINT no_living_dead
 NOT EXISTS (dead_people {person_id} JOIN (live_people {person_id}));
Of course, as Toon reminds us, a declarative assertion is only as good as the CPU and concurrency efficiency of the underlying implementation code that actually validates the constraint during transactions.

This is why the real solution is shorthands, which relieve developers of the implementation burden. They would also help vendors, who could develop shorthand-specific validation code--much easier than accepting developer-formulated ASSERTIONs that must be parsed and deeply analyzed.

We have seen that both design options use views: the first to represent the two subtype classes, the second to represent the supertype class. Here are the SQL view definitions of the first design:
CREATE VIEW live_people
AS SELECT person_id, name, birth
   FROM people
   WHERE NOT EXISTS
               (SELECT *
                FROM deaths
                WHERE people.person_id = deaths.person_id);
CREATE VIEW dead_people
AS SELECT people.person_id, people.name, people.birth, deaths.death
   FROM people, deaths
   WHERE people.person_id = deaths.person_id;
and here is the one for the second design:
CREATE VIEW people
AS SELECT person_id, name
   FROM live_people
   UNION
   SELECT person_id, name
   FROM dead_people;
In fact, to benefit from logical data independence, applications should access a database only through views. A TRDBMS that correctly supports the relational model (i.e., "understands" predicates) would be able to update these views, but due to backward compatibility with various violations of the model, SQL DBMSs cannot guarantee propagation of view updates to the underlying base tables. The views in either design are therefore not updatable, defeating one of the core practical advantages of the model.
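To see the problem concretely, consider an insert through the supertype view of the second design (the view name and values here are my hypothetical example):

```sql
-- Through the union view, should this row land in live_people or in
-- dead_people? Without knowledge of the predicates, a SQL DBMS cannot
-- decide, and most products simply reject inserts into union views.
INSERT INTO people (person_id, name)
VALUES (42, 'Ambrose Bierce');
```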

Note: There is some genuine controversy regarding view updating (see my exchange with Erwin Smout).

Recall that this series was triggered by WB's inquiry about logical designs that avoid NULLs. The first three posts covered the implications of avoiding 'inapplicable' NULLs and assumed no missing data.

In Part IV David McGoveran will discuss the correct relational solution to missing data--outlined in my The Final NULL in the Coffin paper--and another burden on developers due to its lack of DBMS support.

