ON BEING ILLOGICAL ABOUT LOGIC
with Fabian Pascal and Chris Date
Type Your Title Here

 

 

 

The following message, posted in my Ask the Experts section was also posted in the DBA Water Cooler forum, both at TechTarget SearchDatabase. Another reader responded to it in the forum and I responded to it briefly there (both replies are replicated below, for convenience). Both Chris Date and I are responding more extensively here.

 

 

KA: “As any innovation, the relational theory follows what is called 80/20 rule. The 80% of its potential have been realized in not always theoretically precise, but practical and commercially successful RDBMS implementations.

 

The remaining 20% is notoriously difficult to achieve, e.g. as SQL is good enough for most customers, it is hard to expect RDBMS vendors to replace SQL with a fundamentally more sound query language.  We all may agree that ODBMS, XML, or LazyDBMS to name a few, belong to niche implementations that will never become as popular as RDBMS.  My question is what should be the strategy for future database systems?  And, honestly, the answer I want to hear is that we should go back to the drawing board and return with an alternative to the predicate-based model.  To rephrase, Mr. Codd has done a great and timely contribution, but my fear is that we lock ourselves in the box and just cannot think out of it.

 

May I give an example of what I believe is currently missing in the theory. Mr. Date says, "database design is constraint definition".  I like to agree with it, but ironically many database systems are designed to use a minimum of constraints.  E.g. in Oracle ERP applications, the business rules validation is done not in RDBMS, but in Forms.  Arguably, the only two successful constraint types to date are uniqueness (including primary key) and foreign key constraints.  RDBMS is strong in its localized coupling of tables.  However, all attempts to sophisticate the constraint definitions have not achieved any significant acceptance.

 

My example involves a typical foreign key constraint defined between col1 of tab1 that depends on col2 of tab2.  By definition, any attempt to insert a value into col1, which does not exist in col2, will fail.  But that is poor, because I wanted to insert this value into tab2 as well but a bit later and in a different sequence of actions.  Another issue is that I have established my foreign key constraint based on certain model of the system; as the system keeps accumulating records from the real world, it may become obvious that my foreign key constraint is not any more valid.  Generally, my premise is in fact that a decision on success or failure of any given transaction should not be fixed once and for all by the initial design, but rather evolve based on inflow of records and on feedback from decisions made by the system in the past.

 

To conclude, I do very much respect the effort you bring in educating folks on virtues of the relational theory.  It helps to narrow the mentioned 20% gap.  But, the long-term future is not RDBMS as much as it is not the predicate math.  Would a search of something fundamentally different encourage you??

 

Reader Reply: I'm not Fabian but I'd like to answer.

 

·   The 80/20 rule that you mention is probably more like 98/2.

·   Database systems do not fully implement constraint definitions, or indeed other aspects of the relational model, the blame should be on them, not on the model.

·   The so-called problems you describe in foreign key insertion ("I wanted to insert this value into tab2 as well but a bit later and in a different sequence of actions," and "it may become obvious that my foreign key constraint is not any more valid") suggest to me not a shortcoming of the relational model, but rather that more thorough analysis and modeling may be required for your data.

·   Uniqueness and foreign keys aren't the only constraints …

·   I'm probably completely biased, but I have yet to see a real world data set that could not be satisfactorily described by the relational model.

 

KA: R, thanks for the response. I share your passion for relational theory, as it is the butter and bread of my professional life. That was not in my intent to ridicule it in any way. On the contrary, I think it does well for what it has promised. The problem is that in the real world there are many data related issues, which do not make sense within the relational theory domain. A typical issue would be a data model evolution. You said it right: "I have yet to see a real world data set that could not be satisfactorily described by the relational model". But how do you change a data model when the business requirements change? My foreign key example was to show that potentially any part of data model could be affected by changing requirements. Relational theory does not support schema evolution. Today I have to do it manually. Tomorrow I would like the next generation DBMS to verify schema validity.

 

 

Fabian Pascal Responds (short): Most of this is not a question, but an argument. Some of the problems with it were addressed in the reply by another reader. Whatever problems remain are quite fundamental and beyond the scope of this Q&A forum.

 

My response to the only part that can be considered a question – the last paragraph – is as follows: Predicate logic (not math!) has been around for thousands of years and replacing it with something better is quite a tall order. I have not seen anything (including your examples) to indicate that there are any fundamental problems with it (the 80/20 is not a scientifically defensible claim). But please be my guest and search for a replacement by all means, and let us know when you have come up with something. Until then, we must use what we have that works soundly, rather than arbitrarily violate the rules of logic.

 

Extended Response: I would like to see concrete empirical evidence in support of the following:

 

·   “80% of relational theory potential have been realized” (How is this possible when there are no RDBMS (as distinct from SQL DBMS), let alone “successful implementations”

·   “The remaining 20% is notoriously difficult to achieve”

·   “SQL is good enough for most customers”

·   “There are many data related issues which do not make sense within the relational theory domain”

 

With respect to the example “A typical issue would be a data model evolution”, this is yet another instance of confusing the data model with the enterprise-specific logical models developed using the relational data model (see Models, Models Everywhere, Nor Any Time to Think, Something to Call One’s Own).

 

It is not the data model – here, relational--that needs to evolve; KA means the logical models – the schemas, if you will--that may need to be evolved. And, in fact, one of the relational objectives was precisely to make schemas more flexible and, thus, easier to modify in response to changes.

 

What is more, it is because database design is constraint specification, that proper and complete support of integrity constraints would enable a true RDBMS to validate schemas (see On Automation of Normalization Validation). And it is precisely because SQL and its implementations fail to provide adequate integrity support that SQL DBMSs cannot do the same.

 

Chris Date Comments: The SQL situation is rather bizarre:

 

·   It cannot do transition constraints, except procedurally, via triggers, but it can do most others. Trouble is it can do them in two different ways!

·   It does not make a clear distinction between R-table constraints and database constraints (see the above mentioned chapter) and it has a logical error in its support of the former.

·   It does not understand the all important notion of R-table predicate, even though it does have all the necessary ingredients to do so, hence its barbaric failure with respect to view updating)

 

If one agrees that database design is constraint specification and if the vendors do not provide full support of constraints, shouldn’t one conclude that they essentially fail to provide true DBMSs (good question for proponents of XML DBMSs)? Why should we discard a sound logic with us since Aristotle just because vendors insist on ignoring it??? Suppose bridges were built in violation of the laws of physics: would there be a demand to discard physics theories and seek better theories, rather than build physically sound bridges?

 

Logic is logic and databases are databases regardless of what “drives the world”, the Web, or anything else. If we were to discard logic each time a new technology becomes available, we would still be in the Middle Ages and have to contend with the Inquisition.

 

Actually it is the asker who is locked in a box – that of industry practice – and it is his kind of logic that allows vendors to get away with unsound products.

 

Chris Date Responds: I find myself almost totally out of sympathy with KA's comments. 

 

Ø       "80% of [the] potential [of relational theory has] been realized ... The remaining 20% is notoriously difficult to achieve." 

 

Even if we accept that 80:20 split (I don't), what evidence is there for the claim regarding the 20%?  My own suspicion is that it would have been easier to do all 100% right than to do the (putative) 80% as grotesquely wrong as it has been done.

 

Ø       "[As] SQL is good enough for most customers, it is hard to expect RDBMS vendors to replace SQL with a fundamentally more sound query language."

 

I agree it's hard to expect vendors--make that SQL vendors, I don't think there are any RDBMS vendors as such--to replace SQL, but not for the reason stated.  It's hard because of the huge investment those vendors have in the status quo.

 

Ø       "[We] should go back to the drawing board and return with an alternative to the predicate-based model." 

 

I couldn't disagree more.  See Hugh Darwen's article What a Database Really Is: Predicates and Propositions, in C. J. Date, Hugh Darwen, and David McGoveran, RELATIONAL DATABASE WRITINGS 1994-1997, for an explanation of why I believe the relational model is right, and rock solid, and will endure--even when (perhaps especially when) SQL has long faded from human memory.

 

Ø       "May I give an example of what I believe is currently missing in the theory."

 

Of course--but he doesn't!

 

Ø       "[All] attempts to sophisticate the constraint definitions have not achieved any significant acceptance." 

 

So what do we conclude?  Even if it's true?  Myself, I would conclude that there's an education problem, and possibly an implementation problem, but not that there's any problem with the theory as such. 

 

By the way, there clearly is an education problem.  Allow me to quote from my own three-part article on this very subject, Constraints and Predicates: A Brief Tutorial[Ed.

Note: see also my two editorials on this site]:

 

·   “A quick and admittedly not very scientific survey of a whole shelf load of database textbooks--37 in all, to be exact, including essentially all of the best-known ones--reveals the following:

·   Only one book had an entire chapter devoted to the topic of integrity (and even there I had severe reservations about the quality of the treatment).

 

Note:  At first glance it looked as if there were three others that had a whole chapter on the subject too, but closer examination revealed that one of those books was using the term to refer to normalization issues solely, while the other two were using it to refer, not to integrity in its usual sense at all, but rather to locking and concurrency control issues.  Caveat lector!

 

·   Most of the books examined didn't even mention integrity in a chapter title at all, and those few that did tended to bundle it with other topics in what seems to me a very haphazard fashion ("Integrity, Views, Security, and Catalogs" is a typical example). 

 

I couldn't find a good explanation or definition of the concept, let alone the kind of emphasis I think the concept deserves, in any of the books at all.

 

Ø       "My example involves a typical foreign key constraint defined between col1 of tab1 [and] col2 of tab2.  By definition, any attempt to insert a value into col1 that does not exist in col2 will fail.  But that is poor, because I wanted to insert this value in tab2 as well but a bit later and in a different sequence of actions."

 

Well, OK:  He just doesn't have a foreign key constraint!  Criticizing a cat because it isn't a dog is unproductive, to say the least.  (As an aside, I demonstrated in the article already quoted that, logically, constraint checking should never be deferred.)

 

Ø       "[It] may become obvious that my foreign constraint is not any more valid."

 

Well, OK:  Some constraints are time-dependent, just as some data is time-dependent.  (This is not to say that dealing with time dependencies is easy.)

 

Ø       "Would a search [for] something fundamentally different [from relational theory] encourage you?" 

 

No.  Relational theory provides a necessary and sufficient framework in which to investigate all kinds of interesting database problems (e.g., the problem alluded to above of how to deal adequately with the time dimension).  Furthermore, it's founded on predicate logic, elements of which go back some 2500 years (to Aristotle).  And much of mathematics is based on it, too.  While I'm not saying predicate logic will never be replaced by something better, I certainly am saying it's going to be a pretty tall order to come up with that "something better."

 

 

Posted 10/25/02

 

 

 

[ABOUT] [QUOTES] [LINKS]