The following message, posted in my Ask the Experts
section was also posted in the DBA Water Cooler forum, both at
TechTarget SearchDatabase. Another reader responded to it in the forum and I
responded to it briefly there (both replies are replicated below, for
convenience). Both Chris Date and I are responding more extensively here.
KA:
“As any innovation, the relational theory follows what is called 80/20
rule. The 80% of its potential have been realized in not always theoretically
precise, but practical and commercially successful RDBMS implementations.
The remaining 20% is notoriously difficult to achieve, e.g. as
SQL is good enough for most customers, it is hard to expect RDBMS vendors to
replace SQL with a fundamentally more sound query language. We all may agree that ODBMS, XML, or
LazyDBMS to name a few, belong to niche implementations that will never become
as popular as RDBMS. My question is
what should be the strategy for future database systems? And, honestly, the answer I want to hear
is
that we should go back to the drawing board and return with an alternative to
the predicate-based model. To rephrase,
Mr. Codd has done a great and timely contribution, but my fear is that we lock
ourselves in the box and just cannot think out of it.
May I give an example of what I believe is currently missing in
the theory. Mr. Date says, "database design is constraint
definition". I like to agree with
it, but ironically many database systems are designed to use a minimum of
constraints. E.g. in Oracle ERP
applications, the business rules validation is done not in RDBMS, but in
Forms. Arguably, the only two successful
constraint types to date are uniqueness (including primary key) and foreign key
constraints. RDBMS is strong in its
localized coupling of tables. However,
all attempts to sophisticate the constraint definitions have not achieved any
significant acceptance.
My example involves a typical foreign key constraint defined
between col1 of tab1 that depends on col2 of tab2. By definition, any attempt to insert a value
into col1, which
does not exist in col2, will fail. But
that is poor, because I wanted to insert this value into tab2 as well but a bit
later and in a different sequence of actions.
Another issue is that I have established my foreign key constraint based
on certain model of the system; as the system keeps accumulating records from
the real world, it may become obvious that my foreign key constraint is not any
more valid. Generally, my premise is in
fact that a decision on success or failure of any given transaction should not
be fixed once and for all by the initial design, but rather evolve based on
inflow of records and on feedback from decisions made by the system in the
past.
To conclude, I do very much respect the effort you bring in
educating folks on virtues of the relational theory. It helps to narrow the mentioned 20% gap.
But, the long-term future is not RDBMS as
much as it is not the predicate math. Would
a search of something fundamentally different encourage you??
Reader Reply: I'm not Fabian but I'd like to answer.
·
The 80/20 rule that you mention is probably more like
98/2.
·
Database systems do not fully implement constraint
definitions, or indeed other aspects of the relational model, the blame should
be on them, not on the model.
·
The so-called problems you describe in foreign key
insertion ("I wanted to insert this value into tab2 as well but a bit
later and in a different sequence of actions," and "it may become
obvious that my foreign key constraint is not any more valid") suggest to
me not a shortcoming of the relational model, but rather that more thorough
analysis and modeling may be required for your data.
·
Uniqueness and foreign keys aren't the only constraints
…
·
I'm probably completely biased, but I have yet to see a
real world data set that could not be satisfactorily described by the
relational model.
KA: R, thanks for the response. I share your passion
for relational theory, as it is the butter and bread of my professional life.
That was not in my intent to ridicule it in any way. On the contrary, I think
it does well for what it has promised. The problem is that in the real world
there are many data related issues, which do not make sense within the
relational theory domain. A typical issue would be a data model evolution. You
said it right: "I have yet to see a real world data set that could not be
satisfactorily described by the relational model". But how do you change a
data model when the business requirements change? My foreign key example was to
show that potentially any part of data model could be affected by changing
requirements. Relational theory does not support schema evolution. Today I have
to do it manually. Tomorrow I would like the next generation DBMS to verify
schema validity.
Fabian Pascal Responds (short): Most of this is not a
question, but an argument. Some of the problems with it were addressed in the
reply by another reader. Whatever problems remain are quite fundamental and
beyond the scope of this Q&A forum.
My response to the only part that can be considered a
question – the last paragraph – is as follows: Predicate logic (not
math!) has been around for thousands of years and replacing it with something
better is quite a tall order. I have not seen anything (including your
examples) to indicate that there are any fundamental problems with it (the
80/20 is not a scientifically defensible claim). But please be my guest and
search for a replacement by all means, and let us know when you have come up
with something. Until then, we must use what we have that works soundly,
rather than arbitrarily violate the rules of logic.
Extended Response: I would like to see concrete
empirical evidence in support of the following:
·
“80% of relational theory potential have been realized”
(How is this possible when there are no RDBMS (as distinct from
SQL DBMS), let alone “successful implementations”
·
“The remaining 20% is notoriously difficult to achieve”
·
“SQL is good enough for most customers”
·
“There are many data related issues which do not make
sense within the relational theory domain”
With respect to the example “A typical issue would be a data
model evolution”, this is yet another instance of confusing the data model
with the enterprise-specific logical models developed using the
relational data model (see Models, Models
Everywhere, Nor Any Time to Think,
Something to
Call One’s Own).
It is not the data model – here, relational--that needs to
evolve; KA means the logical models – the schemas, if you will--that may
need to be evolved. And, in fact, one of the relational objectives was precisely
to make schemas more flexible and, thus, easier to modify in response to
changes.
What is more, it is because database design is
constraint specification, that proper and complete support of integrity
constraints would enable a true RDBMS to validate schemas (see On Automation of
Normalization Validation). And it is precisely because SQL and its
implementations fail to provide adequate integrity support that SQL DBMSs
cannot do the same.
Chris Date Comments: The SQL situation is rather
bizarre:
·
It cannot do transition constraints, except
procedurally, via triggers, but it can do most others. Trouble is it can do
them in two different ways!
·
It does not make a clear distinction between R-table
constraints and database constraints (see the above mentioned chapter) and it
has a logical error in its support of the former.
·
It does not understand the all important notion of
R-table predicate, even though it does have all the necessary ingredients to do
so, hence its barbaric failure with respect to view updating)
If one agrees that database design is
constraint specification and if the vendors do not provide full support of
constraints, shouldn’t one conclude that they essentially fail to provide true
DBMSs (good question for proponents of XML DBMSs)? Why should we discard a
sound logic with us since Aristotle just because vendors insist on ignoring
it??? Suppose bridges were built in violation of the laws of physics: would
there be a demand to discard physics theories and seek better theories, rather
than build physically sound bridges?
Logic is logic and databases are
databases regardless of what “drives the world”, the Web, or anything else. If
we were to discard logic each time a new technology becomes available, we would
still be in the Middle Ages and have to contend with the Inquisition.
Actually it is the asker who is
locked in a box – that of industry practice – and it is his kind of logic that
allows vendors to get away with unsound products.
Chris Date Responds: I find
myself almost totally out of sympathy with KA's comments.
Ø "80%
of [the] potential [of relational theory has] been realized ... The remaining
20% is notoriously difficult to achieve."
Even if we accept that
80:20 split (I don't), what evidence is there for the claim regarding the
20%? My own suspicion is that it would
have been easier to do all 100% right than to do the (putative) 80% as
grotesquely wrong as it has been done.
Ø "[As]
SQL is good enough for most customers, it is hard to expect RDBMS vendors to
replace SQL with a fundamentally more sound query language."
I agree it's hard to
expect vendors--make that SQL vendors, I don't think there are any RDBMS
vendors as such--to replace SQL, but not for the reason stated. It's hard because of the huge
investment
those vendors have in the status quo.
Ø "[We]
should go back to the drawing board and return with an alternative to the
predicate-based model."
I couldn't disagree
more. See Hugh Darwen's article What a
Database Really Is: Predicates and Propositions, in C. J. Date, Hugh Darwen,
and David McGoveran, RELATIONAL
DATABASE WRITINGS 1994-1997, for an explanation of why I believe the
relational model is right, and rock solid, and will endure--even when (perhaps
especially when) SQL has long faded from human memory.
Ø "May
I give an example of what I believe is currently missing in the theory."
Of course--but he
doesn't!
Ø "[All]
attempts to sophisticate the constraint definitions have not achieved any
significant acceptance."
So what do we
conclude? Even if it's true? Myself, I would conclude that there's an
education problem, and possibly an implementation problem, but not that there's
any problem with the theory as such.
By the way, there clearly is an
education problem. Allow me to quote
from my own three-part article on this very subject, Constraints and
Predicates: A Brief Tutorial[Ed.
Note:
see also my two editorials on this site]:
·
“A quick and admittedly not very scientific survey of a
whole shelf load of database textbooks--37 in all, to be exact, including
essentially all of the best-known ones--reveals the following:
·
Only one book had an entire chapter devoted to
the topic of integrity (and even there I had severe reservations about the
quality of the treatment).
Note: At first glance it looked as if there were
three others that had
a whole chapter on the subject too, but closer examination revealed that one of
those books was using the term to refer to normalization issues solely, while
the other two were using it to refer, not to integrity in its usual sense at
all, but rather to locking and concurrency control issues. Caveat lector!
·
Most of the books examined didn't even mention
integrity in a chapter title at all, and those few that did tended to bundle it
with other topics in what seems to me a very haphazard fashion
("Integrity, Views, Security, and Catalogs" is a typical
example).
I couldn't find a good explanation
or definition of the concept, let alone the kind of emphasis I think the
concept deserves, in any of the books at all.
Ø "My
example involves a typical foreign key constraint defined between col1 of tab1
[and] col2 of tab2. By definition, any
attempt to insert a value into col1 that does not exist in col2 will fail. But that is poor,
because I wanted to insert
this value in tab2 as well but a bit later and in a different sequence of actions."
Well, OK: He just doesn't have a foreign key
constraint! Criticizing a cat because
it isn't a dog is unproductive, to say the least. (As an aside, I demonstrated in the article
already quoted that,
logically, constraint checking should never be deferred.)
Ø "[It]
may become obvious that my foreign constraint is not any more valid."
Well, OK: Some constraints are time-dependent, just as
some data is time-dependent. (This is
not to say that dealing with time dependencies is easy.)
Ø "Would
a search [for] something fundamentally different [from relational theory]
encourage you?"
No. Relational theory provides a necessary and
sufficient framework
in which to investigate all kinds of interesting database problems (e.g., the
problem alluded to above of how to deal adequately with the time
dimension). Furthermore, it's founded
on predicate logic, elements of which go back some 2500 years (to
Aristotle). And much of mathematics is
based on it, too. While I'm not saying
predicate logic will never be replaced by something better, I certainly am
saying it's going to be a pretty tall order to come up with that
"something better."
Posted
10/25/02
[ABOUT]
[QUOTES]
[LINKS]