Tuesday, December 25, 2012

The Clouding Syndrome (REVISED)

Over the years I got enormous and nasty flak for my arguments about the sorry state of foundation knowledge in the industry, the contamination of academia by it and the increasing deterioration in both.  On one occasion, my claim that the designers of SQL did not really understand the relational model was dismissed as utter nonsense. But if that made me a crank, so were Ted Codd and Chris Date.

My friend Jim Lowden emailed me:
Last month [there] was a hopeless article All Your Database Are Belong to Us about DBMSs [in CACM] so badly mistaken, that it was too dreary even to reply. [The author] managed to get the Closed World Assumption pretty much backwards, if that's possible.

Ah, yes, I remembered Erik Meijer's piece, brought to my attention earlier by another friend, Erik Kaun. I had posted that very botching of the CWA as the Quote of the Week on 8/15/12.

Who is Meijer? From Wikipedia (emphasis mine):
Erik Meijer (born 18 April 1963, Curaçao) is a Dutch computer scientist who is currently a software architect for Microsoft SQL Server, Visual Studio and the .NET Framework. At Microsoft he heads the Cloud Programmability Team. Before that, he was an associate professor at Utrecht University. He received his Ph.D from Nijmegen University in 1992 ... In 2009, he was the recipient of the Microsoft Outstanding Technical Leadership Award... In 2011 Erik Meijer was appointed part-time professor of Cloud Programming within the Software Engineering Research Group at Delft University of Technology.
So I was not the only one disturbed by the article, which also evoked a critical letter from David McGoveran and Chris Date:
Not the Database World We Know

Communications readers have a right to expect accuracy. Sadly, accuracy is not always what they get. The article All Your Database Are Belong to Us by Erik Meijer (Sept. 2012) contains so many inaccuracies, confusions and errors regarding "the database world", it is difficult to read coherently. The first paragraphs alone contain more egregious misstatements than most entire articles or papers. For the record:
  • "The raw physical data model" is categorically not "at the center of the [relational database] universe."
  • Queries do not "assume intimate details of the data representation (indexes, statistics, metadata)."
  • While database technology relies on "The Closed World Assumption," this assumption has nothing to do with what the author apparently meant.
  • Every phrase in "Exposing naked data and relying on declarative magic becomes a liability" relies on at least one counterfactual.
  • "Objects should hide their private data representation, exposing it only via well-defined behavioral interfaces." But this is exactly what the relational model does, except (unlike OO) it adopts an interface discipline that makes ad hoc querying and the like possible.
  • "In the realm of [data] modelers, there is no notion of data abstraction." Astoundingly wrong.
  • "[Database technology necessarily involves] a computational model with a limited set of operations." False. Although the (very powerful, well-defined, provably correct) required set of relational operations is small, the sky's the limit on derived relational operations or operations that define abstract data type/domain behavior.
  • The author's unfounded antipathy toward relational databases dominates even his application of CAP: "The problem with SQL databases ... is the assumption that the data ... meets a bunch of consistency constraints that is difficult to maintain in an open ['anything goes'?] distributed world." CAP does not eliminate this requirement; "...the hidden cost of forfeiting [system-enforced] consistency ... is the need [for the programmer] to know the system's invariants." Nor can programmers "... design their systems to be robust ... to inconsistency." Once data inconsistency invades a computationally complete system, it is not even, in general, detectable and all bets are off. Consistency must be enforced, hence constraints. The author seemed to equate detecting abnormal execution with enforcing logical data consistency. No wonder confusion abounds; CAP consistency is single-copy consistency, a subset of what ACID databases provide, yet the Gilbert/Lynch CAP proof relies on linearizability, a more stringent requirement than the serializability ACID databases need or use.
And so on...

Deconstructing the entire article properly would take more time than we care to devote, but the foregoing should suffice to demonstrate its fallaciousness. We hope the author is not teaching these confusions, errors, logical inconsistencies and fallacies.

It is difficult even to believe the article was peer reviewed. Indeed, it is truly distressing it did not demonstrate even minimal understanding of one of the most important contributions to computing: the relational model. We can only deplore Communications' role in promulgating such a lack of understanding.

C. J. Date, Healdsburg, CA
D. McGoveran, Boulder Creek, CA
Writes Jim: "If Date calls your article a turd, well, you don't want that!" But instead of addressing the poor understanding of database fundamentals raised by the letter, Meijer protests that he did not "criticize the relational model". No, he only pointed to the need for "leaving the ivory tower" and "dealing with a morass of ad hoc extensions [to RDBMSs] and the clean mathematical basis of first-order predicate logic"; the need for developers "to think in terms of (un)ordered multisets"; and "view as complementary computational models that fundamentally address loosely coupled distributed systems".

I will let the reader read his article and judge whether he criticized the relational model or not. What I will argue is that his response makes the criticism in the letter even more poignant. What Meijer calls RDBMSs are not really that, and what he refers to as "extensions" are mostly relational violations; he ignores the raison d'être of the relational model--to avoid the complications, with no offsetting benefit, of multiple data models that lack a complete and sound theoretical foundation.

Given the understanding of data fundamentals by DBMS architects, isn't it much easier to explain the "morass of ad hoc extensions to the clean mathematical basis of first-order predicate logic"?
