Tuesday, December 25, 2012

The Clouding Syndrome (REVISED)

Over the years I got enormous and nasty flak for my arguments about the sorry state of foundation knowledge in the industry, the contamination of academia by it and the increasing deterioration in both.  On one occasion, my claim that the designers of SQL did not really understand the relational model was dismissed as utter nonsense. But if that made me a crank, so were Ted Codd and Chris Date.

My friend Jim Lowden emailed me:
Last month [there] was a hopeless article All Your Database Are Belong to Us about DBMSs [in CACM] so badly mistaken, that it was too dreary even to reply. [The author] managed to get the Closed World Assumption pretty much backwards, if that's possible.


Ah, yes, I remembered Erik Meijer's piece, brought to my attention earlier by another friend, Erik Kaun. I had posted that very botching of the CWA as the Quote of the Week on 8/15/12.

Who is Meijer? From Wikipedia (emphasis mine):
Erik Meijer (born 18 April 1963, Curaçao) is a Dutch computer scientist who is currently a software architect for Microsoft SQL Server, Visual Studio and the .NET Framework. At Microsoft he heads the Cloud Programmability Team. Before that, he was an associate professor at Utrecht University. He received his Ph.D from Nijmegen University in 1992 ... In 2009, he was the recipient of the Microsoft Outstanding Technical Leadership Award... In 2011 Erik Meijer was appointed part-time professor of Cloud Programming within the Software Engineering Research Group at Delft University of Technology.
So I was not the only one disturbed by the article, which also evoked a critical letter from David McGoveran and Chris Date:
Not the Database World We Know

Communications readers have a right to expect accuracy. Sadly, accuracy is not always what they get. The article All Your Database Are Belong to Us by Erik Meijer (Sept. 2012) contains so many inaccuracies, confusions and errors regarding "the database world", it is difficult to read coherently. The first paragraphs alone contain more egregious misstatements than most entire articles or papers. For the record:
  • "The raw physical data model" is categorically not "at the center of the [relational database] universe."
  • Queries do not "assume intimate details of the data representation (indexes, statistics, metadata)."
  • While database technology relies on "The Closed World Assumption," this assumption has nothing to do with what the author apparently meant.
  • Every phrase in "Exposing naked data and relying on declarative magic becomes a liability" relies on at least one counterfactual.
  • "Objects should hide their private data representation, exposing it only via well-defined behavioral interfaces." But this is exactly what the relational model does, except (unlike OO) it adopts an interface discipline that makes ad hoc querying and the like possible.
  • "In the realm of [data] modelers, there is no notion of data abstraction." Astoundingly wrong.
  • "[Database technology necessarily involves] a computational model with a limited set of operations." False. Although the (very powerful, well-defined, provably correct) required set of relational operations is small, the sky's the limit on derived relational operations or operations that define abstract data type/domain behavior.
  • The author's unfounded antipathy toward relational databases dominates even his application of CAP: "The problem with SQL databases ... is the assumption that the data ... meets a bunch of consistency constraints that is difficult to maintain in an open ['anything goes'?] distributed world." CAP does not eliminate this requirement; "...the hidden cost of forfeiting [system-enforced] consistency ... is the need [for the programmer] to know the system's invariants." Nor can programmers "... design their systems to be robust ... to inconsistency." Once data inconsistency invades a computationally complete system, it is not even, in general, detectable and all bets are off. Consistency must be enforced, hence constraints. The author seemed to equate detecting abnormal execution with enforcing logical data consistency. No wonder confusion abounds; CAP consistency is single-copy consistency, a subset of what ACID databases provide, yet the Gilbert/Lynch CAP proof relies on linearizability, a more stringent requirement than the serializability ACID databases need or use.
And so on...

Deconstructing the entire article properly would take more time than we care to devote, but the foregoing should suffice to demonstrate its fallaciousness. We hope the author is not teaching these confusions, errors, logical inconsistencies and fallacies.

It is difficult even to believe the article was peer reviewed. Indeed, it is truly distressing it did not demonstrate even minimal understanding of one of the most important contributions to computing: the relational model. We can only deplore Communications' role in promulgating such a lack of understanding.

C. J. Date, Healdsburg, CA
D. McGoveran, Boulder Creek, CA
Writes Jim: "If Date calls your article a turd, well, you don't want that!" But instead of addressing the poor understanding of database fundamentals raised by the letter, Meijer protests that he did not "criticize the relational model". No, he only alerted readers to the need for "leaving the ivory tower" and "dealing with a morass of ad hoc extensions [to RDBMSs] and the clean mathematical basis of first-order predicate logic"; the need for developers "to think in terms of (un)ordered multisets"; and "view as complementary computational models that fundamentally address loosely coupled distributed systems".

I will let the reader read his article and judge whether he criticized the relational model or not. What I will argue is that his response makes the criticism in the letter even more pointed. What Meijer calls RDBMSs are not really that, and what he refers to as "extensions" are mostly relational violations; he ignores the raison d'être of the relational model: to avoid the complications, with no offsetting benefit, of multiple data models that lack a complete and sound theoretical foundation.

Given the understanding of data fundamentals by DBMS architects, isn't it much easier to explain the "morass of ad hoc extensions to the clean mathematical basis of first-order predicate logic"?


14 comments:

  1. I'm glad I wasn't alone in my utter disbelief at the inaccuracies, unfounded assumptions, and complete and utter bullsh*t in that article. It's one of the worst I've read, and it's from someone for whom I used to have some respect.

    It's really confusing to me, because earlier, Meijer has been somewhat complimentary to the relational model (though usually to its impoverished SQL "shadow"). His articles surrounding Microsoft's LINQ ("Language INtegrated Query") are decent, if I recall, and point in the right direction: to projecting data operations better into programming languages, rather than vice-versa.

    His papers on programming support for SQL and XML are, if I recall, useful (http://research.microsoft.com/en-us/um/people/emeijer/Papers/XS.pdf and http://research.microsoft.com/en-us/um/people/emeijer/Papers/XML2003/xml2003.html).

    I haven't read them in some time, so could be remembering incorrectly, but I thought they presented an approach to a practical problem: having to deal with XML and other data, as well as relational (well, SQL). They seemed to put the data first, and to make the programming language its servant.

    But this article is another matter entirely, and a silly one.

  2. His problem is not whether or not he likes the RM, but rather his poor understanding, or lack of understanding, of it. And he does not address that at all in his response.

    David McGoveran submitted another ACM article by him, which is no better than the previous one, and it's the next Laugh/Cry.

    It also turns out that his Wikipedia bio is seriously inflating his record and I wonder who wrote it.

  3. His home page (http://research.microsoft.com/en-us/um/people/emeijer/ErikMeijer.html) contains this section:

    "Press Coverage
    It seems that my work is so interesting that it get frequent press coverage.
    ..."

  4. We have a question in such circumstances in my native country: Did your praisers die?

  5. The truly telling thing is that his 'opinions' dovetail so nicely with the desires of his current employer to sell 'cloud services.' It seems that the cheapest commodity in our current market is 'integrity.' To paraphrase, “Whenever I hear the words 'cloud services', I reach for my calculator.” (To total up how much money will be lost and how many jobs will disappear.)

  6. Many years ago I wrote an article called "Integrity is not only referential", criticizing Borland for claiming that Paradox supported RI, when in reality the support was essentially at the application level. I have a forthcoming article with the same title, but this time it has a purely technical meaning.

    When journalists are criticized for writing what the editors want them to, to please the advertisers, they protest that nobody has ever told them what to write. Well, there is no need: any journalist who wants to be hired, stay employed and be promoted internalizes what they perceive their editors want and provides it.

    It's the same with academics who either become vendors, or work for vendors. I don't think MS needs to tell Meijer what to write.

  7. Meijer and his article are just history repeating itself. Jim Gray and "A call to arms" was just the same kind of thing.

    It's a pattern and all you need to do to recognise it, is open your eyes.

  8. Ah, but that's the core problem: ignorance about the history of the field.

    Scientists/academics who go into business bend. When they do I stop considering them scientists and treat them as vendors.

  9. On another "about ..." page (http://www.microsoft.com/about/technicalrecognition/erik-meijer.aspx), there is a description of what seems to be his preferred working method: "Meijer prefers to shape his ideas alongside others, "throwing pies at the wall and seeing which ones stick.""

    You know, they write this as if they really take pride in that ... Microsoft will be happy to have gotten themselves a real scientist.

  10. The current application of the term scientist has no validity whatsoever. I am not convinced that there is understanding today of what science means.

    That explains the pride they can take in him.

  11. I suspect the title of this paper tells us everything we need to know:

    http://www.scribd.com/doc/65880083/Confessions-of-a-Used-Programming-Language-Salesman-by-Erik-Meijer

  12. "When considering the past or the future, dear apprentice, be mindful of the present. If, while considering the past, you become caught in the past, lost in the past, or enslaved by the past, then you have forgotten yourself in the present. If, while considering the future, you become caught in the future, lost in the future, or enslaved by the future, then you have forgotten yourself in the present. Conversely, when considering the past, if you do not become caught, lost, or enslaved by the past, then you have remained mindful of the present. And if, when considering the future, you do not become caught, lost, or enslaved in the future, then you have remained mindful of the present."

    Deep, deep. After all, when you "sell to the masses" you ought to be "deep", otherwise they'll see through it.

  13. It's a shame the article contains so many misstatements early on about RM and DBMSs because they seem to have distracted from what I feel is his main message. Meijer’s articles are some of the most interesting I’ve read in a while from an RM perspective. If Meijer does not understand RM, or does and is critical of it, then that is most ironic, considering the technology he has helped to create (LINQ to Objects)... But I wonder if anyone here has actually looked at the technology under discussion (hint: it’s not a DBMS product, not the Cloud,...), rather than Meijer’s flawed premises and subsequent defense? I think most people commenting here stopped reading just before the section, "PROGRAMMERS: FIX THE BEHAVIOR, VARY THE REPRESENTATION", which is where for me the article begins to get interesting. I challenge people to reread the article, disregarding the first half and starting afresh from, “The problem with SQL databases...” Better still, first read Meijer’s article, ‘The World According to LINQ’. You may, like me, need to actually install C# and write some queries before fully understanding the technology (but you guys are clever so maybe not :) I think it can be used to show what a truly RM language could look like (far better than using SQL to do the same). What do you think about *this*?

  14. RM seems to be the quantum theory of data management: nobody REALLY understands it. But there is a qualitative difference: quantum theory is practically impossible to understand, while RM is easy. If the primary designer of SQL did not understand it, then what should we expect from others?

    Meijer is purportedly a scientist and professor. When I detect such a huge problem with his grasp of RM I stop reading, because I am not a language and programming expert and I cannot judge whether in those two areas he is any better. I assume that people who are knowledgeable do not make mistakes of that calibre in one area while being tops in another; they usually know what they don't know and don't comment on it.

    I just don't understand how somebody can be so wrong about RM and yet design a truly relational language. The probability that the latter sort of "happens" by chance is nil.

    But given that you went through the effort, why don't you write something about the second half from a purely relational perspective and I'll consider publishing it.