Saturday, October 26, 2019

Data Sublanguage Part 4: Conclusion

In Parts 1, 2, and 3 we showed that when the RDM is the data model:
  • A data sublanguage is shorthand for a data manipulation language (DML) that combines (1) a relationally complete retrieval component (i.e., one that expresses the RA) with (2) a component that expresses updates as relation transformations;
  • A DBMS language is a careful combination, for practical purposes, of the data sublanguage with several other sublanguages, each of which expresses a data management function (e.g., data definition, transactions, concurrency, authorizations). These are not relational, but they must be consistent with the RDM and must not include syntactic elements that are at odds with, or subvert, those of the DML.
Note: The RDM is the only data model consistent with Codd's definition that has been formalized [1].
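
To make the two DML components concrete, here is a minimal Python sketch, not a definitive implementation. It assumes a relation is modeled as a set of tuples and a tuple as a set of (attribute, value) pairs; all function names (relation, restrict, project, join, insert, delete) are hypothetical, and only a few RA operators are shown (union and difference come for free as set operations), so the sketch by itself is not relationally complete.

```python
# Hypothetical model: a tuple is a frozenset of (attribute, value) pairs,
# and a relation is a frozenset of such tuples (no duplicates, no ordering).

def relation(*rows):
    """Build a relation from dict-like rows."""
    return frozenset(frozenset(row.items()) for row in rows)

# (1) Retrieval component: a few relational algebra operators.
def restrict(r, pred):
    """Keep the tuples for which pred(tuple-as-dict) is true."""
    return frozenset(t for t in r if pred(dict(t)))

def project(r, attrs):
    """Keep only the named attributes; duplicates collapse automatically."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def join(r, s):
    """Natural join: combine tuples that agree on all common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

# (2) Update component: each update maps a relation to a new relation.
def insert(r, new_tuples):
    return r | new_tuples

def delete(r, pred):
    return r - restrict(r, pred)

emp = relation({"emp": "E1", "dept": "D1"}, {"emp": "E2", "dept": "D2"})
dept = relation({"dept": "D1", "city": "Oslo"}, {"dept": "D2", "city": "Rome"})

# Retrieval: employees of departments located in Oslo.
oslo_emps = project(
    restrict(join(emp, dept), lambda t: t["city"] == "Oslo"), {"emp"}
)

# Update as a relation transformation: emp itself is left unchanged.
emp2 = insert(emp, relation({"emp": "E3", "dept": "D1"}))
```

Every operator takes relations and returns a relation, and nothing in the sketch loops over rows on the user's behalf or mutates state in place; that is the sense in which both retrieval and update are expressed as operations on relations.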

We are now in a position to debunk the two quotes that triggered this series.





“The SQL operators were meant to implement the relational algebra as proposed by Dr. Ted Codd. Unfortunately Dr. Codd based some of his ideas on a "extended set theory", which was an idea formulated and described in a 1977 paper by D. L. Childs ... But Childs’ extensions were not ideally suited, which is explained in quite some detail in [a] book ... by Professor Gary Sherman & Robin Bloor [who] argue that mainstream Zermelo-Fraenkel set theory (Cantor), would have been a better starting point. One key issue is that sets should be able to be sets of sets.”
We have shown that to accommodate sets of sets (RVDs in the RDM), Codd started in 1969 with axiomatic set theory (AST) -- of which Zermelo-Fraenkel (ZF) is one version -- expressible in second order logic (SOL). ZF + Axiom of Choice (let's call it ZFC) is believed to suffice as a foundation for arithmetic, but the axioms that make this possible introduce consistency issues: because ZF (like most ASTs) requires sets of sets, it has no corresponding first order logic and, thus, includes certain paradoxes that render it undecidable, for which there is no "fix". In other words, ZF is not superior to SST/FOPL as a foundation for the RDM, which is why in 1970 Codd switched from the former to the latter, which requires proper sets that do not have sets as members (relations in normal form in the RDM).
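
The difference between a relation with a relation-valued domain and a relation in normal form can be illustrated with a small Python sketch. It is only an illustration under assumed names: the helpers tup and unnest are hypothetical, relations are modeled as frozensets of tuples, and the sketch says nothing about the logical issues discussed above; it merely shows what "no sets as members" looks like in the data.

```python
# Hypothetical model: a tuple is a frozenset of (attribute, value) pairs,
# and a relation is a frozenset of such tuples.

def tup(**attrs):
    """Build a tuple from keyword arguments."""
    return frozenset(attrs.items())

def unnest(r, attr):
    """Replace the relation-valued attribute `attr` with the attributes of
    its member tuples, producing one flat tuple per nested member.
    Note: a tuple whose nested relation is empty yields no output tuple."""
    out = set()
    for t in r:
        outer = dict(t)
        nested = outer.pop(attr)  # the nested relation (a set of tuples)
        for inner in nested:
            out.add(frozenset({**outer, **dict(inner)}.items()))
    return frozenset(out)

# A relation with a relation-valued attribute: its values are themselves sets.
nested = frozenset({
    tup(dept="D1", emps=frozenset({tup(emp="E1"), tup(emp="E2")})),
    tup(dept="D2", emps=frozenset({tup(emp="E3")})),
})

# The same information in normal form: every attribute value is a non-set element.
flat = unnest(nested, "emps")
```

In the flat relation every value is a simple element, so statements about it only need to quantify over tuples and their attribute values, never over sets nested inside values.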

This had nothing to do with Childs' "extended set theory" (XST), published in 1977[3] -- eight years after the RDM. Childs' research was still evolving while Codd's was well advanced. Sherman and Bloor confused Codd's reference to Childs' 1968 paper[2] -- which was just a scholarly citation of someone who had published about using set theory (not ZF) to formally represent data before Codd did -- with the 1977 co-written XST paper[3]. Codd and Childs had different approaches, but both promoted a strong theoretical foundation for data management.

“In 1968 Childs was the first who proposed to use set theory for data representation (NOT XST which he developed later, so it did not exist when Codd introduced the RDM). Neither did Childs introduce the special version of sets called relations, so Codd relied only on his appeal to set theory in very general terms -- SST not XST. As I have said many times, axiomatic set theories (ASTs) like ZF or ZFC are more powerful than FOL and are undecidable (so inappropriate for relational). The naïve approach to set theory (SST) is a small subset of certain formal set theories for which there are first order expressions of the concepts (the usual set operations, the notion of non-set element, universe, set membership, and set containment). This is all Codd relied upon -- the rest is predicate logic. Sherman and Bloor notwithstanding, ZF would have been a disaster, "sets of sets" and all.”
--David McGoveran
It is SST/FOPL that is responsible for the core advantages of the RDM and, thus, superior to the various flavors of AST (ZF included) as a foundation, and it is precisely what differentiates a data sublanguage from programming languages (there's nothing so far to suggest that Childs' XST avoids AST/SOL problems the way SST/FOL does). 
“Recently I have read that SQL is actually a data sublanguage and not a programming language like C++ or Java or C# ... The answers ... have the pattern of "No, it is not. Because it's not Turing complete.", etc, etc. ... I am a bit confused, because since you can develop things through SQL, I thought it is similar to other programming languages ... I am curious about knowing why exactly is SQL not a programming language? Which features does it lack? (I know it can't do loops, but what else more?)”
While SQL was intended as a data sublanguage and is relationally complete (i.e., expresses the RA), its designers did not have a sufficient grasp of the RDM[4]. It was not designed per the principles conveyed in this series, and the various commercial dialects do not adhere to them. There is plenty of evidence of violations, on which we have written extensively and which we will not reiterate here (just do a site search on SQL). We have not tracked the numerous changes to these dialects that, oblivious to the sublanguage-programming boundary, subvert the RDM.

With the correct understanding of the difference between a data sublanguage, a DBMS language, and a programming language that we provide in this series, we have equipped you to assess for yourself whether the SQL dialect you employ is a proper data sublanguage, to detect violations of the RDM, and to appreciate their implications.



[1] Codd, E.F., Data Models in Database Management.

[2] Childs, D.L., Feasibility of a set-theoretical data structure - a general structure based on a reconstituted definition of relation.

[3] Blass, A., and Childs, D.L., Axioms and Models for An Extended Set Theory.

[4] Darwen, H., Why Are There No Relational DBMSs.
