Friday, March 19, 2021

Data Sublanguages vs. Programming Languages

Revised 3/20/21

I recently came across a review of Edsger Dijkstra's work, where the following comment on a book he co-authored (referred to as D&S) raised my debunking antennae:

“... in general computer people seem to have a penchant for whipping up homebrew logics ... D&S is not the only example ... See E.F. Codd’s Relational Calculus, an obvious mess.”
--Maarten van Emden, A Bridge too Far: E.W. Dijkstra and Logic 

Having recently argued that "Codd was wrong" and "You're teaching [his] gospel" Betray Lack of Foundation Knowledge, my suspicion should hardly surprise. Besides, criticism of Dijkstra is a very tall order in itself, particularly in the context of disputes among logicians). As a reader asked, "What’s so obviously messy in Codd’s Relational Calculus?". Answer:


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

-03/15/21: Cleaned up the
POSTS page

-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns to dbdebunk, within which only links to sources external to AllAnalytics may work or not.

I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.


 “Good question. A short answer is that first-order predicate logic, originating in the 19th century and achieving maturity in the 1930s, IS a relational calculus (though not a relational algebra). A longer answer (for a future post maybe?) is an argument why the product of many clever logicians, labouring over decades, should inspire more confidence than a homebrew alternative ... mainstream predicate logic ... is not adequate for writing assertions. A promising approach ... propose[s] to use predicate logic [with only minor] modifications ... that include arrays and multiple sorts for the variables ... as programming language ...  when one wants to define program semantics by means of predicate logic one should be very careful. One way of doing that is to stick with the first-order predicate logic, mainstream variety, as presented by any of the undergraduate textbooks mentioned above. It is not only safer and but saves work compared to creating and publishing a homebrew system.”

The argument seems to be as follows: because first order predicate logic (mainstream FOPL) -- which is a relational calculus (RC) but not a relational algebra (RA) -- is inadequate for expressing assertions (statements of fact), by modifying FOPL for equivalence with his RA Codd created a messy "homebrew logic". He should have stuck to pure FOPL as a foundation for programming languages, which would have required only minor changes to it, thus avoiding the mess.

Since this is one of the aspects on which David McGoveran has focused in his ongoing effort to re-interpret, extend and formalize Codd's RDM (draft chapters), the following debunking is based on my own understanding and interpretation thereof. 

There is confusion here about relationships between logic vs formal languages, syntax vs semantics, and formal logical systems vs computer languages -- these pairs are related, but different. Some is due to Codd's logical "glosses" without working out or be consistent about the details as to how RA should be interpreted and used. Coupled with the common poor grasp of fundamentals in the industry, this inhibits understanding the differences between data sublanguages and programming languages.

Here's the abstract of Codd's referenced article:

In the near future, we can expect a great variety of [data sub]languages to be proposed for interrogating and updating data bases. This paper attempts to provide a theoretical basis which may be used to determine how complete a selection capability is provided in a proposed data sublanguage independently of any host language in which the sublanguage may be embedded. A relational algebra and a relational calculus are defined. Then, an algorithm is presented for reducing an arbitrary relation-defining expression (based on the calculus) into a semantically equivalent expression of the relational algebra. Finally, some opinions are stated regarding the relative merits of calculus- oriented versus algebra-oriented data sublanguages from the standpoint of optimal search and highly discriminating authorization schemes.

Note: A logical calculus computes a final truth value from component truth values like arithmetic computes a numeric answer from numbers. A logical algebra manipulates expressions, which can then be evaluated -- think ordinary algebra which substitutes values for variables in the final expression. 

By referring to Codd's relational calculus (RC) as a "homebrew logic", the author indicates that he thinks of it as an alternative form of logic distinct from FOPL. But developing a system based on some logic -- here a data sublanguage (which FOPL is not) -- is not the same as developing an alternative to that logic. “...providing a practical logical query language. It may be argued that a "homebrew" logic might prove more successful for the tasks at hand (e.g., easier to work with) than a traditional alternative.” observed a reader, but in this case there is no alternative: Codd realized that a subset of it  (excluding, for example, infinite domains, which it cannot handle and which are not necessary for finite databases) could express simple set theory (SST) and, therefore, relations and simply used that part. In fact, in FOPL the objects in the universe of discourse are propositions and predicates and FOPL can be considered a RC only because of that insight -- we know of no pre-Codd references to FOPL as such.

Codd's second stated objective was “establishing the relational completeness of RA”:

“A query language (or other data sublanguage) which is claimed to be general purpose should be at least relationally complete in the sense defined in this paper. Both the algebra and calculus described herein provide a foundation for designing relationally complete query languages without resorting to programming loops or any other form of branched execution -- an important consideration when interrogating a data base from a terminal.”

which addresses some of the author's concern that “FOPL is a relational calculus (though not a relational algebra).” 
“Van Emden seems to think that Codd meant 'deductive completeness' in the absolute sense, but FOPL, let alone a fragment thereof, cannot be deductively complete. The fragment Codd used is as deductively complete as propositional logic, allowed only because databases are finite. Codd meant 'expressive completeness' and with respect to the RA in particular: he proved RC is as expressively complete as the RA, thereby sidestepping the issue of deductive incompleteness of FOPL -- he showed that you can use either RC or RA with equal expressiveness.”      --David McGoveran

Furthermore, having started with second order logic, Codd had switched to FOPL when he realized that to take full advantage of the RDM (and avoid problems) it is necessary to restrict the power of data sublanguages to FOPL and relegate higher power to computationally complete host programming languages -- the opposite of the author's preferred "defining program semantics by means of predicate logic.

“Formal (logic) and programming languages, though related, have very different objectives and their development is subject to different constraints. Turing (i.e., computational) completeness needs much more power than FOPL as a logical foundation -- even the standard "simple" assignment programming construct already requires the lambda calculus, because it is not possible in FOPL. Moreover, semantics is and always has been a controversial subject area, and almost every renowned logician has argued for his/her own theory of semantics and their own theory about "logical truth" (this is the Quine issue). There does not seem to be here differentiation of logical truth, logical validity, and truth as in representing the physical world of experience and evidence.”         --David McGoveran
Not to mention the implications that "arrays and multiple sorts for the variables" would have for language simplicity/ease of use. Codd had the following to say about the relative merits of RC and RA as a basis for relational data sublanguages:
“One advantage that might be claimed for the algebraic approach is its freedom from quantifiers. However, the calculus appears to be superior to the algebra in four respects:
1) Ease of Augmentation
2) Scope for Search Optimization
3) Authorization Capability
4) Closeness to Natural Language.”
His own sublanguage ALPHA is based on the RC. The only commercial RC-based data sublanguage was Ingres' QUEL, considered superior to SQL, which is based on both RC and RA, violates relational principles and is poorly designed. The industry ignored ALPHA and adopted SQL, which had little to do with merit, but with marketing and market dominance": at that time nobody could afford incompatibility with the supporter of SQL, IBM. With Oracle the first to market, the rest is history, belying the myth "the market determines the best product".

No comments:

Post a Comment

View My Stats