Monday, August 24, 2020

Oldies But Goodies: Data Sublanguage vs. SQL

Note: To demonstrate the stability of a sound foundation relative to the industry's fad-driven cookbook practices, I am re-publishing under "Oldies But Goodies" material from the old (2000-06), so that you can judge for yourself how well my debunkings hold up and whether the industry has progressed beyond the misconceptions they were intended to dispel. I may break long pieces into more pointed parts, and add comments and references to further reading.

From "Little Relationship to Relational" originally posted on March 29, 2001.
“Crucial to making the dream of relational databases a reality was the development of a new query language which took significant advantage of relational theory. Then contemporary query languages like CODASYL usually required incredibly complicated code to answer even the simplest of questions from a database. Codd himself developed a relational language called ALPHA that he often used to communicate his ideas in an academic context [Actually, this is not true], but it attracted almost no commercial interest. In 1974 and 1975 Raymond Boyce and Don Chamberlin of IBM designed a new fourth generation language to extract information from systems based on Codd's relational model known as Structured English Query Language, or SEQUEL. This would later be shortened to SQL but is still most correctly pronounced "sequel." As a relational language, SQL was designed to be descriptive rather than procedural. SQL allowed users to specify the results they wanted rather than having to specify the procedure to obtain the desired results. Theoretically this would absolve the user from having to figure out the best way to execute the program and make information available to a whole class of non-programmers. The model's scope and elegance soon drew many converts.”
It is somewhat of a stretch to deem SQL a full fledged data [sub-]language, and I am not sure how significant is the advantage it takes of relational theory. Codd, Date and others, myself included, have amply demonstrated that not only does SQL fail to support, or violates important relational features, but that it is actually a poorly designed language as well. It can be argued that at least one reason relational technology has not reached its potential is that SQL was practically its only concretization (an arguably better language, QUEL, could not compete due to lack of the then critical IBM compatibility). Not only do most practitioners think that SQL DBMSs are relational, but they actually blame the problems due to SQL's violations of, or lack of adherence to relational principles on the relational model itself!

The claim of non-procedurality (the term is declarative, not descriptive) is rather questionable. While SQL is, relatively speaking, less procedural than programming languages, or the quasi-data languages preceding, it is by no means as declarative as it should and could have been and in many cases (particularly those when it eschews relational features) the SQL formulation can be extremely procedural (see, for example, the chapters on normalization, redundancy and integrity in my latest book). One case in point are sub-queries (the "structured" in SQL's name comes from them): they were put in the language to avoid direct expressions of relational operations such as JOIN, INTERSECT, or DIFFERENCE. Unfortunately, they make SQL a highly redundant language--the same query can be expressed in many ways. Codd warned IBM about the optimization implications of such redundancy, but was ignored. In 1983 I ran multiple formulations of the same query against several SQL PC DBMSs and got response times ranging from 2 to 2500 seconds. Ironically, not all direct expressions could be avoided (e.g. UNION); what is more, explicit expression of all relational operations were later added to SQL anyway. So now there is double redundancy and the optimization complications worse!

Further Reading

Data Sublanguage series

Domains, R-tables, and SQL

Not Worth Repeating: Duplicates

To Really Understand Integrity, Don't Start with SQL

NULL Value is a Contradiction in Terms

Precision, Procedurality and SQL

DISTINCT and ORDER BY Are Not Relational

Language Redundancy and DBMS Performance: A SQL Story

No comments:

Post a Comment

View My Stats