Saturday, October 26, 2019

Data Sublanguage Part 4: Conclusion

In Parts 1, 2, and 3 we showed that when the RDM is the data model:
  • A data sublanguage is short for data manipulation language (DML), one that combines (1) a relationally complete retrieval component (i.e., one that expresses the RA) with (2) a component that expresses updates as relation transformations;
  • A DBMS language is a careful combination, for practical purposes, of the data sublanguage with several other sublanguages, each of which expresses a data management function (e.g., data definition, transactions, concurrency, authorizations). These sublanguages are not relational, but are consistent with the RDM, and must not include syntactic elements that are at odds with, or subvert, those of the DML.
Note: The RDM is the only data model consistent with Codd's definition that has been formalized [1].
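
The idea of updates as relation transformations can be sketched in a few lines. This is only an illustration, under the assumption that a relation is modeled as a Python frozenset of tuples; the function names and the sample data are hypothetical and do not come from the series. Each "update" takes a relation value and yields a new relation value rather than modifying anything in place.

```python
# Illustrative assumption: a relation is a frozenset of tuples (id, name, salary).

def insert(relation, new_tuples):
    """Insert as set union: the result is a new relation."""
    return relation | new_tuples

def delete(relation, doomed_tuples):
    """Delete as set difference: the result is a new relation."""
    return relation - doomed_tuples

def update(relation, old_tuples, new_tuples):
    """Update as difference followed by union."""
    return (relation - old_tuples) | new_tuples

employees = frozenset({(1, "Ann", 50000), (2, "Bob", 42000)})

# Give Bob a raise: the old tuple is replaced by a new one.
raised = update(
    employees,
    frozenset({(2, "Bob", 42000)}),
    frozenset({(2, "Bob", 45000)}),
)
```

The original value of employees is untouched; raised is a different relation derived from it, which is the sense in which an update is a transformation.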

We are now in a position to debunk the two quotes that triggered this series.


Saturday, October 19, 2019

Brother, Spare Me the "Paradigms"


Note: This is a revised version of an old column @All Analytics in response to a recent LinkedIn exchange (check out my comments in the exchange).
“Consider dimensional design and Big Data as two additional paradigms ... Big Data paradigms like Hadoop and NoSQL will alleviate the temptation people have to try to use the relational database in unnatural ways.”
Every few years (and the intervals are getting shorter) a "fundamentally different" new way of doing data management -- a "paradigm shift" -- is promoted, and if you don't adopt it, you’ll be "left behind". In the above-mentioned exchange it is argued that data management is undergoing a paradigm shift from application-centric to data-centric data management. For the very few who (1) understand what a paradigm is and (2) are familiar with data fundamentals and the history of the field, the irony could not be richer.

Friday, October 18, 2019

Test Your Foundation Knowledge


The Web is chock-full of unnoticed/unquestioned pronouncements by novices or self-taught "experts" that are (1) wrong, or (2) gobbledygook. Attempts to demonstrate the lack of foundation knowledge underlying these misconceptions are usually dismissed as "theory, not practical", attacked as "insulting ad-hominem", or ignored altogether, regardless of the amount and quality of supporting evidence and logic. Practitioners who cannot discern such misconceptions and understand their practical implications are insufficiently prepared for a professional career in data management. They cannot associate problems with their real causes and come up with proper solutions, which explains the industry's "cookbook approach" and succession of fads. What about you?

This is another batch in the Test Your Foundation Knowledge series of online statements reflecting common misconceptions due to lack of foundation knowledge. Can you debunk them? Which of the two categories, (1) or (2), do they fall in? If you cannot, check out the recommended references.

Comment: This is the kind of clueless exchange, leading nowhere, that takes place in the absence of foundation knowledge. See if you learn anything from it, then check out the references.


Friday, October 11, 2019

Data Sublanguage Part 3: DBMS Language


Note: 10/10/2019 significantly revised Part 2, which requires a re-read.

A formal data model consists of structure, integrity, and manipulation [1,2] and so requires (1) a language that expresses data manipulation (retrievals and updates) augmented with (2) a metalanguage used to define the model's structural elements. As we showed in Part 1 and Part 2, when the data model is the RDM:

  • Data sublanguage is short for a relationally complete data manipulation sublanguage (DML) that expresses retrievals and updates, the latter correctly understood as set-theoretic relation transformations.
  • A data definition sublanguage (DDL) is a metalanguage for DML that is outside the theory but consistent with the RDM and at least as powerful expressively as the DML (e.g., a very carefully restricted SOL to avoid self-referencing).
  • The DML and the DDL can, for practical purposes, be carefully unified into what Codd called a "comprehensive data sublanguage", but we prefer DBMS language to avoid confusion.
By carefully we mean that, because only the DML is, strictly speaking, based on relational theory, the DDL must be consistent with, but differentiated from, it, such that the two can't be mixed in the same expression in a way that the former subverts the latter. A DML expression can be referenced as a sub-expression by a DDL expression (e.g., as in view definitions), but only if it contains strictly retrievals (e.g., SELECT) and no updates (e.g., INSERT, UPDATE, DELETE).
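
The retrieval-only restriction on DML sub-expressions inside DDL can be illustrated with a small sketch. It uses SQL through Python's standard sqlite3 module only because the text cites SELECT, INSERT, UPDATE and DELETE as examples; SQLite is not offered as a faithful implementation of the RDM, and the table, view, and variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", 50000), (2, "Bob", 42000)],
)

# DDL (a view definition) referencing a retrieval as a sub-expression: accepted.
conn.execute(
    "CREATE VIEW well_paid AS SELECT id, name FROM employee WHERE salary > 45000"
)
well_paid = conn.execute("SELECT id, name FROM well_paid").fetchall()

# DDL referencing an update as a sub-expression: rejected by the SQLite parser.
try:
    conn.execute("CREATE VIEW purge AS DELETE FROM employee")
    update_in_view_rejected = False
except sqlite3.OperationalError:
    update_in_view_rejected = True

# The failed definition left the base table untouched.
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

In this sketch the view definition built on a SELECT is accepted, while the one built on a DELETE fails as a syntax error, so the update never runs.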

Note: Data definition and manipulation are possible without a DBMS. However:

“With a relational catalog, definition can be performed via the RA, which requires physical implementation to be determined exclusively by the catalog (behind the scenes as it were) -- a kind of skeletal, primitive, or rudimentary DBMS. This is why Codd created a relational catalog that contains a description of the database and could be managed using RA-based DML. It works well unless one is allowed to mix DDL (metalanguage) with DML (language) in the same expression. Otherwise put, the database can be read to modify the catalog, but not vice-versa (as far as the DML is concerned, the catalog that describes the database does not exist).

But with a data model that, unlike the RDM, does not define a catalog such that the same language can be used for both database and it, a rudimentary DBMS must provide a workaround, and if the model is computationally complete (like CODASYL was), there must be limits on how "active" the catalog is to prevent users from writing self-referencing expressions that cannot be automatically implemented because they may corrupt the database (same as would mixing data sublanguage and host language). This is one reason some of the pre-RDM directed graph DBMSs had limited notions of catalog that often required completely separate facilities to maintain.”
                                                 --David McGoveran
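
As a rough illustration of a catalog that describes the database and is read with the same retrieval language as the data, the sketch below queries SQLite's schema table with an ordinary SELECT. SQLite and its sqlite_master table are stand-ins chosen for convenience, not the relational catalog Codd specified, and the employee table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

# The catalog is itself presented as a table, so the same retrieval
# syntax used for data also answers questions about the database.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
```

Here catalog_rows holds one row per schema object, in this case the single employee table.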

The DML and the DDL express two core data management functions centralized in the DBMS. When the data model is the RDM, only data management functions are permitted to access the data. At the end of Part 2 we alluded to other such functions that, like data definition, are outside relational theory, but must be consistent with the RDM.

Friday, October 4, 2019

Test Your Foundation Knowledge


The Web is chock-full of unnoticed/unquestioned pronouncements by novices or "experts", many self-taught, that are (1) wrong, or (2) gobbledygook. Any attempt to demonstrate the lack of foundation knowledge underlying these misconceptions and their practical implications is usually dismissed as "theory, not practical", attacked as "insulting ad-hominem", or ignored altogether, regardless of the amount and quality of the supporting evidence and argument logic. This is understandable: in the absence of foundation knowledge and ability to reason, it is by definition impossible to comprehend and appreciate corrections that require them.

Practitioners who cannot detect such misconceptions and understand their practical implications and the importance thereof are insufficiently prepared for a professional career in data management. Worse, they cannot associate problems with their real causes and, thus, cannot come up with proper solutions, which explains the industry's "cookbook approach" and succession of fads.

What about you? This is another batch in the Test Your Foundation Knowledge regular series of posts of online statements reflecting common misconceptions that are difficult to discern without foundation knowledge. You can test yours by trying to debunk them in Comments, including which category, (1) or (2), they fall in. If you can't, proper education is in order.