Friday, September 27, 2019

Data Sublanguage Part 2: Data Manipulation and Definition


Revised 10/10/2019.

In Part 1 we showed that in 1969 Codd intended to base the RDM on axiomatic set theory (AST) and second order logic (SOL) in order to accommodate relation-valued domains (RVD), i.e., sets of sets. To preserve the relational advantages and avoid the problems of SOL, however, he had to trade the expressive power of AST/SOL for the simple set theory (SST) of proper sets (i.e., relations in normal form), which is expressible in first order predicate logic (FOPL) and, thus, computational for relational completeness[1]. He retained the power of the former for applications by hosting, in computationally complete programming languages (CCL), a relationally complete FOPL-based language that expresses the relational algebra (RA).
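To make the hosting arrangement concrete, here is a minimal sketch in Python. It is an illustration only and not taken from Codd's papers: it assumes relations are modeled as frozensets of rows with simple (non-relation) values, and every function name in it is hypothetical. The RA-style operators are loop-free set expressions, while the transitive closure, which relational algebra alone is known not to express, is obtained by iterating those operators in the host language.

```python
# Hypothetical sketch: relations as sets of rows, with a few RA-style
# operators, hosted in a general-purpose language. All names are assumptions.
from typing import Callable, Dict, FrozenSet, Set, Tuple

Row = Tuple[Tuple[str, object], ...]   # (attribute, value) pairs sorted by attribute
Relation = FrozenSet[Row]


def relation(*rows: Dict[str, object]) -> Relation:
    """Build a relation from dicts; duplicate rows collapse, as in a set."""
    return frozenset(tuple(sorted(r.items())) for r in rows)


def restrict(r: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    """Keep only the rows that satisfy the predicate."""
    return frozenset(row for row in r if pred(dict(row)))


def project(r: Relation, attrs: Set[str]) -> Relation:
    """Keep only the named attributes; resulting duplicates collapse."""
    return frozenset(tuple((a, v) for a, v in row if a in attrs) for row in r)


def rename(r: Relation, mapping: Dict[str, str]) -> Relation:
    """Rename attributes according to mapping (old name -> new name)."""
    return frozenset(
        tuple(sorted((mapping.get(a, a), v) for a, v in row)) for row in r
    )


def natural_join(r: Relation, s: Relation) -> Relation:
    """Combine rows that agree on all attributes the two relations share."""
    out = set()
    for x in r:
        dx = dict(x)
        for y in s:
            dy = dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(tuple(sorted({**dx, **dy}.items())))
    return frozenset(out)


def transitive_closure(edges: Relation) -> Relation:
    """Host-language loop: repeat an RA expression until nothing new appears.

    Assumes `edges` has exactly the attributes 'src' and 'dst'.
    """
    closure = edges
    while True:
        left = rename(closure, {"dst": "mid"})    # (src, mid)
        right = rename(edges, {"src": "mid"})     # (mid, dst)
        step = project(natural_join(left, right), {"src", "dst"})
        extended = closure | step
        if extended == closure:
            return closure
        closure = extended


edges = relation({"src": "a", "dst": "b"}, {"src": "b", "dst": "c"})
reachable = transitive_closure(edges)
from_a = restrict(reachable, lambda row: row["src"] == "a")
```

In this sketch the functions `restrict`, `project`, `rename` and `natural_join` stand in for the sublanguage, and only `transitive_closure` relies on the host language's iteration.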

We also drew attention to an important but unnoticed detail: the term data sublanguage appeared in the 1970 paper, whereas in 1969 Codd referred to a retrieval sublanguage. This can be understood only with reference to the theoretical foundation of the RDM.


Wednesday, September 25, 2019

Test Your Foundation Knowledge

The Web is chock-full of unnoticed/unquestioned pronouncements by self-taught novices or "experts" that are (1) wrong, or (2) gobbledygook. Any attempt to demonstrate the lack of foundation knowledge underlying these misconceptions, and their practical implications, is usually dismissed as "theory that is not practical", attacked as "insulting ad-hominem", or ignored altogether, regardless of the amount and quality of the supporting evidence and argument logic. This is understandable: in the absence of foundation knowledge and ability to reason, it is by definition impossible to comprehend and appreciate corrections that require them.

I have always contended that practitioners who cannot detect such misconceptions and understand their practical implications, and the importance thereof, are insufficiently prepared for a professional career in data management. Worse, neither can they associate problems with their real causes and, thus, they cannot come up with proper solutions, which explains the industry's "cookbook approach" and succession of fads.

What about you? This is another batch in the Test Your Foundation Knowledge regular series of posts of online statements reflecting common misconceptions that are difficult to discern without foundation knowledge. You can test yours by trying to debunk them in Comments. Which category, (1) or (2), do they fall into?

Sunday, September 22, 2019

Data Sublanguage Part 1: Relational vs. Computational Completeness


Note: I have revised the "Logical Access, Data Sublanguage, Kinds of Relations, Database Redundancy, and Consistency" paper in the "Understanding the Real RDM" series (available from the PAPERS page) for consistency with this post.

“Recently I have read that SQL is actually a data sublanguage and not a programming language like C++ or Java or C# ... The answers ... have the pattern of "No, it is not. Because it's not Turing complete.", etc, etc. ... I am a bit confused, because since you can develop things through SQL, I thought it is similar to other programming languages ... I am curious about knowing why exactly is SQL not a programming language? Which features does it lack? (I know it can't do loops, but what else more?)”
--StackOverflow.com
“The SQL operators were meant to implement the relational algebra as proposed by Dr. Ted Codd. Unfortunately Dr. Codd based some of his ideas on a "extended set theory", which was an idea formulated and described in a 1977 paper by D. L. Childs ... But Childs’ extensions were not ideally suited, which is explained in quite some detail in [a] book ... by Professor Gary Sherman & Robin Bloor [who] argue that mainstream Zermelo-Fraenkel set theory (Cantor), would have been a better starting point. One key issue is that sets should be able to be sets of sets.”
--Dataversity.net

The concept of a sublanguage cannot be understood without foundation knowledge and familiarity with the history of the database management field, both of which are lacking in the industry.

Tuesday, September 17, 2019

Test Your Foundation Knowledge

The Web is chock-full of unnoticed/unquestioned pronouncements by novices or "experts", many self-taught, that are (1) wrong, or (2) gobbledygook. Any attempt to demonstrate the lack of foundation knowledge underlying these misconceptions, and their practical implications, is usually dismissed as "theory, not practical", attacked as "insulting ad-hominem", or ignored altogether, regardless of the amount and quality of the supporting evidence and argument logic. This is understandable: in the absence of foundation knowledge and ability to reason, it is by definition impossible to comprehend and appreciate corrections that require them.

Practitioners who cannot detect such misconceptions and understand their practical implications and the importance thereof are insufficiently prepared for a professional career in data management. Worse, they cannot associate problems with their real causes and, thus, cannot come up with proper solutions, which explains the industry's "cookbook approach" and succession of fads.

What about you? This is another batch in the Test Your Foundation Knowledge regular series of posts of online statements reflecting common misconceptions that are difficult to discern without foundation knowledge. You can test yours by trying to debunk them in Comments, including which category, (1) or (2), they fall into. If you can't, proper education is in order.

“Why is it so hard to standardize a Graph Query Language? It is because graph databases are strongly dependent on the data model and the physical layer implementation. And most important currently there is a lack of a uniform representation for these two factors that vary a lot.”
“...good points, yes in principle the query language should be independent of both the data model and the database storage engine. But with Graph Databases that is not happening. See SPARQL vs (Cypher, GraphQL, Gremlin, GSQL) competitive query languages. Then when you  examine the other side, i.e. the property graph databases, more carefully you discover that each vendor has built many tweaks and features that are based on their physical layer implementation.”
“...why I am skeptical, it is because I think the real bet in database arena will be to bridge Row and Column databases, i.e. SQL databases with graph databases (triple stores, property graph stores). Can we have a better approach that covers them all independent of the data model and the physical layer implementation ? Again we have seen such efforts with SPARQL-Relational mappings but...” 
“One of the key aspects that makes graph so powerful is that you have the ability to referentially annotate, either at the nodal level or at the assertion level (by creating a structure such as ?assertion :hasStructure {:subject ?s; :predicate ?p; :object ?o} (property graphs subsume this in the predicate, while RDF graphs don't). That annotation can contain advisory schematic information, constraint modeling and so forth. This is usually missing from Codd-oriented data stores, one reason why its a relatively easy trip from relational to semantic, but a considerably more complex one in the other direction. The other aspect (and something that you can argue both sides about) is that normalization is a key requirement for any many-to-many relationship in Codd algebra, but it is not necessarily a requirement in a graph. My biggest problem with OWL is not in its existence but in the implicit requirement of internal consistency and the overall complexity of the language.  In a purely mathematical environment, this makes sense, but in a data-world sense, inconsistency is pretty much a given.”
“When I talked about an ontology being semantically neutral, my argument was that you need some kind of operational ontology to present hooks on which to lay the topical or thematic ontology, something analogous to REST publishing modes. That ontology is comparatively primitive, but it is what provides the substrate to deduce the relevant relationships (or, put another way, to build a discovery mechanism upon).”
“You are right to ask about specifications about structural components of the "graph data model". In my opinion this is a key differentiation factor. But which graph data model we are talking about? In associative, semiotic, hypergraph data model (a.k.a R3DM/S3DM), there are Entities and Attributes that cover the metadata (dictionary) TBox database component, and then you have Associative Entity Sets (ASETs) and HyperAtom collections (HACOLs) that cover the ABox component. There are well defined transparent operations, especially SET operations on ASETs and HACOLs and there is a clean, distinct separation between ABox and TBox components ... There is a huge difference between theory and practice. I am afraid many theoreticians of the past, may rest in peace, and their followers in Relational/Graph domain have failed to understand the difference between software engineering, i.e. make something that works, and pure computer science, i.e. imagine something that works.”
“A good graph query language should, in general, be independent of the data model. What is needed, though, is a mechanism for enabling the discovery of specific types of predicate relationships. SPARQL is a good start, in terms of data model independence if you have a known core ontology (OWL, SHACL, SKOS, what have you) but if you don't have any means of discovering what the foundational language is, then it breaks down. SPARQL also doesn't handle anonymous paths well. GraphQL tries to turn a JSON database into a graph database, but it also faces the limitations of predicate discovery. Most other graph query languages work upon the assumption that you have property oriented graph implementation, but these tend not to scale well.”
“A lot of my work of late focuses on building knowledge bases. Typically you can define fairly complex classes (or more properly classes with a number of properties within the knowledge base itself that becomes the conceptual model for the creator of the knowledge base, but beneath that there is a second operational model (typically OWL, RDFS or SHACL-like) that is used primarily by the query engine. That operational model is simpler, more akin to a REST interface than anything, but it makes it possible to serve and update the knowledge base model. This is what I'm referring to when I talk about being independent of the data model - you're working with the operational model (which is primarily a  publishing model) in order to facilitate a more complex model.”
--Why is it so hard to standardize a Graph Query Language?


References

Graph Databases: They Who Forget the Past...

Sets vs. Graphs

What Is a Data Model, and What It Is Not

What Is a Data Model

Data Model Neither Business, Nor Logical, Nor Physical Model