Tuesday, September 17, 2019

Testing Your Foundation Knowledge



The Web is chock-full of unnoticed and unquestioned pronouncements by novices or "experts", many self-taught, that are (1) wrong, or (2) gobbledygook. Any attempt to demonstrate the lack of foundation knowledge underlying these misconceptions, and their practical implications, is usually dismissed as "theory, not practical", attacked as "insulting ad-hominem", or ignored altogether, regardless of the amount and quality of the supporting evidence and argument logic. This is understandable: in the absence of foundation knowledge and the ability to reason, it is by definition impossible to comprehend and appreciate corrections that require them.

Practitioners who cannot detect such misconceptions and understand their practical implications and the importance thereof are insufficiently prepared for a professional career in data management. Worse, they cannot associate problems with their real causes and, thus, cannot come up with proper solutions, which explains the industry's "cookbook approach" and succession of fads.

What about you? This is another batch in the regular Test Your Foundation Knowledge series of posts: online statements reflecting common misconceptions that are difficult to discern without foundation knowledge. You can test yours by trying to debunk them in Comments, including which category, (1) or (2), each falls into. If you can't, proper education is in order.



“Why is it so hard to standardize a Graph Query Language? It is because graph databases are strongly dependent on the data model and the physical layer implementation. And most important currently there is a lack of a uniform representation for these two factors that vary a lot.”
“...good points, yes in principle the query language should be independent of both the data model and the database storage engine. But with Graph Databases that is not happening. See SPARQL vs (Cypher, GraphQL, Gremlin, GSQL) competitive query languages. Then when you  examine the other side, i.e. the property graph databases, more carefully you discover that each vendor has built many tweaks and features that are based on their physical layer implementation.”
“...why I am skeptical, it is because I think the real bet in database arena will be to bridge Row and Column databases, i.e. SQL databases with graph databases (triple stores, property graph stores). Can we have a better approach that covers them all independent of the data model and the physical layer implementation ? Again we have seen such efforts with SPARQL-Relational mappings but...” 
“One of the key aspects that makes graph so powerful is that you have the ability to referentially annotate, either at the nodal level or at the assertion level (by creating a structure such as ?assertion :hasStructure {:subject ?s; :predicate ?p; :object ?o} (property graphs subsume this in the predicate, while RDF graphs don't). That annotation can contain advisory schematic information, constraint modeling and so forth. This is usually missing from Codd-oriented data stores, one reason why its a relatively easy trip from relational to semantic, but a considerably more complex one in the other direction. The other aspect (and something that you can argue both sides about) is that normalization is a key requirement for any many-to-many relationship in Codd algebra, but it is not necessarily a requirement in a graph. My biggest problem with OWL is not in its existence but in the implicit requirement of internal consistency and the overall complexity of the language.  In a purely mathematical environment, this makes sense, but in a data-world sense, inconsistency is pretty much a given.”
“When I talked about an ontology being semantically neutral, my argument was that you need some kind of operational ontology to present hooks on which to lay the topical or thematic ontology, something analogous to REST publishing modes. That ontology is comparatively primitive, but it is what provides the substrate to deduce the relevant relationships (or, put another way, to build a discovery mechanism upon).”
“You are right to ask about specifications about structural components of the "graph data model". In my opinion this is a key differentiation factor. But which graph data model we are talking about? In associative, semiotic, hypergraph data model (a.k.a R3DM/S3DM), there are Entities and Attributes that cover the metadata (dictionary) TBox database component, and then you have Associative Entity Sets (ASETs) and HyperAtom collections (HACOLs) that cover the ABox component. There are well defined transparent operations, especially SET operations on ASETs and HACOLs and there is a clean, distinct separation between ABox and TBox components ... There is a huge difference between theory and practice. I am afraid many theoreticians of the past, may rest in peace, and their followers in Relational/Graph domain have failed to understand the difference between software engineering, i.e. make something that works, and pure computer science, i.e. imagine something that works.”
“A good graph query language should, in general, be independent of the data model. What is needed, though, is a mechanism for enabling the discovery of specific types of predicate relationships. SPARQL is a good start, in terms of data model independence if you have a known core ontology (OWL, SHACL, SKOS, what have you) but if you don't have any means of discovering what the foundational language is, then it breaks down. SPARQL also doesn't handle anonymous paths well. GraphQL tries to turn a JSON database into a graph database, but it also faces the limitations of predicate discovery. Most other graph query languages work upon the assumption that you have property oriented graph implementation, but these tend not to scale well.”
“A lot of my work of late focuses on building knowledge bases. Typically you can define fairly complex classes (or more properly classes with a number of properties within the knowledge base itself that becomes the conceptual model for the creator of the knowledge base, but beneath that there is a second operational model (typically OWL, RDFS or SHACL-like) that is used primarily by the query engine. That operational model is simpler, more akin to a REST interface than anything, but it makes it possible to serve and update the knowledge base model. This is what I'm referring to when I talk about being independent of the data model - you're working with the operational model (which is primarily a  publishing model) in order to facilitate a more complex model.”
      --Why is it so hard to standardize a Graph Query Language?
                                         


References

Graph Databases: They Who Forget the Past...

Sets vs. Graphs

What Is a Data Model, and What It Is Not

What Is a Data Model

Data Model Neither Business, Nor Logical, Nor Physical Model



