Monday, April 27, 2020

TYFK: "Multi-model DBMSs" is an Empty Set

Note: For more about TYFK (Test Your Foundation Knowledge) posts, see the post insert below.
“Traditional databases ... don't have a multi-model capability. Point is that richer data models are underused, relational data models are overused, and graph data models have so many advantages that shouldn't be ignored. Relational models, on the other hand, have wildly complex structures often with hundreds to thousands of tables. Each table then contains tens to hundreds of columns, arbitrarily constructed in each and every relational system. And just in case the situation wasn't complex enough, many of those columns are exist exclusively to manage uniqueness and provide connections to other tables. This Structure-FIrst approach produced the cascade of complexity from which we have struggled to recover ever since.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.



The Misconceptions

Chris Date once said that one cannot respond coherently to that which is incoherent. Debunking the one paragraph above thoroughly would take several times its space (and time), so we focus only on some fundamental misconceptions and refer the reader to our extended writings for the rest.
  • No data models other (let alone richer) than the RDM (singular!) have been formalized and, thus, there are no "multi-model DBMSs", and there cannot be until some are.
  • Even if a graph data model (GDM) (singular!) is ever formalized:
- logical models designed using it will have serious drawbacks rather than advantages for non-network applications; and,
- a DBMS combining it with the RDM will likely defeat the latter's advantages for those applications.
  • Logical models designed using the RDM do not consist of tables and columns.
  • The number of relations and attributes in a relational database:
- is not arbitrary;
- is not due to data model complexity (the RDM is the simplest possible for non-network applications); and,
- is handled no differently when large than when small by (1) a true RDBMS(!) and (2) users knowledgeable of data and relational fundamentals.
  • Attributes neither "manage uniqueness", nor do they "provide connection to other relations" as understood to date.
  • There is no such thing as "unstructured" data and no "structureless" data management approach.

The Debunking

Database management replaces data management by application programs and its drawbacks. Codd realized that, to that end, DBMS software must support a formal data model consisting of data structure/integrity and manipulation[1]: it (1) constrains a data structure to be consistent with a conceptual model of reality and (2) manipulates it to make inferences about the world (in other words, a DBMS with a database is a logic inference engine). The data model is used to formalize conceptual models (understood semantically by users) as logical models ("understood" algorithmically by the DBMS) that represent them in the database[2].

To date only one data model has been formalized: Codd's own RDM, based on simple set theory (SST) expressible in first-order predicate logic (FOPL). This dual theoretical foundation is responsible for essential advantages for applications that focus on relationships among groups of entities, as distinct from those that Codd called "network applications", which focus on relationships among individual entities (within a conceptual model entities are primitive objects and entity groups are derived compound objects)[3].
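
One rough way to picture the distinction between the two kinds of application is sketched below. This is only a toy illustration under our own assumptions, not Codd's formulation: the sample data and the names `join_on_city` and `reachable` are hypothetical. The first question is asked of whole groups at once and is answered with a set expression; the second is asked about two individual entities and is answered by following links one at a time.

```python
from collections import deque

# Group-level question: which (supplier, part) pairs share a city?
# Each group is a set of tuples; the answer is again a set of tuples.
suppliers = {("S1", "London"), ("S2", "Paris")}
parts = {("P1", "London"), ("P2", "Oslo")}


def join_on_city(suppliers, parts):
    """Pair every supplier with every part located in the same city."""
    return {
        (s, p, city)
        for (s, city) in suppliers
        for (p, part_city) in parts
        if city == part_city
    }


# Individual-level question: is one particular entity linked, directly or
# through intermediaries, to another particular entity?
links = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"A"}}


def reachable(links, start, goal):
    """Breadth-first search along links from start, looking for goal."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Note that the first function never names an individual supplier or part, while the second is meaningless without naming two of them.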

Note: Conceptual models consist of entity groups, entities, properties and relationships. If the RDM is used to formalize them (i.e., for database design), logical models consist of relations, tuples, domains/attributes and constraints.
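
As a loose illustration of the terms in the note above (a sketch only, neither a formalization of the RDM nor how any DBMS is implemented), a relation can be pictured as a set of tuples whose attributes draw values from domains, guarded by a constraint. Everything named here (`DOMAINS`, `Relation`, `employees`, the key rule) is hypothetical and invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical domains: each one is a membership test over simple values.
DOMAINS = {
    "emp_id": lambda v: isinstance(v, int) and v > 0,
    "name": lambda v: isinstance(v, str) and v != "",
    "dept": lambda v: v in {"Sales", "R&D"},
}


@dataclass
class Relation:
    heading: tuple  # attribute names
    key: tuple      # attributes whose combined values must be unique
    body: set = field(default_factory=set)  # set of tuples: no duplicates, no order

    def insert(self, **values):
        # Structure: a tuple has exactly the attributes of the heading.
        if set(values) != set(self.heading):
            raise ValueError("tuple must have exactly the heading's attributes")
        # Integrity: every value must belong to its attribute's domain.
        for attr, value in values.items():
            if not DOMAINS[attr](value):
                raise ValueError(f"{value!r} is not in the domain of {attr}")
        # Integrity: no two distinct tuples may share a key value.
        key_value = tuple(values[a] for a in self.key)
        for t in self.body:
            existing = dict(t)
            same_key = tuple(existing[a] for a in self.key) == key_value
            if same_key and existing != values:
                raise ValueError("key constraint violated")
        self.body.add(frozenset(values.items()))

    def restrict(self, predicate):
        # Manipulation: derive the set of tuples that satisfy a condition.
        return {t for t in self.body if predicate(dict(t))}


employees = Relation(heading=("emp_id", "name", "dept"), key=("emp_id",))
employees.insert(emp_id=1, name="Ann", dept="Sales")
employees.insert(emp_id=2, name="Bob", dept="R&D")
sales = employees.restrict(lambda t: t["dept"] == "Sales")
```

The body is a set, so inserting the same tuple twice changes nothing, while a tuple that reuses a key value with different data, or a value outside its domain, is rejected before it reaches the body.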

Unfortunately, "relational databases" is the industry term for SQL DBMSs, which have been sold as RDBMSs and with which RDBMSs are universally confused. SQL did start as a research prototype for a relational query language that was adopted as an industry standard, but because its authors lacked a good grasp of the RDM, it (1) is not really structured, (2) is not just for queries, and (3) is neither a true relational data sublanguage nor a well-designed DBMS language, for which reasons SQL DBMSs cannot be considered RDBMSs[4,5].

Note: A DBMS language includes a data sublanguage as well as data management functions other than data model support[6].

Second, industry references to a variety of other "data models" notwithstanding (e.g., graph, NoSQL, document, RDF), none has been formalized and, consequently, products claiming to be based on them lack well-defined, complete, theoretically sound data models in the Codd sense (if you believe otherwise, specify -- precisely please! -- the structure/integrity and manipulation components of any of these models and its theoretical foundation)[7,8]. This is also true for SQL DBMSs, given that whatever "model" they support, it's not the RDM. The only other candidate with a theoretical foundation (graph theory), were it ever formalized, would apply only to network applications, and would probably rob non-network applications of critical relational advantages[9].

"Multi-model capability" refers to DBMSs (not databases, why?) that support more than one data model. But in the absence of a formalized, complete, theoretically sound data model there is no DBMS, only application-managed data. So talk about "multi-model DBMSs" in the absence of any proper data model is empty talk due, as usual, to lack of foundation knowledge and of familiarity with the history of the field. What is more, were there a true RDBMS also supporting a properly formalized GDM, this would likely defeat the advantages of the RDM for the non-network applications for which it is intended (the RDM was introduced to avoid the drawbacks of first generation "graph DBMSs" (hierarchic and CODASYL) for non-network applications)[6,9].

The remaining misconceptions were debunked in earlier posts, to which we refer the reader.

  • Logical models formalized using the RDM consist of relations[10] that have attributes defined on simple domains[11], not tables with columns[12].
  • A large number of relations and attributes in a logical model:
- is not "arbitrary", but determined by the entity groups and their defining properties in the conceptual model they represent in the database[1];
- is not due to complexity of the data model (the RDM is the simplest possible for non-network applications)[13]; and,
- is handled as easily as a smaller one by (1) a true RDBMS(!)[14] and (2) users knowledgeable of data and relational fundamentals[15].
  • Attributes neither "manage uniqueness"[16], nor do they "provide connection to other relations" as understood to date[17].
  • "Unstructured data" and "structureless data management approach" are contradictions in terms[18].

Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


[1] Pascal, F., Business Modeling for Database Design: Formalizing the Informal

[2] Pascal, F., What Is a Data Model, and What It Is Not

[3] Pascal, F., Sets vs. Graphs

[4] Darwen, H., Why Are There No Relational DBMSs

[5] Pascal, F., SQL Sins

[6] Pascal, F., Data Sublanguage series

[7] Pascal, F., Models, Models Everywhere, Nor Any Time to Think

[8] Pascal, F., Data Model: Neither Business, Nor Logical, Nor Physical Model

[9] Pascal, F., Graph Databases: They Who Forget the Past...

[10] Pascal, F., What Relations Really Are and Why They Are Important

[11] Pascal, F., Simple Domains and Value Atomicity

[12] Pascal, F., Tables: So What?

[13] Pascal, F., Simplicity: Forgotten, Misunderstood, Underrated Relational Objective

[14] Pascal, F., What Is a True Relational System (and What It Is Not)

[15] Pascal, F., Industry Practice Is No Substitute for Foundation Knowledge

[16] Pascal, F., The Key to Relational Keys -- A New Understanding: Primary Keys

[17] Pascal, F., Association Relations vs. Foreign Keys

[18] Pascal, F., Structuring the World With "Unstructured Data"

