Monday, March 23, 2020

TYFK: How (Not) to Compare NoSQL Systems and RDBMSs




Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“But if you still want to compare NOSQL databases with RDBMS, they primarily vary in
1. "normalization" where RDBMS contains normalized (upto certain degree) data and NOSQL based database contains non-normalized data;
2. RDBMS based databases are (I MUST say, generally and it isn't a criteria) fully ACID compliant while NOSQL databases are partially ACID compliant.
3. RDBMS are much slower and difficult to scale while NOSQL databases are much faster and easily scalable.
4. RDBMS normalization was very useful 50 years ago when cost of disk and memory was high, and computation power was limited. With the revolution in computing power, cheapest disk and memory availability has made RDBMS normalization a matter of joke - many people do not really understand why they need to normalize data in today's time.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Note: In what follows RDBMS refers to a truly relational DBMS (of which there are currently none), not to be confused with a SQL DBMS.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalytics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available. Thank you.

LATEST UPDATES
-12/24/20: Added 2021 to the POSTS page

-12/26/20: Added “Mathematics, machine learning and Wittgenstein” to LINKS page

LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017, were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I moved only the 2017 columns to dbdebunk; within those, only links to sources external to AllAnalytics may still work.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: links to new posts on this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the only universally acceptable hatred left – as the (traditional) response to the existential crisis of decadence and decline of Western  civilization (including the US).
- @ThePostWest on Twitter where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict.

------------------------------------------------------------------------------------------------------------------

The Misconceptions

  • NoSQL systems are not DBMSs in the true sense of the term and should not be compared with DBMSs, let alone RDBMSs (besides, neither should databases be compared to DBMSs).
  • Relational databases (not RDBMSs!) do not consist of "normalized (up to a certain degree) data", and referring to NoSQL data as "non-normalized" is misleading at best.
  • ACID compliance is not an inherent difference between RDBMSs and non-relational DBMSs (SQL DBMSs support it even though they are not true RDBMSs).
  • The claim about relative performance and scalability is simply false, but even if it were true, such differences could not possibly be due to the "non-relational superiority" of NoSQL systems.
  • The usefulness of normalization has nothing to do with the cost of disk and memory, and computation power.
  • There is no "need to normalize".

The Correct Answer


Database management requires a formal data model with which conceptual models of reality understood semantically by users can be formalized as logical models that a DBMS "understands" algorithmically[1]. The data model used determines the (1) data integrity enforcement (i.e., consistency of the database with the conceptual model it represents) and (2) data manipulation of which the DBMS is capable[2]. Because there is no formally defined "NoSQL data model", these two core DBMS functions are relegated to applications -- the very problem that databases/DBMSs were introduced to solve[3,4] -- which is why NoSQL systems cannot be considered DBMSs[5,6].
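To make the contrast between DBMS-enforced and application-enforced integrity concrete, here is a minimal sketch. It uses SQLite through Python's standard sqlite3 module purely as a convenient stand-in: SQLite is a SQL DBMS, not a true RDBMS in the sense used in this post. The employee table, its columns and the try_insert helper are hypothetical names chosen for illustration. The constraints are declared once to the DBMS, which then rejects inconsistent rows no matter which application submits them.

```python
import sqlite3

# Hypothetical schema: the constraints are declared to the DBMS once,
# instead of being re-coded (and possibly forgotten) in every application.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        salary INTEGER NOT NULL CHECK (salary > 0)
    )
    """
)

def try_insert(conn, emp_id, salary):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO employee (emp_id, salary) VALUES (?, ?)",
                (emp_id, salary),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(conn, 1, 50000)            # satisfies both constraints
rejected_negative = try_insert(conn, 2, -10)     # violates the CHECK constraint
rejected_duplicate = try_insert(conn, 1, 60000)  # violates the key constraint
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

In a system without declarative constraints, the equivalent checks would have to live in the body of try_insert and in every other program that writes the data.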

The only data model formally well defined to date is the RDM[7]. Database relations are both normalized (in 1NF) and fully normalized (in 5NF) by definition[8,9], otherwise they are not relations[10], the database is not relational, the critical benefits from the RDM do not materialize, and all bets are off. Given that NoSQL systems are not DBMSs, let alone RDBMSs -- no relations -- referring to their data as "non-normalized" inhibits understanding.

ACID compliance is a non-relational DBMS function (i.e., outside the RDM): every DBMS, whether relational or not, must support it to qualify as a DBMS, which is another reason NoSQL systems do not. Note that this is a difference between non-DBMSs and DBMSs, whether the latter are relational or not (e.g., SQL DBMSs are ACID compliant even though they are not true RDBMSs)[11].
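As a small illustration of the atomicity part of ACID in a SQL DBMS, the sketch below again uses SQLite via Python's sqlite3 module as a stand-in. The account table and the transfer function are hypothetical. The point is only that a failed transfer leaves no partial update behind, and that the DBMS, not the application, undoes the first update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acct_id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True on success."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acct_id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct_id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # both updates are committed together
failed = transfer(conn, 1, 2, 500)  # debit violates CHECK; the credit is rolled back too
balances = dict(conn.execute("SELECT acct_id, balance FROM account"))
```

After the failed call the balances are exactly what the successful transfer left, even though the credit to account 2 had already been executed inside the transaction.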

Performance and scalability are determined exclusively at the physical level and have nothing to do with relational fidelity or lack thereof. Since there are no true RDBMSs, only SQL DBMSs and NoSQL systems, whenever, all else being equal, a NoSQL system performs better or is more scalable than a SQL DBMS, the difference is due mainly to the implementation of both the DBMS and the database, and to the fact that NoSQL systems lack many DBMS functions (e.g., integrity enforcement and transaction management).
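The separation between the logical and physical levels can be sketched with an index, again using SQLite via Python's sqlite3 module as a stand-in for a SQL DBMS. The orders table, the index name and the result_and_plan helper are hypothetical. Adding the index changes the access path the DBMS chooses, which is a physical matter, while the query and its result, which are logical, stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, i % 100) for i in range(1, 1001)],
    )

QUERY = "SELECT order_id FROM orders WHERE customer_id = ? ORDER BY order_id"

def result_and_plan(conn):
    """Return the query result and the text of the access plan SQLite reports."""
    rows = conn.execute(QUERY, (7,)).fetchall()
    # The fourth column of each EXPLAIN QUERY PLAN row is the plan detail text.
    plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (7,)))
    return rows, plan

rows_before, plan_before = result_and_plan(conn)

# A purely physical change: no table, column, or query is altered.
conn.execute("CREATE INDEX orders_by_customer ON orders (customer_id)")

rows_after, plan_after = result_and_plan(conn)
```

Comparing plan_before with plan_after shows a different access path, while rows_before and rows_after are identical. Any speedup comes from the implementation, not from the data model.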

The RDM requires (1) relations, which have simple domains[12] and (2) adherence to three database design principles, which we believe imply full normalization. Otherwise put, designs conforming to (1) and (2) implicitly produce normalized and fully normalized databases[13]. Explicit normalization (to 1NF) and any further normalization (to 5NF) are necessary only to repair non-conforming designs[14], the usefulness of which lies in recovering relational advantages, foremost among them semantic consistency and DBMS-guaranteed logical validity[15].
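A rough sketch of what repairing a non-conforming design looks like, with relations modeled as plain Python sets of tuples. The emp_dept data and the project helper are hypothetical, and the example assumes the design flaw is that facts about two kinds of entities (employees and departments) were bundled into one relation. Decomposing by projection removes the redundancy, and joining the projections recovers the original, so no information is lost.

```python
# Hypothetical bundled design: facts about employees and about departments
# recorded together as (emp_id, emp_name, dept_id, dept_name).
emp_dept = {
    ("e1", "Ann", "d1", "Sales"),
    ("e2", "Bob", "d1", "Sales"),
    ("e3", "Cy", "d2", "Research"),
}

def project(relation, *positions):
    """Projection over the given attribute positions; duplicates vanish in a set."""
    return {tuple(t[p] for p in positions) for t in relation}

# Repair by further normalization: one relation per kind of entity.
employee = project(emp_dept, 0, 1, 2)  # (emp_id, emp_name, dept_id)
department = project(emp_dept, 2, 3)   # (dept_id, dept_name)

# Natural join on dept_id reconstructs the bundled relation.
rejoined = {
    (emp_id, emp_name, dept_id, dept_name)
    for (emp_id, emp_name, dept_id) in employee
    for (d_id, dept_name) in department
    if d_id == dept_id
}

# How many times the name of department d1 is recorded in each design.
sales_facts_bundled = sum(1 for t in emp_dept if t[2] == "d1")
sales_facts_repaired = sum(1 for t in department if t[0] == "d1")
```

In the bundled design the name of department d1 is recorded once per employee, so a partial update can make the database contradict itself. In the repaired design it is recorded once, so that inconsistency cannot arise.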




Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] Pascal, F., Business Modeling for Database Design: Formalizing the Informal

[2] Pascal, F., What Is a Data Model, and What It Is Not

[3] Pascal, F., Application-Managed Data Not a Distributed DBMS Make

[4] Pascal, F., DBMS vs Application Enforced Constraints

[5] Pascal, F., Schema, NoSQL and the Relational Model series

[6] Pascal, F., Normalization and Further Normalization Part 2: If You Need Them, You're Doing It Wrong

[6] Pascal, F., Forward to the Past From Codd to SQL to NoSQL

[7] Pascal, F., What Is a True Relational System (and What It Is Not)

[8] Pascal, F., First Normal Form in Theory and Practice series

[9] Pascal, F., Normalization and Further Normalization Series

[10] Pascal, F., What Relations Really Are and Why They Are Important

[11] Pascal, F., Data Sublanguage series

[12] Pascal, F., Simple Domains and Value Atomicity

[13] Pascal, F., Database Design: What It Is and Isn't

[14] Pascal, F., Normalization and Further Normalization: If You Need Them, You're Doing It Wrong

[15] Pascal, F., Logical Validity and Semantic Correctness



