Sunday, July 5, 2015

The SQL and NoSQL Effects: Will They Ever Learn? UPDATED



UPDATE: I refer readers to Apache Cassandra … What Happened Next. Note that this was an optimal use case for NoSQL. Read it with a focus on the simplicity of the data model and, particularly, on physical data independence relative to the RDM.

In Oracle and the NoSQL Effect, Robin Schumacher (RS), a former "data god" DBA and MySQL executive now working for a NoSQL vendor, claims that Oracle’s recent fiscal Q4 miss--a fraction of what's to come--is due to its failure to recognize that
"... web apps ushered in a new model for development and distributed systems that ... [r]elational databases are fundamentally ill suited to handle ... Their master-slave architectures, methods for writing and reading data, and data distribution mechanisms simply cannot meet the key requirements of modern web, mobile and IoT applications. I tell you that not as an employee of a NoSQL company, but as a guy who has worked with RDBMS’s for over twenty-five years. In short, you simply can’t get there from here where relational technology is concerned, and that’s why NoSQL must be used for the applications we’re talking about."


One should not draw technical conclusions from the god complex of DBAs. The function of the database has always been to serve applications and if they and the DBMS vendors forget it, they have only themselves, not the data model, to blame for the consequences. However, this does not mean we should regress to the pre-database days and bias databases for some applications and against others, defeating the purpose. Unfortunately, that's what NoSQL is about:
My feeling is that the field of NoSQL was created EXACTLY so the data should not be normalized like in relational databases -- which has the disadvantages that data needed for real time/online applications needed to be joined at runtime before being used by the application. Under the time constraints of an online system, this is unacceptable. Hence, application developers want to store persistently the data EXACTLY in the way application see it: pre-aggregated, potentially inconsistent, and potentially replicated.
--Daniela Florescu, LinkedIn.com
Are you sniffing logical-physical confusion (LPC)? If not, here are the tough requirements that, according to RS, relational systems "cannot satisfy":
  • constant uptime with no chance of any outage;
  • distribute enormous amounts of data;
  • write (not just read) insanely fast amounts of data;
  • double or quadruple my capacity at a moment’s notice;
  • constant database performance no matter the user or data load;
  • handle all kinds of data efficiently without storage overload;
  • do all the above and more without breaking my budget.
Now, what exactly does a data model have to do with these? The RDM imposes no limitations whatsoever on physical implementation, giving DBMS vendors and DBAs complete freedom to do anything they deem necessary at the physical level to satisfy these requirements:
The NoSQL camp put performance, scalability, and reliability front and center but lost the opportunity to take the relational model to the next level because—just like the relational [Ed. note: No, the SQL!] camp—it mistakenly believed that normalization dictates physical storage choices, that non-relational APIs are forbidden by the relational model, and that “relational” is synonymous with ACID (Atomicity, Consistency, Isolation, and Durability).

The NoSQL camp created a number of innovations: functional segmentation, sharding, replication, eventual consistency, and schemaless design. Since these innovations are compatible with the relational model, they should eventually be absorbed by mainstream database management systems.
--The Rise and Fall of the NoSQL Empire, 2007–2013 (from LINKS page)
Innovations? The RDM was intended to facilitate them--but SQL vendors failed to implement them. Except for "schemaless design" of course: NoSQL proponents would stop mentioning it if they understood what a schema is.
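
To make the point about physical data independence concrete, here is a minimal sketch in Python. It is an illustration only, not how any DBMS is implemented: the names RowStore, ColumnStore, restrict_project and the ORDERS sample data are all hypothetical. It assumes every tuple has the same attributes. The same logical query runs unchanged against two different physical layouts and yields the same relation.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Row = Dict[str, object]


class RowStore:
    """Hypothetical physical layout 1: one record per tuple."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: List[Row] = [dict(r) for r in rows]

    def scan(self) -> Iterator[Row]:
        return (dict(r) for r in self._rows)


class ColumnStore:
    """Hypothetical physical layout 2: one list of values per attribute."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._cols: Dict[str, List[object]] = {}
        for r in rows:
            for name, value in r.items():
                self._cols.setdefault(name, []).append(value)

    def scan(self) -> Iterator[Row]:
        names = list(self._cols)
        # Reassemble tuples from the per-attribute lists.
        for values in zip(*(self._cols[n] for n in names)):
            yield dict(zip(names, values))


def restrict_project(
    store, predicate: Callable[[Row], bool], attrs: Tuple[str, ...]
) -> Set[tuple]:
    """Logical query: refers only to attribute names, never to the layout.

    The result is a set, since a relation has no duplicates and no ordering.
    """
    return {tuple(row[a] for a in attrs) for row in store.scan() if predicate(row)}


# Made-up sample data for the illustration.
ORDERS = [
    {"order_id": 1, "customer": "acme", "total": 120},
    {"order_id": 2, "customer": "zenith", "total": 40},
    {"order_id": 3, "customer": "acme", "total": 75},
]


def big_orders(store) -> Set[tuple]:
    return restrict_project(
        store, lambda r: r["total"] > 50, ("order_id", "customer")
    )


if __name__ == "__main__":
    # Same logical result regardless of the physical layout underneath.
    print(big_orders(RowStore(ORDERS)) == big_orders(ColumnStore(ORDERS)))
```

Swapping the storage class changes nothing in big_orders; that is the freedom at the physical level the RDM leaves to vendors and DBAs, here reduced to a toy.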

RS boasts that as MySQL product manager he defied "database purists who scolded him and accused him of breaking the rules of standard RDBMS practice ... to accommodate what web apps needed from their database. The end result was that MySQL became a huge success because of what it delivered to web v1." So it is possible to get there from here after all! Be that as it may, that is not the key point.

The history of the database field is part of the foundation knowledge that data professionals lack, and that lack is the reason for the industry's systematic failure to learn from its mistakes. When IBM initiated its relational project, Codd was an IBM research fellow and warned that SQL--a research prototype--was flawed both relationally and as a data language because its designers and implementers did not understand and appreciate the RDM. Always the salesman, Ellison rushed to be the first with a SQL system in the market, with IBM following suit and--as the Google of that time--serving as a de-facto enforcer of adoption. A shrewd business decision given the products available at the time, but a technologically limiting one in the long term. What ensued was "the SQL effect": an entrenched confusion of SQL DBMS's with RDBMS's. In time, the limitations imposed by the former have been blamed on the latter. What is insidious about the SQL effect is that it prevents the real solution to the problem: if you think SQL DBMS's are RDBMS's and deem them deficient, you'll dump the RDM and, presto, the NoSQL effect. Oracle and IBM have a lot to answer for.

Both effects are rooted in the same disregard for data fundamentals and a sound theoretical foundation. It is not possible to resolve the problems due to the poor relational fidelity of SQL non-relationally with NoSQL. Had Oracle and IBM known, understood and appreciated the RDM, there would not have been a SQL effect and, therefore, no NoSQL effect. As it is, users keep migrating non-productively from effect to effect, without addressing the core issues.


As Date says, no SQL, but no NoSQL either!


