Sunday, April 20, 2014

Forward to the Past: From Codd to SQL to NoSQL

As told by C. J. Date, shortly after SQL DBMSs were introduced in the industry, when non-relational products (e.g., hierarchic and network) reigned and the relational idea was a very hard sell, he and Michael Stonebraker (the author of Ingres and at the time a professor of Computer Science at the University of California, Berkeley) participated in a panel at a technical conference. The following is the (paraphrased) exchange between them:

CJD: The reality is that most practitioners are too set in their non-relational ways and we cannot expect them to understand and appreciate the relational model. Rather, we must focus on the young generation of practitioners, who learn the relational model at university.

MS: Chris, you don't understand. I am teaching those youths: they were not around when we struggled with the huge problems of the pre-relational systems and they are reinventing all of them!

I recently came across a review of the book NOSQL DISTILLED in the NoCOUG Journal, and both the book and the review demonstrate how prescient Stonebraker was.

Anybody interested in how the absence of a sound theoretical foundation inhibits technological progress should read the review and appreciate the genius of Codd's invention, which was intended to prevent those very consequences from recurring (see my paper Truly Relational: What It Really Means). Unfortunately, you can lead a horse to water, but you cannot make it drink.

Here are some extracts from the review.
... “polyglot persistence,” ... a world in which relational is not the only way to store and manage data.

...“NoSQL” is ill defined but usually refers to a number of nonrelational databases ... [and] schemaless data and systems where gains in performance are traded against other things like consistency.

... the reader should care about NoSQL databases ... [for] two main reasons: first, application development productivity and second, large scale data. Along the way the fact that large data sets are usually run on clusters of servers is also brought up.

... a review of where we are and how RDBMSs came to run the world ... Funny, I remember when object databases were going to take over, but it never happened. Strange how some “new things” come and go and others come and conquer. [FP: Not strange at all, the opposite would be!]

Next we learn about the “impedance mismatch,” which is defined as the difference between the relational model and various in memory data structures ... The best way to better understand your existing RDBMS is to learn about NoSQL systems. The mismatch is between how data is stored in the relational database as opposed to how it is used by the applications that build in memory data structures.

“A data model is the model through which we perceive and manipulate our data.” This leads to relational tables being the default data model. Each of the NoSQL solutions has a different data model ... I immediately wonder how we will support all these different data models in one organization. [FP: Good question!]

First we have a discussion of aggregates and an example comparing data stored in a relational system and a NoSQL system that uses the aggregate data model. The point is that data is stored in groups (the aggregate) instead of in normalized tables. Everything about one customer could be stored in one aggregate instead of spread out among many relational tables.

There's plenty more of this stuff, but I cannot stomach it. If you do not understand why these claims are problematic and misleading, the acquisition of some foundation knowledge is recommended.
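For readers who want to see concretely what the last extract is contrasting, here is a minimal sketch in Python. It is only an illustration: the customer, orders, and every name in it are hypothetical and come from neither the book nor the review. It holds the same facts once as normalized relations and once as a single nested "aggregate", and shows that the aggregate can be derived from the relations.

```python
# Illustrative sketch only: all names and values are hypothetical assumptions,
# not taken from the book or the review.

# Normalized form: one relation per kind of fact, linked by key values.
customers = [{"customer_id": 1, "name": "Ann"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
]
order_lines = [
    {"order_id": 10, "product": "widget", "qty": 2},
    {"order_id": 10, "product": "gadget", "qty": 1},
    {"order_id": 11, "product": "widget", "qty": 5},
]

# Aggregate form: everything about one customer nested in a single unit.
customer_aggregate = {
    "customer_id": 1,
    "name": "Ann",
    "orders": [
        {"order_id": 10, "lines": [{"product": "widget", "qty": 2},
                                   {"product": "gadget", "qty": 1}]},
        {"order_id": 11, "lines": [{"product": "widget", "qty": 5}]},
    ],
}


def build_aggregate(customer_id):
    """Derive the nested aggregate for one customer from the relations."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    return {
        "customer_id": customer["customer_id"],
        "name": customer["name"],
        "orders": [
            {
                "order_id": o["order_id"],
                "lines": [
                    {"product": line["product"], "qty": line["qty"]}
                    for line in order_lines
                    if line["order_id"] == o["order_id"]
                ],
            }
            for o in orders
            if o["customer_id"] == customer_id
        ],
    }


def quantity_per_product():
    """Answer a question that cuts across customers, using the relations."""
    totals = {}
    for line in order_lines:
        totals[line["product"]] = totals.get(line["product"], 0) + line["qty"]
    return totals


print(build_aggregate(1) == customer_aggregate)
print(quantity_per_product())
```

In this sketch the nested structure is one particular derived view of the relations, while a question such as total quantity per product is answered from the relations directly, without first unpacking every customer's nested structure.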
