Sunday, April 20, 2014

Forward to the Past: From Codd to SQL to NoSQL


As told by C. J. Date: sometime shortly after SQL DBMSs were introduced in the industry, when non-relational products (e.g., hierarchic and network) reigned and the relational idea was a very hard sell, he and Michael Stonebraker (the author of Ingres and, at the time, a professor of Computer Science at the University of California, Berkeley) participated in a panel at a technical conference. The following is the (paraphrased) exchange between them:
CJD: The reality is that most practitioners are too set in their non-relational ways and we cannot expect them to understand and appreciate the relational model. Rather, we must focus on the young generation of practitioners, who learn the relational model at university.

MS: Chris, you don't understand. I am teaching those youths: they were not around when we struggled with the huge problems of the pre-relational systems and they are reinventing all of them!


I recently came across a review of the book NoSQL Distilled in the NoCOUG Journal, and both the book and the review demonstrate how prescient Stonebraker was.

Anybody interested in how the absence of a sound theoretical foundation inhibits technological progress should read the review and appreciate the genius of Codd's invention, which was intended to prevent many of those very consequences from recurring (see my paper Truly Relational: What It Really Means). Unfortunately, you can lead a horse to water, but you cannot make it drink.

Here are some extracts from the review.
... “polyglot persistence,” ... a world in which relational is not the only way to store and manage data.

...“NoSQL” is ill defined but usually refers to a number of nonrelational databases ... [and] schemaless data and systems where gains in performance are traded against other things like consistency.

... the reader should care about NoSQL databases ... [for] two main reasons: first, application development productivity and second, large scale data. Along the way the fact that large data sets are usually run on clusters of servers is also brought up.

... a review of where we are and how RDBMSs came to run the world ... Funny, I remember when object databases were going to take over, but it never happened. Strange how some “new things” come and go and others come and conquer. [FP: Not strange at all, the opposite would be!]

Next we learn about the “impedance mismatch,” which is defined as the difference between the relational model and various in memory data structures ... The best way to better understand your existing RDBMS is to learn about NoSQL systems. The mismatch is between how data is stored in the relational database as opposed to how it is used by the applications that build in memory data structures.

“A data model is the model through which we perceive and manipulate our data.” This leads to relational tables being the default data model. Each of the NoSQL solutions has a different data model ... I immediately wonder how we will support all these different data models in one organization. [FP: Good question!]

First we have a discussion of aggregates and an example comparing data stored in a relational system and a NoSQL system that uses the aggregate data model. The point is that data is stored in groups (the aggregate) instead of in normalized tables. Everything about one customer could be stored in one aggregate instead of spread out among many relational tables.
There's plenty more of this stuff, but I cannot stomach it. If you do not understand why these extracts are problematic and misleading, I recommend acquiring some foundational knowledge.





4 comments:

  1. When your boss says, "They will never buy it," you go and invent NoSQL. This is typical when developers focus on only one aspect and make all other aspects subordinate to it, the main one being: the users HATE the product and will never buy it. Data consistency is an example.

    Security is another example. Users hate long passwords, antivirus software, CAPTCHAs, etc. How many of them know what SSL is? :) But when they lose control of their account or someone steals money from their credit card, you can say: "Change your password every six months and you will not lose your money. Again. OK?" So everybody hates security, but everybody knows what happens when it is neglected.

    But can someone provide clear evidence that neglecting data consistency makes someone lose money, or that someone's health, life, or reputation can be harmed by using data whose consistency is compromised? I know this is true, but what would make them know it? Users, and even IT bosses, have no fear of inconsistent data.


    I think scientists who deal seriously with data problems should take more than data consistency into account when considering the relational model. At the very least they should acknowledge that, while the relational model solves one set of problems, it creates another set of problems that lie beyond data consistency.

    1. I have no idea what you're talking about.

    2. I just mean that the relational model is not easy to learn and adopt. It is less natural for the human brain than text or images. When a person chooses between two products, he chooses the one he understands. The idea of my post is that it is not easy to convince people to use robust and correct tools when they have an attractive option that is easy to learn and use. It is even harder to show the advantages of a robust technology with a sound theoretical foundation to someone who makes business decisions.

  2. Really?

    1st, it is much easier to learn than any other data model that the industry is using.

    2nd, what logical inferences -- data manipulation -- can a DBMS perform on text and images? What integrity constraints can it enforce?

    3rd, any technology can be taught and learned. Neither is done today.

    4th, in fact people don't know how to do easy things anymore. They are used only to the complex--give them something easy and they are lost.

    5th, by the time they realize the limitations and problems of "easy" it's too late.

    6th, I've been educating people for decades, so you don't have to tell me it's hard. The problem is one of educational failure. Societies that stop educating their people and only train them are doomed. Look at the West -- that is exactly what is happening.
