Wednesday, September 19, 2012

Data Models and Usefulness

In my 3-part article on dynamic schema, NoSQL and the relational model I admitted that I did not know much about NoSQL products in general and MongoDB in particular. Nevertheless, it was not difficult to figure out what problems would be faced using a docubase for database management purposes.

Then Matt Rogish alerted me to Why I Migrated Away From MongoDB. I suggest you read it in the context of my article and see how close my suspicions were to reality. Here are some quotes worth pondering (emphasis mine):
Alas, not being aware of the mathematics behind relational algebra, I could not see clearly the trap I was falling into - document databases are remarkably hard to run aggregations on and aggregating the data and presenting meaningrful statistics on your receipts is one of the core features of digiDoc. Without the powerful aggregation features that we take for granted in RDBMSs, I would constantly be fighting with unweildy map-reduce constructs when all I want is SUM(amount) FROM receipts WHERE <foo> GROUP BY <bar>. 
People keep complaining that JOINs make your data hard to scale. Well, the converse is also true - Not having JOINs makes your data an intractable lump of mud. 
...when I last looked something as simple as case-insensitive search did not exist. The recommended solution was to have a field in the model with all your search data in it in lower case [FP: heh, heh]... And if I add a new field to the model? Time to regenerate all the search strings. I can only come to the conclusion that mongodb is a well-funded and elaborate troll.
...somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost.
Then it might have been the lack of an enforced schema? Thinking about it though, schemas are wonderful. They take all the constraints about your data and put it in one place. Without a schema, this constraint checking would be spread all over my application. A document added a month ago and a document added yesterday could look completely different and I’d have no way of knowing. Such fuzzy schemaless data models encourage loose thinking and undisciplined object orientation.
Anybody with foundation knowledge would expect these problems. As I argued so many times: a data structure determines manipulation and, therefore, usefulness for a given informational purpose.

Oh, and in this context, see my AllAnalytics posts and the exchanges with readers: 

Knowing What a Database Is
Unstructured Data

No comments:

Post a Comment