Friday, December 20, 2013

Anatomy of a Data Management Project



I've finally found a concrete case that demonstrates most of the costly consequences of engaging in database practice without a good grasp of data fundamentals. The article describing the case was written by a web application developer. She is competent enough to give an excellent post-facto description of the project, one that makes assessment possible, but, as is usually the case, she fails to trace the problems to poor foundation knowledge. That's where I come in.




I'll provide some general comments and delve into details in subsequent posts, but readers interested in taking full advantage of what follows should read the article, "Why You Should Never Use MongoDB," and the online discussion of it. If you do read the article, don't get hooked on the specific tools, practices, and people; focus instead on general ideas, concepts, principles, and methods and their practical value. 

The author states upfront that she doesn't build database engines but does build a lot of web applications, and that those have a lot of "different requirements and different data storage needs." She adds, "I've deployed most of the data stores you've heard about, and a few that you probably haven't," and admits to having "picked the wrong one a few times."

The article describes one such bad choice, made on a project she had helped out with. The project, initiated by four New York University undergraduates, involved developing a distributed social network as an alternative to Facebook. The Kickstarter campaign they launched proved so successful that they left school and moved to San Francisco, where she worked, to "start writing code."
On reading this, my fundamentalist antennae rose up.

No. 1: You don't have to be a database management system (DBMS) designer to select a DBMS to serve applications. However, you can't make informed choices with application development knowledge (mainly coding) alone; you need some understanding of data fundamentals. Experience with products acquired by trial and error is not enough, and the errors can be too costly to justify.

Furthermore, note the students' assumption, implicit in their leaving school, that acquiring such knowledge isn't necessary and that money and coding suffice. Ironically, it seems to have escaped them that the purpose of their project was to address a data management weakness in Facebook that may well have resulted from its having been designed in exactly the same fashion by a school dropout. The media have documented efforts by most Internet companies to overhaul their poorly thought-out initial data management systems and even to reinvent the database wheel.

No. 2: The use of the terms "storage needs" and "data stores" suggests an attitude, common among application developers, that treats databases as mere "data buckets." This reveals a failure to appreciate the important distinction between application and DBMS functions. ("They're both software, right?")

No. 3: The use of these terms also shows a failure to appreciate the distinctions among levels of data representation, particularly between the logical level (what users and applications interact with) and the physical level (the internal representation in storage).
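
To make points No. 2 and No. 3 a little more concrete, here is a minimal sketch in Python, with SQLite standing in for a DBMS purely for convenience. Nothing in it comes from the article or the project: the `members` and `posts` tables, their columns, and the `insert_post` helper are hypothetical names chosen for illustration. The first half shows an integrity rule declared to the DBMS and enforced by it rather than re-coded in each application; the second half shows a purely physical change (adding an index) that leaves the logical result of a query untouched.

```python
import sqlite3

# Hypothetical schema for illustration only; not taken from the project.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE members (member_id INTEGER PRIMARY KEY, city TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " post_id INTEGER PRIMARY KEY,"
    " member_id INTEGER NOT NULL REFERENCES members (member_id),"
    " body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO members (member_id, city) VALUES (?, ?)",
    [(1, "New York"), (2, "San Francisco"), (3, "New York")],
)


def insert_post(member_id, body):
    """Try to add a post; return False if the DBMS rejects it."""
    try:
        conn.execute(
            "INSERT INTO posts (member_id, body) VALUES (?, ?)", (member_id, body)
        )
        return True
    except sqlite3.IntegrityError:
        # The rule "every post belongs to an existing member" was declared
        # once, in the schema; the application did not have to code it.
        return False


accepted = insert_post(1, "hello")        # member 1 exists
rejected = not insert_post(99, "orphan")  # member 99 does not exist

# Logical level: what the application asks for.
QUERY = "SELECT member_id FROM members WHERE city = ? ORDER BY member_id"
before = conn.execute(QUERY, ("New York",)).fetchall()

# Physical level: an index changes how rows are located, not what is returned.
conn.execute("CREATE INDEX members_by_city ON members (city)")
after = conn.execute(QUERY, ("New York",)).fetchall()

print(accepted, rejected, before == after)  # prints: True True True
```

The query text and its answer are identical before and after the index is created, and the orphan post is refused without a line of validation logic in the application. A "data bucket" view of the database erases both distinctions.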

I believe these confusions were a major source of the project's suboptimal data management strategy, which forced a DBMS switch. That switch, by the way, made analytical exploitation of the data particularly difficult.
For the details, stay tuned. And in the meantime, let me know what you think of these issues in general and the project in particular.
