Thursday, September 27, 2012

A Note on Education vs. Training



Gene Wirchenko drew my attention to an Infoworld article by Andrew Oliver, Ill-informed haters go after MongoDB, which is kind of a response to the articles critical of MongoDB which I commented on in previous posts. The gist of the article is described as: "NoSQL databases like MongoDB are great for some tasks but not for others. Is it MongoDB's fault if misguided developers use it to solve the wrong problem?"
With any new technology comes a wave of marketing happy talk, which in turn leads to inexperienced developers "jumping on the train" of a new fad. Inevitably, these newbies find themselves disappointed that the technology doesn't deliver on their inflated expectations.
Oliver correctly identifies the core systemic problem in database management, one that I have been warning of for almost my entire 25+ career in the field: the lethal combination of proliferation of thoroughly hyped ad-hoc products and technologies by vendors unfamiliar with the foundation and history of the field to database professionals and users equally unfamiliar with same. Neither do I find fault with the advice he offers at the end of his article:
"Take blogs with a grain of salt ... make sure you understand the technology before using it on a critical project. If you don't heed this advice, some writer for Infoworld on a short deadline in a slow news week might decide to ridicule you!
although I don't think the ridicule by journalists, even less knowledgeable about what they cover, is the most serious consequence.

But he fails to make the connection between a major source of the problem and the effectiveness of his advice.

The IT industry in general and the database field in particular rely almost exclusively on tools experience. Practitioners are inducted in the field mainly via practice with specific tools that happen to be in vogue at specific times; job descriptions don't require much beyond that; and academia has been turned away from science and education into a research and certification vehicle for vendors and their tools, a trend which Dijkstra has attacked decades ago much better than I can. I experienced this personally on more than one occasion. To recall two:
  • When I offered a presentation on data fundamentals to a reputable computer science department, there was no interest, as they were too busy with "XML research".
  • When I tried to teach an introductory course in database management at a local university by developing a syllabus on data fundamentals, I was quickly disabused of that illusion by a demand to use a specific book and teach Oracle.
A lot is being made on the "high education bubble", the exploding cost of an academic education and the burdening indebtness caused by it. Among its many implications there is an insidious one. Enormous pressure is exerted, for obvious reasons, on academia to turn from educational to vocational: from employers and vendors (via various incentives that are hard to resist) and from students, particularly those sponsored by employers or vendors.

I do not believe that knowledge of and experience with tools alone is sufficient to address the problems underlying the database field. Without foundation knowledge, including the history of the field, relabeled old discarded products will continue to proliferate and practitioners will lack the capacity to avoid being seduced by hype.

Indeed, one could argue that the attraction of the so-called "schema-less" NoSQL products is due to the difficulty to think conceptually and logically about requirements and evaluate technologies and products critically because the necessary knowledge and ability to reason and abstract--distinct from tool experience--have not been inculcated. The commonly used assertion "different databases for different purposes" or "the right tool for the right task" are trivial and trite and can be misleading without the benefit of foundation knowledge external to the tools themselves.

Note very carefully that I do not mean to imply that tool experience is unimportant, which would be nonsense. Rather, I claim that it is necessary but insufficient for intelligent functioning in the database field, as it probably is in many other fields.

Wednesday, September 26, 2012

Mountain View Presentation



I will be giving a presentation to the Silicon Valley SQL Server User Group:

Foundation Knowledge for Database Professionals

October 16, 6:30pm

1065 La Avenida
Building 1
Mountain View, CA (map)

It is open to all. Please join us and invite anybody who might be interested.

More details here

Quote of the Week



I've read a few things about NoSQL (technology or movement? That is the question!). As my colleague Stuart McLachlan rephrased the question: If it's a movement, the real question is whether it's a religious or bowel movement.
--SQL/PostSQL/NoSQL, artfulopinions.blogspot.com

To Laugh or Cry?



To Laugh or Cry? items are specifically selected as lost causes (usually, the problems lie with either the author of the item, or with those who respond to it, or both). For that reason I normally do not comment on them. This week's piece is somewhat unusual in that the target is not the author, but rather a vendor. The following item was brought to my attention by Matt Rogish.

Diego Basch, I’ll Give MongoDB Another Try. In Ten Years.

Saturday, September 22, 2012

Davide and David on NoSQL



After publishing my 3-parter on NoSQL I came across a post by Davide Mauri and an interview with David McGoveran on the subject, to which I would like to add some clarifications (this is not a debunking, as they hardly justify such).

Mauri asks: "Do NoSQL people really want to drop the relational model?"

Thursday, September 20, 2012

PRACTICAL DATABASE FOUNDATIONS PAPER #1: BUSINESS MODELING FOR DATABASE DESIGN




Paper #1, revised and expanded, is available for purchase. Please follow the instructions on the PAPERS page to order.

About this Paper

The purpose of this paper is to introduce practitioners to data fundamentals and advance their understanding via business modeling for database design.
It focuses particularly on and distinguishes between the three types of model that are often confused: business model, logical model and data model.

The paper will delve into each one in detail, explain the basic modeling and design concepts, principles and methodology of formalizing an informal business model via the relational data model into a formal logical database model which can be computerized.

Preface

Introduction

Business Modeling
Basic Modeling Concepts
Business Rules
Property Rules
Class Rules

Attribute Rules

Distinctness Rules

Functional Dependency Rules

Referential Rules

Associative Entities

Optional Rules
Business Models
Database Design
Formalizing the Informal
Predicates and Propositions

The Relational Data Model
Relational Structure

Relational Manipulation

Relational Integrity
External Predicates

Integrity Constraints and Internal Predicates
Logical Models
Understanding Database Management
A Foundation Framework
Applying the Framework (Exercises)
Appendix A: Integrity Constraints Formulation and Verification

References

Wednesday, September 19, 2012

Data Models and Usefulness



In my 3-part article on dynamic schema, NoSQL and the relational model I admitted that I did not know much about NoSQL products in general and MongoDB in particular. Nevertheless, it was not difficult to figure out what problems would be faced using a docubase for database management purposes.

Then Matt Rogish alerted me to Why I Migrated Away From MongoDB. I suggest you read it in the context of my article and see how close my suspicions were to reality. Here are some quotes worth pondering (emphasis mine):
Alas, not being aware of the mathematics behind relational algebra, I could not see clearly the trap I was falling into - document databases are remarkably hard to run aggregations on and aggregating the data and presenting meaningrful statistics on your receipts is one of the core features of digiDoc. Without the powerful aggregation features that we take for granted in RDBMSs, I would constantly be fighting with unweildy map-reduce constructs when all I want is SUM(amount) FROM receipts WHERE <foo> GROUP BY <bar>. 
People keep complaining that JOINs make your data hard to scale. Well, the converse is also true - Not having JOINs makes your data an intractable lump of mud. 
...when I last looked something as simple as case-insensitive search did not exist. The recommended solution was to have a field in the model with all your search data in it in lower case [FP: heh, heh]... And if I add a new field to the model? Time to regenerate all the search strings. I can only come to the conclusion that mongodb is a well-funded and elaborate troll.
...somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost.
Then it might have been the lack of an enforced schema? Thinking about it though, schemas are wonderful. They take all the constraints about your data and put it in one place. Without a schema, this constraint checking would be spread all over my application. A document added a month ago and a document added yesterday could look completely different and I’d have no way of knowing. Such fuzzy schemaless data models encourage loose thinking and undisciplined object orientation.
Anybody with foundation knowledge would expect these problems. As I argued so many times: a data structure determines manipulation and, therefore, usefulness for a given informational purpose.

Oh, and in this context, see my AllAnalytics posts and the exchanges with readers: 

Knowing What a Database Is
Unstructured Data

Tuesday, September 18, 2012

Object Orientation, Logic and Database Management



WS writes:
On the subject of object orientation I often feel inclined to quote Leslie Lamport's comment in the introduction to his book "Specifying Systems": "If exposure to C++ hasn't completely destroyed your ability to think logically, you should have no trouble filling in the gaps in your mathematics education".
I tried for many years to understand OO, but without success. It didn't occur to me until I encountered the material on dbdebunk that the problem might not lie with me. OO's lack of a formal definition leads, I think inevitably, to what I have come to call PALC (Pointless Additional Layers of Complexity). Every addition to OO leads to new problems that require ever more complicated solutions, that in turn create a whole new set of problems.

Monday, September 17, 2012

To Laugh or Cry?




Is it really necessary to normalize your databases in third normal form?, social.technet.microsoft.com

Quote of the Week



The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.

2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:

    Relational:
        Was designed to do relational stuff* from the get-go, even if it now does other things too.
        Supports a lot of SQL.
    Non-relational:
        Was designed primarily to do non-relational things.*
        Doesn’t support all that much SQL.

*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
--Curt Monash

Weekly DBDebunk News



1. I have added
  • a "Link to this blog" button and code at the top right
  • URL, HTML and BB/Forum link codes at the end of each post
If you like the blog or posts, please use them to link to them. This being a new blog, it will help with visibility. Thanks.

2. Shortly I will switch the blog to the old domain, dbdebunk.com. You don't need to do anything as the blogspot will simply forward you to that domain.

3. For those interested in databases and gaming, there is a forum thread at

https://forums.eveonline.com/default.aspx?g=posts&m=1919633

about problems with EDK, an older 'massive multiplayer online' (MMO) game based on mySQL, the latter supposedly being broken by the application/user load. The discussion is about the new DBMS to be considered and the choice between SQL and NoSQL (MongoDB about which I wrote in my three-part article, is mentioned).

I do not presume to know the technicalities and terminology of gaming systems and I am not in a position to judge whether current SQL systems can adequately satisfy gaming in general and EDK in particular. Perhaps readers who have knowledge in this area can enlighten us.

But if you read the exchange carefully you can detect the familiar sources of the (misleading) attraction of NoSQL: lack of familiarity with history and foundation knowledge, developers not transcending their application view of databases, the logical-physical confusion, the failure by SQL and its implementations to adhere to the relational model and so on. There is one developer with foundation knowledge, though, who is readily discernible.

We are regressing decades. I may not be around when somebody will have to "discover" the relational model in order to address the very same problems that Codd  thought he did 40+ years ago.

4. Matt Rogish has submitted this link: 

Why I Migrated Away From MongoDB
which I may comment on later.

Friday, September 7, 2012

Forward to the Past: "Out-clevering" the DBMS



When I see a title like "Software Engineering: What are some clever programming techniques that aren't used often enough?" my antennae come up. More often than not (1) programmers consider certain things clever only because they are not familiar with history (2) in its quest for solving problems engineering sometimes tries to out-clever science, which usually spells trouble.

Wednesday, September 5, 2012

Quote of the Week



Relational databases ... had poisoned whole generation of programmers thinking (incl. me). With no clue we approached everything through relational prism that we studied in university. MySQL was the only OSS alternative that cut enough corners to make relational DBMS fit the problems we tried to solve. For which the relational paradigm wasn't a good fit anyways.

So. then comes along the NoSQL stuff and you find out that you have much better tools than relational for these kinds of problems (simple websites, blogging sites, feed sites, social sites, etc.). Along the way you might see all the other problems that really need realational (ERP systems etc.). But now you need real relational not some half-baked all corners cut MySQL stuff. Yeah that also goes for InnoDB which might have become something if Oracle hadn't freezed it.

Then you try out Postgres (because it's free) and when it works for big workloads (yes it does in skilled hands) then you are hooked. If by chance you then check out Oracle ... it feels like a twelve ton dinosaur ("erm. no boolean type? whaat - implicit commit with DDL?"). If by any miracle you don't have the 1% of projects that need something special - Oracle RAC or Spatial or whatnot. Then Postgre is the most mainstream option available - there just is no other as popular, as functional and as cheap solution for these kinds of problem domains that really require relational DBMS.

PS: been using MySQL (both MyISAM/InnoDB) over 10 years, PostgreSQL over 5 years and Oracle over 5 years. --Hipsters Hacking on PosgreSQL, theRegister.com

To Laugh or Cry?




We don't Need No Stinkin' Database, The Daily WTF
View My Stats