Sunday, May 5, 2013

Theory: As Far From Religion As One Can Get



In So What is a 'Large Database' JS states:
The points you make here, and consistently ... center pretty clearly on distinction between logical models and physical implementations. Products that sacrifice the logical model for various practical considerations (speed, size, cost, etc. - at least in the short term), reinforce the general lack of focus on, or understanding of, the relational model, as well as diminishing appreciation of the distinction between logical and physical.
Physical data independence (PDI) is, indeed, a core advantage of the relational model, but hardly the only one I have focused on over the years. And the relational model is hardly the only component of the foundation knowledge that is increasingly lacking in the industry.


Indeed, what exactly are the "practical considerations" for which it makes sense to "trade off logical design"? That very idea is based on a misconception that I have repeatedly debunked over 3+ decades.
The overall issue is analogous in some ways to the battles over the past 35 years or so between "high level" and "low level" languages. High level languages (e.g., remember when 4GL's like NOMAD, RAMIS and FOCUS first hit the scene?) attempted to let business solutions and processes be described with syntax that was meaningful to end users and business analysts. The general challenges in such approaches center on the lack of tools that can make the same kind of code optimizations that can be realized by use of low level languages in an expert's hands. It wasn't uncommon that time-critical code might have been written using Assembly Language rather than COBOL, Fortran or PL/1, even though this compromised the portability and reusability of the code and didn't conform to any of the conceptual models that higher level languages were attempting to enforce.
This perceived analogy is also derived from a misconception. The high-level/low-level language issue is a purely pragmatic one--there is no theory on which to base the preference of one over the other.

The relational model has a dual theoretical foundation that has proved itself over thousands of years: the equivalence of first order predicate logic and set theory. Whatever features an application development language offers, data access must be based on some data model underlying the database and DBMS. And as I demonstrate in Business Modeling for Database Design, no data model has satisfied the criteria of equivalence to, let alone superiority over, the relational model, insofar as the practical implications are concerned.
I don't think there would be any complaint from the community of database professionals if products (DBMS's and data manipulation languages) could be developed that faithfully implemented a relational model from the developer and user's perspectives, hiding all the translations to optimized physical implementation under the covers. Code and query optimizers are dramatically more effective than they were 25 years ago, but as folks in this thread have discussed and illustrated, there are still dramatic performance (and hence cost) improvements that can be gained by stepping away from a strict fundamentalist approach, considering that we are constrained to use the tools that are available today.
There are several problems with this argument:
  • Actually, there have been a few TRDBMS implementations superior to SQL, e.g., Ingres, or the more recent Alphora's Dataphor, yet very few recognized, appreciated or chose that superiority;
  • Why should vendors develop TRDBMS's if neither their designers nor their users can know, understand and appreciate them, lacking the necessary foundation knowledge because they operate in "cookbook mode"?
  • The absence of such knowledge induces database professionals to get it backwards: advantages are considered deficiencies. The very PDI mentioned by JS is a case in point: both vendors and users deplore it as a "failure of the relational model (read: SQL products, two other confusions) to incorporate physical aspects", when in reality that is the whole point: to give implementers complete physical freedom without disrupting users and applications;
  • Not even huge hardware and optimization improvements have produced attempts to do better than SQL. Quite the opposite: there has been a significant regression to pre-relational times, characterized by a proliferation of reinvented ad-hoc, proprietary products whose flouting of relational fidelity proved cost-ineffective decades ago, and which the relational model was invented to eliminate.
If read carefully, these points answer the following comment:
Correct me if I'm wrong (I have no doubt that you will!), but I believe that you've been criticizing the DBMS industry about the shortcomings of their products, related to these issues, for much of the past 25 years. To your knowledge, are there ANY efforts in progress, anywhere, to realistically create a business-ready DBMS that allows users (both business users and developers) to implement, communicate with and query database using language syntax and conventions in conformance with RM, without performance and cost compromises that would render such a product uncompetitive in the marketplace? Until and unless there is some kind of economic viability for such a product, railing against the industry and practitioners is somewhat quixotic. Raising awareness, so that professionals understand "what's right" and what compromises they make and risks they take when deviating from an ideal model is excellent. Tilting at windmills is pointless.
But I will add the following: the efficient-markets assumption underlying this argument is actually the very kind of "fundamentalism" that JS doubts with respect to the relational model in the database world, but accepts rather readily in economics and business.

Economic theory tells us that a core prerequisite for market efficiency is perfect information on the part of buyers and sellers. Foundation knowledge and learning from past experience are a crucial component of the information that database professionals and users must possess. In its absence there is no reason to expect that the best product/technology will win, or that deficiencies will be addressed correctly over time.

Substituting tool training for education is what causes the current regress that is confused with progress.
My sense is that your priorities differ from many of the folks who must implement and manage databases within current business constraints ... I believe you'd reach, and be able to influence and inform, many more folks if you made more of an attempt to communicate empathetically. When the tone is that you know that you're right and that anyone who is bright enough and/or thinking correctly agrees with you, this is strikingly reminiscent of a religious cult leader. As much as you are respected, I believe you'd be a far greater influence if your approach could be more analogous to that of the Dalai Lama rather than Jim Jones.
One of my professors in graduate school, when told that he was right about something, would reply "But of course I am right, otherwise I would not have said it". He was very tough on students and colleagues, for which reason he was not very popular, but I never learned as much from any other professor, particularly not from the popular ones, as I did from him. He was almost always right, because he was relying on foundation knowledge. One reason I left academia is that there were so few like him.

As long as the industry operates the way it does, substituting tool training for education (see my forthcoming post at AllAnalytics, Training Is Not Education), the database market will not be efficient in the way JS suggests, regardless of my tone or style. In fact, viewing logic and math in database management as "purist", "fundamentalist", or "religion/cult" is upside down and backwards: it is "faith" in specific tools and practices lacking a sound foundation that is akin to religion.

Under these circumstances it is more satisfying and effective to educate the few who are capable and willing to separate substance from style, than to strive to affect those who focus on the latter, because they are increasingly unprepared to deal with the former.

There's one related important point I would like to add: if it were the case that most database professionals (1) knew, understood and appreciated foundation knowledge and its practical implications, and (2) based whatever compromises they make on that knowledge, it would be much different from the current state of affairs, in which it is difficult to trust the necessity or value of those compromises.


