Saturday, May 9, 2015

On OO Relational "Extensions"



In a LinkedIn thread that followed my Comments On Stonebraker Interview, Erwin Smout mentioned David Maier's 1991 critique of the 1990 Third Generation Data Base System Manifesto (3GM), of which Stonebraker was one of the authors. I was aware of the 3GM, of course, but had not read it because, at the time, it did not benefit from favorable reviews. I considered The Third Manifesto by Date and Darwen more significant, in part because it was authored by relational experts and because it was backed up by a proposed fully computational language with a fully relational component. But when Erwin mentioned Maier's piece, I asked him if he had a copy and he found a scanned PDF copy online.

Having not read the 3GM, I am not in a position to comment on Maier's critique thereof, but I would like to comment on the general topics in his Preliminaries that attracted my attention.


Here's the first paragraph that made Erwin smile:
It is unclear whether the 3GM is intended as a definition of a new class of DBMS's, as a prognostication, as a research and development agenda or as a marketing piece. It may be something of each, judging from the arguments in it. Some are on semantic and engineering grounds, but others are based on perceived customer demands or market forces.
This was, essentially, also the gist of my comments on Stonebraker's pronouncements in the interview.
Most disturbing is an undercurrent of implication that requirements for next generation database systems should be tempered by what are compatible extensions to current relational models and technology. The message I read is that relational systems, or their slight extensions, are the "end of history" as far as database systems go. The bottom line is that there should be one flavor of next generation database system and that flavor should be extended relational.
Well, if Maier's impression is accurate, then Stonebraker's opinion has certainly changed, as he has been criticizing "relational systems"--Oracle in particular--as obsolete for quite a while. He was, of course, referring to SQL systems--the closest the industry has ever come to the RDM. Far be it from me to defend SQL systems, but my criticism was precisely of his failure to stress the crucial differences between the RDM and SQL. While SQL may well be, in some sense, obsolete, the RDM hardly is. The differences are crucial because SQL, as pushed by the then-powerful IBM and later by Oracle, is a botched concretization of the RDM, yet it is so entrenched that it not only inhibited the development and implementation of truly relational data languages and TRDBMS's in the past, but continues to prevent them. SQL, warts and all, is so closely associated with the RDM that the realization that the latter is the solution to the problems of the former is, to understate the case, very unlikely. Hence the return to a proliferation of ad-hoc, proprietary DBMS's, the situation we had before the RDM and SQL.

Stonebraker's criticism of "relational systems" is mainly about performance (which in my Comments I deemed "marketing", just as Maier deemed the 3GM). One can criticize a specific implementation of the RDM for poor performance, but one cannot blame that performance on relational fidelity itself. SQL vendors simply failed to implement the full physical data independence (PDI) implicit in the RDM. In fact, if SQL DBMS's have performance problems, many of them have nothing to do with their being relational, but with not being relational enough!
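To make the PDI point concrete, here is a minimal sketch in Python. The class names (RowStore, ColumnStore), the query function and the sample data are all hypothetical, invented purely for illustration; no real DBMS is being described. The assumption is simply that the logical query is written once against relations, while the physical layout is free to change underneath it.

```python
from typing import Iterable

Row = tuple[str, int]  # (name, salary); a purely illustrative heading


class RowStore:
    """Hypothetical physical layout: each tuple stored whole, one after another."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def tuples(self) -> frozenset[Row]:
        return frozenset(self._rows)


class ColumnStore:
    """Hypothetical physical layout: each attribute stored in its own array."""

    def __init__(self, rows: Iterable[Row]) -> None:
        rows = list(rows)
        self._names = [name for name, _ in rows]
        self._salaries = [salary for _, salary in rows]

    def tuples(self) -> frozenset[Row]:
        return frozenset(zip(self._names, self._salaries))


def high_earners(store, threshold: int) -> frozenset[str]:
    """A logical query: written once, unaware of the physical layout."""
    return frozenset(name for name, salary in store.tuples() if salary > threshold)


DATA = [("Ann", 90), ("Bob", 60), ("Cy", 75)]
row_result = high_earners(RowStore(DATA), 70)
col_result = high_earners(ColumnStore(DATA), 70)
```

Both calls yield the same relation. Which layout is faster for a given workload is an implementation matter, which is exactly what PDI is supposed to leave open.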

Incidentally, there were no "relational models and technology" in 1991 and there aren't any now, only one RDM and SQL technology. Slightly differing interpretations of elements of the RDM are not the same thing.

There is a tone in the 3GM of "if it can't be added easily to current [SQL] systems, it must be wrong" and that data models should only evolve if the current implementations can evolve along with them. We shouldn't abandon the successes of the relational model lightly, but they shouldn't bind us from exploring new territory. Columbus had to sail out of sight of land to find the New World. Even if third-generation systems end up looking a lot like their parents, I doubt database technology is best advanced by research that limits itself to relational extensions or is dictated by current practice. Research should be unfettered by the current state of affairs, in order to foster the most diversity in new models and implementation technology.
I could not agree more (even though I doubt database research is comparable to Columbus's search for the New World), but...

I am not sure that the RDM requires extensions, as we shall shortly see--logic has been with us for over 2000 years, FOPL and set theory in their modern form for well over a century, and extending them is an extremely tall order, to say the least--but if it does, evolving it will prove more productive than replacing it. Yet "abandon it lightly" we did, without even really implementing it properly first. For all practical purposes, there are currently no relational research or implementation efforts--academia prefers to follow industry fads.

Note: There was one serious commercial attempt to implement Date and Darwen's Tutorial D, but it went nowhere despite having been found much superior to SQL by those who tried it. David McGoveran and I independently came up with an idea for a relational solution for missing data, which would be an excellent topic for a PhD thesis--researching the implications for manipulation and integrity enforcement, and feasibility--but there is no interest.
 

That is precisely why what is being researched and implemented as a purported improvement on SQL technology is not even "new territory", but what the RDM made obsolete decades ago: graph systems, and "schemaless" NoSQL systems that, according to Stonebraker, are predictably becoming more SQL-like. That's regress, not progress.

A recent article nicely demonstrates that, as is usually the case, the new features NoSQL technology offers can not only all be readily accommodated within the RDM, but that a TRDBMS is the ideal setting for them, as the RDM's PDI was intended to facilitate them (more on that in a future post). That SQL implementors failed to implement them is hardly an RDM limitation which requires "relational extensions".

And, in fact, that is also true of the OODBMS features that Maier "would not see as relational extensions anytime soon":

  • [type] inheritance; 
  • type extensibility;
  • method attachment;
  • recursive complex objects.
There is absolutely nothing in the RDM that precludes these in TRDBMS's, but they are not data model/RDM extensions. For example, Date is on record as saying that about the only thing OO technology adds to the RDM is domain inheritance (types are domains in the RDM), but the domain system is completely orthogonal to the RDM. Moreover, a TRDBMS will support any extensible user-defined domain of arbitrary complexity, as long as it is implemented with proper operators. That SQL DBMS's don't is not due to the RDM, but to some non-trivial extensibility implementation issues that have nothing to do with the data model (see Domains: The Database Glue).
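Here is a rough sketch, in Python, of what orthogonality between the domain system and the relational operators means in practice. Everything in it is hypothetical and chosen for illustration only: the Point domain, its distance_to operator, the CITIES relation, and the toy restrict and project functions (which address attributes by position for brevity, whereas the RDM references them by name). The point is that the relational operators never look inside the domain values, while the domain brings its own operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A hypothetical user-defined domain with its own operator."""
    x: float
    y: float

    def distance_to(self, other: "Point") -> float:
        return ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** 0.5


# A relation value: a set of tuples over (city: str, location: Point).
CITIES = frozenset({
    ("A", Point(0.0, 0.0)),
    ("B", Point(3.0, 4.0)),
    ("C", Point(30.0, 40.0)),
})


def restrict(relation, predicate):
    """Relational restriction; knows nothing about the domains involved."""
    return frozenset(t for t in relation if predicate(t))


def project(relation, *positions):
    """Relational projection onto the given attribute positions."""
    return frozenset(tuple(t[i] for i in positions) for t in relation)


origin = Point(0.0, 0.0)
# The predicate uses the domain's operator; restrict itself does not care.
near = restrict(CITIES, lambda t: t[1].distance_to(origin) <= 5.0)
near_names = project(near, 0)
```

Swapping Point for a domain of arbitrary complexity would leave restrict and project untouched, provided the new domain supplies equality and whatever operators the predicates need.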

When I argued in my writings, more than once, that OO is a programming paradigm, not a data paradigm, that programmers see a DBMS as just another application, and that OODBMS's seem like DBMS building kits, I got a lot of protestations and flak. Well, here's Maier--you be the judge:

The 3GM makes scant distinction among types, collections and named values. In modern programming languages, defining a type, creating a collection of instances of that type, and declaring a variable to hold such a collection are separate activities. The 3GM assumes the [SQL] status quo, where all three are lumped together. Adding a relation to a database scheme defines the tuple type, the set type over that tuple type (a relation type), creates an instance of the set type and assigns that instance to a variable (the relation name). The tuple type, the set type and the variable are all lumped together with a single name (e.g. employee).
I would say that in this respect even SQL is progress, wouldn't you? Which is probably why OODBMS's did not fulfill their hyped-up expectations.
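For readers who want to see what Maier's "separate activities" look like in a programming language, here is a minimal Python sketch. The Employee type, the EmployeeRelation alias and the sample values are hypothetical, made up for this illustration; it shows the programming-language practice he describes, not any DBMS.

```python
from typing import NamedTuple


# 1. Define the tuple type.
class Employee(NamedTuple):
    name: str
    dept: str


# 2. Define the collection (relation) type over that tuple type.
EmployeeRelation = frozenset[Employee]

# 3. Create a value of the collection type.
staff_2015: EmployeeRelation = frozenset({
    Employee("Ann", "R&D"),
    Employee("Bob", "Sales"),
})

# 4. Declare a variable and assign the value to it. Reassigning later
#    changes the variable, not the type and not the original value.
employees: EmployeeRelation = staff_2015
employees = employees | {Employee("Cy", "R&D")}
```

That is four steps and three names where, per Maier's own description, adding a relation to a database scheme does it all at once under the single name employee.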
 

Before we "extend" FOPL and set theory with OO features, can we get a specification--precisely, please!--of the "object data model" (ODM) structure, manipulation and integrity features?


