ON JIM GRAY'S "CALL TO ARMS"
by Lee Fesperman

 

 

 

In April, 2005, Jim Gray with Mark Compton published A Call to Arms inACM Databases, Vol. 3, No. 3, yet another “manifesto” on the direction of database technology. Gray's basic premise is that “traditional relational” database management is sagging under the onslaught of modern computing challenges. But what he is really indicting are SQL DBMSs by the major vendors—Oracle, Microsoft (SQL Server) and IBM (DB2). He appeals to the prejudices of many in the information technology field, but provides little hard evidence in support of his arguments. He uses simplistic anecdotes and distorts history and current reality to support his premise. Gray bundles together a plethora of current practices and techniques, and intimates that all should be integrated into database systems.

 

This response examines a number of his points, beginning with his indictment of relational technology, and attempts to separate reality from opinion. The task is daunting, because the article tries to encompass most aspects of current computing techniques, not just those who clearly belong in the database management field. Rather than cover all his proposals, this article will concentrate on several salient ones, while touching on a few other topics.

 

Is Relational Becoming Irrelevant?

 

Is the relational database architecture—as epitomized by SQL DBMSs—really sagging at the knees? Or, are the major SQL vendors not keeping their eye on the ball?

 

The major SQL DBMS vendors virtually ignore relational principles. SQL itself is notorious for weaknesses in its relational fidelity, e.g. weak integrity support, optional primary keys, lack of true domain support, and so on (see A GUIDE TO THE SQL STANDARD). SQL92 did make some improvements in relational compliance over SQL89 and SQL86 (but didn't fix the above problems.) However, SQL99 and later standards made a sharp turn away from relational concepts in introducing features that violate the relational model (RM), e.g. pointers.

 

Open source SQL DBMSs are even worse, almost defiant about non-conformance. Rather than "polishing the round ball" (in the words of Michael Stonebraker), vendors have concentrated on marketing differentiation, non-database features, and quirky optimizers requiring vendor-specific tuning that locks customers into their proprietary solutions. Improving relational compliance is the lowest priority, or is often dismissed outright.

 

Yet full or even just better compliance with RM offers true solutions to many of the issues that Gray raises. He disregards this, treating weaknesses of current products as evidence of the failings of RM which they failed to implement. He offers XML and XQuery as alternatives, supposedly based on 'developer' preferences. This is a rash judgment, supported by neither past, nor current experience.

 

XML and XQuery are insolubly flawed, the former being just another incarnation of the discredited hierarchical data model that RM superseded. XQuery brings the excessive complexity that hierarchy instilled in the products that preceded SQL. A relational query language uses only one method to access and manipulate information—by data values. XQuery has at least three—by value, by navigating the hierarchy and through external linkage. This adds complexity, but no power, exactly the problem that Codd intended to eliminate with RM.

 

XML is may be appropriate for data transfer between systems, for streaming operations (although it was not necessary, nor the most efficient for that purpose). But XML is not a good data management technology (see The Exchange Tail and the Management Dog).

 

 

Dichotomy Between Data and Procedures

 

Gray claims that there is an “artificial separation” between data and procedures (or later in his article, algorithms) in 'traditional' relational databases. He asserts that this dichotomy began with the COBOL language and the COBOL DBTG (Database Task Group). But this distorts history and current reality. RM has nothing to do with COBOL, and actually predates DBTG. It is not “pure data”; it defines powerful set operators for manipulating data using relational algebra and calculus languages, which are declarative rather than procedural (SQL is an improvement over languages preceding it, but it is not always as declarative as it should and could have been).

 

Perhaps Gray thinks that RM rejects procedural code for most database processing, and requires declarative interfaces. But there are very good reasons for this stance. Procedural code is convoluted and obtuse. It is difficult for humans to verify correctness, or even to understand its purpose. Machine verification of procedural code is still not possible, and is not likely to occur in the near future.

 

Another major problem with procedural code is that optimization techniques yield limited gains in efficiency, even after years of research in this area. Functional, logic and declarative code allow much greater improvements. An optimizer for a declarative language can yield multiple levels of magnitude improvement. Correctness is easier to verify.

 

Note: The major Object-Oriented Languages, Java, C++, etc., are procedural in nature.

 

For these reasons, procedural code is generally a poor choice for binding with databases. Even so, most SQL DBMSs support stored procedures and triggers written in procedural languages. At least some of the impetus for adding them is due to the weakness of SQL as a full declarative, relational language. Gray thinks that we've polished the SQL round ball enough and looks to OO and XML for solutions, certainly a popular view (more on this later). But if we are must discard SQL altogether—which may be a good idea on the one hand, but difficult practically on the other—why not develop a good truly relational data language?

 

Procedural needs can be satisfied by host application development languages, as Codd envisaged with his concept of an embedded data sub-language; SQL is just a poor implementation of that idea.

 

 

Synergy Between Objects and Database

 

OO programming is widely used today, and RM is the dominant database technology, at least insofar as SQL adheres to it. Together, they power many applications in use. However, many consider this marriage to be in trouble. They describe the problem as an “impedance mismatch” between OO applications and relational—that is, SQL—database systems. They usually attribute this mismatch to the difference in the way information is represented in OO—as pointer-linked lattices of objects, and the relational representation of information, as logically linked tables, based on data values.

 

One reason for this so-called mismatch is the fundamental distinction between the concerns of an individual application and those of a shared database system. An application works within a subset of the total problem space. It accesses and manipulates a privately viewed subset of a database. A database system, on the other hand, concerns itself with serving multiple applications with different views of the database, and maintaining the security, integrity and accessibility of the shared data; a database and DBMS service a variety of reporting capabilities, batch processing and ad-hoc access, all using different subsets of the same data in differentand changing—ways.

 

Gray touts SQLJ (an OO style embedded SQL) as a “nice” integration of SQL and Java. The reality is that SQLJ is not so nice, and is rarely used. Most OO applications either access the database API and use the values directly, construct their own objects, or use Object/Relational (OR) wrappers like Hibernate for Java. All this occurs on the client, so the DBMS backend is not involved. Unfortunately, the special requirements of OR wrappers sometimes drive the design of the database, to the detriment of general usability of the shared resource.

 

The modern OO languages, such as Java and C#, compile to an intermediate form which then is compiled to machine code at runtime. Aside from full portability and greater efficiency, this makes them host language candidates for embedding a truly relational data language. A runtime compiler can analyze usage patterns for improved optimization and can recompile portions dynamically as it gathers newer statistics. These languages also offer greater security. The runtime environment employs a verifier for the intermediate code to protect against malicious code or inadvertent corruption. The compiled code runs in a 'sandbox' that prevents interference with other parts of the DBMS.

 

 

Other Topics

 

This section briefly looks at other areas in Gray's article. Most, but not all, show promise in improving current database systems, but are orthogonal to RM.

 

Self-Managing and Always-Up Database Systems

 

Like many of the concepts mentioned in Gray's article, self-managing and always-up database systems already exist with these characteristics, proving again that Gray is overstating his case in claiming relational database architectures are “slowly sagging to their knees”. In fact, truly relational DBMSs (TRDBMS) are uniquely able to deal with these situations, because of their physical data independence. Additional information on this topic can be found in Highly Available Databases, (the major SQL DBMSs have limited capabilities in this area).

 

Improved Physical Structures

 

RM and even the SQL Standard are logical in nature, and silent about physical details This allows product designers complete physical freedom in implementation. Unfortunately, current SQL DBMSs tend to use simple storage structures and access paths (primarily direct-image storage, indexing, hashing and sorting). They use random access disks as the storage medium, converting data to octets (8 bit bytes) using serialization. Random access and serialization can be expensive, especially for objects.

 

There is much room for improvement here. In-memory storage is a promising technique supported by some newer RDBMSs. In-memory structures are direct access and can avoid conversion/serialization to achieve high-performance processing. This is especially important for systems with inefficient or no random access storage.

 

Active Databases and Publish/Subscribe Capabilities

 

A system that manages active databases keeps clients up-to-date about any changes to queried data as they occur. For example, the client sends a query to the database selecting and retrieving a set of data. As long as the client keeps the query open, it will receive notification of any change (Insert, Update, Delete) made to the backend database. A publish/subscribe facility provides a similar capability--retrieving an initial data set and then receiving notifications of changes to the set. SQL implementations of active databases (e.g. via event triggers) have existed since the '90s.

 

Federated Databases

 

A federated database system—a new name for distributed DBMS—combines two or more distinct servers under a single interface. A common example of their use is a single query that joins tables from different databases. The problem with federated databases is that they are not efficient in the general case, because only limited optimization is possible.


If each participating database system is from the same vendor (homogeneous databases), they can communicate to achieve some reasonable optimization plan. However, this isn't feasible for heterogeneous databases. A standard optimization dialogue/protocol to support this is prohibitive, because of divergent proprietary implementations. An efficient federation of heterogeneous databases is a pipe dream. A number of implementations have been developed e.g. R*Star at IBM, Ingres*Star, but did not fare well for that reason.

 

Evolving Schemas

 

Gray is using an OO mindset when referring to schema evolution, also an essential component in the Extreme Programming (XP) paradigm. Because OO lacks a data model akin to RM, constant restructuring of the object design and application maintenance becomes a prohibitive necessity.

 

Schema changes are much less common in a relational environment. Relational and normalization principles, when adhered to, create flexible designs that minimize integrity and evolution burdens. Should structural changes become necessary, normalized designs and relational views can ease the transition and limit the undesirable effects of the change. Some applications may require no changes at all. Indeed, that was one of the major objectives of RM.

 

 

Lee Fesperman is a database practitioner who has worked with relational technology for over 30 years. He has developed a number of database systems based on the relational model and helped start DATABASE DEBUNKINGS. Lee is currently president and CEO of FFE Software, Inc. (www.firstsql.com) and can be reached at lee.at.firstsql.com.

FFE Software, Inc. is developing database systems. The flagship product is  FirstSQL/J ORDBMS, which supports SQL92 with relational extensions. The product features some capabilities mentioned in this article—self-managing and always-up databases, and in-memory processing. It also provides complex domains, stored procedures, triggers and expression functions implemented in Java; this is detailed in An ORDBMS that is Truly Relational (forthcoming at www.firstsql.com).

 

 

Ed. Note: See also Date and Darwen’s assessments of Gray’s piece (which were way too mild for my taste) at  The Third Manifesto. My own critique, A Call to Reason will now be forthcoming elsewhere or here, now that my DBAzine column was canceled.

 

 

Posted: 8/26/05