As someone familiar with the TransRelational™ Model
(TRM), including as yet unpublished aspects, I feel compelled to respond to
Josh Hewitt’s misguided critique of TRM, The TransRelational Model: Performance
Concerns, posted on 11 November 2004, in the Google newsgroup
"comp.databases.theory” [Ed. Note:
There was an exchange in response to the critique, but both the critique and
exchange were deleted by persons unknown. Any inquiries into the critique will
be forwarded to Hewitt].
Mr. Hewitt's central difficulty stems from his lack of
knowledge of those aspects of TRM that are most critical to his analysis. The
centerpiece of Mr. Hewitt's critique rests on his contention that TRM likely
performs updates and disk access poorly. Yet his knowledge of TRM is derived
from sources that intentionally exclude coverage of how TRM operates
efficiently in a dynamic disk-based environment - notably Appendix A of
C. J. Date’s AN INTRODUCTION TO
DATABASE SYSTEMS, 8th Ed., and published
patents. Efficient update and disk operations represent crucial aspects of
TRM. But information regarding efficient dynamic disk-based implementations of
TRM is still available only under non-disclosure agreement, in works such as C.
J. Date’s as yet unpublished book manuscript GO FASTER! THE TRANSRELATIONAL
APPROACH TO DBMS IMPLEMENTATION [Ed. Note: See More on the “Final NULL in
the Coffin” for a list of some of the advantages of TRM].
Given Mr. Hewitt's lack of access to such material, how did
he arrive at the conclusion that TRM cannot perform updates and disk operations
efficiently? He created his own hypothetical dynamic disk-based implementation
of TRM, and proceeded to assess its performance. Upon finding that his imagined
implementation performed poorly, he concluded that TRM was ill-suited for
implementing a truly relational DBMS. In effect, he erected his own straw man,
and then proceeded to knock it down.
How did Mr. Hewitt create his straw man? He took the data
structures and algorithms he found in C.J. Date’s Appendix A, and
applied them directly to a dynamic disk-based RDBMS, ignoring Dr. Date's
explicit warning that these methods were designed only for use in a static
main-memory environment.
C.J. Date's Appendix A is tutorial in nature. In order
to get across some of the most basic concepts of TRM in a clear and concise
manner, Date deliberately ignores updates and secondary storage considerations.
He explains that Appendix A “adopts the fiction that the database is (a)
read-only and (b) in main memory.” But he is careful to point out that the full
TRM includes additional methods for handling disk access and updates, and warns
readers: “Please do not conclude that TR[M] is good for read-only, main-memory
databases only, however - it is not.” Unfortunately, Mr. Hewitt did just that.
[Ed. Note: See Date’s own reaction to
Hewitt, A Note on
the TransRelational™ Model].
It is important here to draw a clear line of distinction
between a model and an implementation. TRM, as described by Date in Appendix
A, is a model - in fact, it is a subset of the full TRM. As with any model,
there are many possible implementations of TRM. Mr. Hewitt’s imagined
implementation is one of these - a very poor one, by his own analysis.
In making the false assumptions that TRM uses the same
algorithms on disk as it uses in memory, and that it provides no new methods
for handling updates, Mr. Hewitt ignores the possibility of inventions beyond
those already published i.e. further complementary ideas that produce major
improvements in disk and update performance. This brings to mind the error made
by Rolls Royce rocket engineers in the late 1940’s, who proved to themselves
conclusively that it was impossible to launch a craft into space using a rocket
engine, and thus that spaceflight was impossible. Although their calculations
were correct, their conclusion was obviously wrong. Their mistake was to not
conceive the possibility of a companion invention—in that case the idea of a
multi-stage rocket—which neatly solved the problem.
Mr. Hewitt’s analysis of TRM is analogous to the Rolls Royce
analysis of a single-stage rocket. He was clearly handicapped by his lack of
access to unpublished materials. Yet his reluctance to accept the possibility
of companion inventions is especially odd in light of C.J. Date's explicit
pronouncements. And it would not have been unreasonable for Mr. Hewitt to have
asked whether an implementation of TRM had ever been built, and if so, if it
had been performance-tested. From some of C.J. Date's statements in Appendix
A (e.g. "the solutions have been implemented", and “Performance
in TR[M] is orders of magnitude better than it is with a direct-image system”),
one might assume that C.J. Date had seen a TRM implementation and had been able
to assess its performance as, in fact, he had.
Not only had Date been exposed to a working TRM implementation—a
prototype built by Required Technologies
that included update and disk operations—but so have other highly respected
database researchers and implementers. Moreover, several potential customers ran
their own benchmarks against this prototype using their own real-world data and
their own live complex queries. The results were extraordinary. In every case,
TRM delivered orders-of-magnitude performance improvements over existing
RDBMSs, in a large dynamic disk-based environment. These results can be
demonstrated to anyone seriously interested in TRM.
While drawing attention to TRM's ability to provide
blindingly fast performance in a dynamic disk-based environment, Date was not
at liberty to disclose the complementary inventions that enabled this outcome.
Given the existing non-disclosure agreements, the most that can currently be
said is that the full TRM includes a variety of novel algorithms and data
structures for handling dynamic disk-resident data, and that these typically
result in database operations executing at or near main memory speeds, in a
large dynamic disk-based environment, a fact fully consistent with the existing
TRM benchmarks.
Unlike the TRM prototype, Mr. Hewitt’s imagined implementation
runs into serious problems relating to disk splaying and slow binary searches.
Date specifically references these issues in Appendix A, where he
writes: “In a disk-based system, will not the zigzagging mean a lot of random
access and terrible performance? And can binary search be made to work
efficiently on the disk?” Constrained from disclosing confidential information,
C.J. Date continues: “Such questions are indeed serious ones, but (sadly) this
is not the place to address them; suffice it to say that they can be and have
been addressed satisfactorily, and the solutions have been implemented.” I can
only confirm that I too have been exposed to methods that neatly solve the
splaying and binary search problems referenced by Mr. Hewitt. The result is
performance far faster than B-trees, without the negative consequences B-trees
entail, as can be verified by the TRM benchmarks.
Mr. Hewitt’s critique is inherently flawed. His arguments are
only applicable to his incorrectly imagined implementation, which indeed
performs poorly. Unfortunately, this leads him to the terribly erroneous
conclusion that (in his words) “In its current form…TRM is not a feasible
candidate for implementing a relational DBMS.”
Compare this conclusion to C.J. Date's. Unlike Mr. Hewitt,
Dr. Date had the benefit of a detailed knowledge of the full TRM, including the
algorithms and data structures that TRM uses to handle updates and disk access.
And Date was also able to witness the performance of a TRM prototype, which
confirmed the effectiveness of the TRM design.
Date’s conclusion was that TRM “is likely to prove the most significant
development in this field since Codd gave us the relational model” and that TRM
“allows us to build DBMSs that - at last! - truly deliver on the full promise
of the relational model.”
However much I would love to reveal how TRM algorithms and
data structures operate in a large dynamic disk-resident system, I regret I
cannot do so at the present time. These methods are the intellectual property
of Required Technologies, and, due to their extremely high value, can only be
disclosed with adequate safeguards in place. Yet, I have some limited good news
I can share.
Not only does the prototype implementation of TRM (referenced
above) still exist, but also a full-blown commercial disk-based updatable RDBMS
based on TRM (with standard SQL, ODBC, JDBC, and third-party tool interfaces,
plus all standard subsystems) is nearly complete. But for certain business
difficulties, this product would already be available. Preliminary performance
testing has been most impressive. And steps are currently being taken to
resolve the business difficulties, to enable completion of the product - and
hopefully the publication of C.J. Date’s manuscript. I greatly hope that this
comes about in the relatively near future. All will then be able to verify to
their own satisfaction just how greatly TRM outperforms current-generation
RDBMSs in a fully updatable disk-based environment.
Posted 1/28/05