MORE ON THE “FINAL NULL IN THE COFFIN”
with Fabian Pascal

 

 

 

From: SH

To: Editor

Date: 9 Nov 2004

 

Now back from the trip. I read your PRACTICAL DATABASE FOUNDATIONS paper #8, The Final NULL in the Coffin, at 30,000 feet, over Siberia, and despite the jetlag I think I understand (though am not totally convinced yet!) your solution to the NULL problem - by avoiding entirely the need for NULLs of any sort.

 

Haven't yet worked through all the implications, but I have one immediate question. If I want to obtain a list of all employees for whom the salary is unknown, how would I do this. I understand that the data needed to answer this query are actually in the metadata table of 'missing items', but presumably I can't use the 'IS NULL' syntax ?

 

Also, if actually storing the data in a (non-relvar) table, what do you put into the spaces which would normally be occupied by a 'null' code? I guess this doesn't really matter, because when you're looking at the data relationally, those locations will never even be accessed. However - the next thing that I want to do is to work through the implications of keeping the table, nulls and all, but forcing a 2VL logic by making every operation with a null operand return a 'false' value. Will this give the same results in practice as your theoretically correct multi-relvar solution, or would it behave differently ? I recognize the implementation questions (e.g. null being represented by the empty string) but it seems to me this is purely a feature of bad design of SQL and would be easily avoidable in a new system. If you have a 'null' it should be physically different from and not confusable with ANY valid data value.

 

Finally - intrigued by the Trans-Relational implementation model. Unfortunately you don't give enough details of it for me to understand how it would work. Is there anything published on it yet ? The 'TM' is worrying. I hope the whole thing is not going to be proprietary and patented - because that would do a great disservice to the relational world, and would seriously inhibit its take-up by developers. It seems to me something that ought to be made as widely available as possible. I understand the commercial constraints and pressures, but there surely must be some way around these (like selling advance copies to Oracle or Microsoft to allow them a few months lead on the rest of us!).

 

 

From: Fabian Pascal

To: SH

 

It's not going to be convincing until the details of the effects on all relational operations are spelled out by research. But we believe the underlying principle is sound: operating on parts of a  relation that are relations should yield logically correct results.

 

There are no NULLs anywhere in the system. The paper does say that queries about missing data are applied to the pertinent system catalog relations. You can figure out from the catalog relation example in the paper how to formulate the kind of query you refer to. In fact, if I am not mistaken, I think there is an example of a query.

 

There are no stored tables; the system translates any such implied table into multiple relations. So there is no space for "NULL code". It's not just a matter of looking at data relationally, internally everything is relational too; that's the whole point. Tables—distinct from relations--exists only as an application output format (and perhaps in the minds of users), but nowhere else.

 

Furthermore, in the TRM implementation data is not stored as files that are images of base tables, as in SQL. All tables are virtual [and the physical storage is completely different].

 

Don't understand your question.

 

Unfortunately, no. There is a manuscript ready for publication, but there is no permission yet. A bit more information is in Date’s AN INTRODUCTION TO DATABASE SYSTEMS 8th Ed. We hope that a product will be available in the foreseeable future, in which case we will publish as much as we can.

 

 

Ed. Note: In the meantime, here’s a short list of some of the benefits from C. J. Date’s seminar on the subject:

 

The TransRelational Model … provides a totally new approach to implementation, one that is dramatically different from those that have been tried in the past and found wanting (including all of the approaches encountered in today's SQL products).  Such an implementation would be orders of magnitude faster than, and would deliver a far greater degree of data independence than, today's SQL products.  It would also greatly simplify the job of the DBA. 

 

Here are a few more specific features of the TransRelational approach: 

 

·   The data is effectively stored in many different sort orders at the same time. 

 

·   The data occupies a fraction of the space required for a conventional database today. 

 

·   Indexes and other conventional access paths are completely unnecessary. 

 

·   Optimization is much simpler than it is with conventional systems; often, there is just one obviously best way to implement any given relational operation.  In particular, the need for cost-based optimizing is eliminated. 

 

·   There is no need to compile database requests ahead of time for performance. 

 

·   Join performance is linear!, meaning, in effect, that the time it takes to join twenty relations is only twice the time it takes to join ten (loosely speaking).  It also means that joining twenty relations, if necessary, is feasible in the first place; in other words, the system is scalable

 

·   Logical design can be done properly (in particular, there is never any need to "denormalize for performance"). 

 

·   Physical database design can be completely automated. 

 

·   Database reorganization as conventionally understood is completely unnecessary. 

 

·   In general, far fewer human decisions are needed. 

 

In a nutshell, the TransRelational model allows us to build DBMSs that at last! truly deliver on the full promise of the relational model.

 

 

Posted 1/14/05