MORE ON NULLS
with Fabian Pascal

 

 

 

From: Curt Sampson

To: Editor

Date: 26 Feb 2005

 

1. In NULLS Nullified, you write:

 

    ...there is a commercial product with a language based on the principles advanced in The Third Manifesto: Dataphor by Alphora. It is interesting to note that the company refrained from implementing NULLs, in order to avoid the problems haunting SQL, but recently had to give in to market pressure and added them....

 

This ties in with some recent thoughts I was having on support for user-defined types. I consider one of the big problems with current DBMSs to be their poor support for UDTs; I think that that is one of the

reasons that OO folks, used to rich facilities for type definition, tend to use RDBMSs poorly.

 

2. Reading THE THIRD MANIFESTO, as well, the very first RM prescription states that, "D shall provide facilities for users to define their own scalar types," and the third one states that, "D shall provide facilities for users to define their own scalar operators."

 

Does Dataphor not provide this?

 

3. If so, I'm not sure what "adding NULLs" means, since any system that lets you define your own types would let you define a type with a domain of, say, true, false or "NULL", or all integers plus "NULL", and define appropriate operators for them.

 

4. This may be a slight disagreement between us; I'm not in favour of abolishing the capability of having NULLs the way we have them in SQL; to do so would be to put an arbitrary limit on users' abilities to define types. However, I'm quite strongly in agreement with you that the system defined types in SQL include NULL (in SQL, I consider columns "foo int NOT NULL" and "foo int NULL" to be distinct types—they are certainly different domains!) are poorly designed and one would be ill-advised to use them. And I would certainly advocate not including such types in the basic system. User defined types may give users enough rope to hang themselves with, but when you buy an RDBMS, I see no reason to include a noose in the package.

 

5. To continue, I see nothing wrong with being able to define a type consisting of, say, all integers plus "unknown" (UNK), and use that for, say, an attribute giving a count of the number of residents of a dwelling. The great advantage here is that you have a type where this value says what it means explicitly (it can't be misinterpreted as "nobody," or "not applicable"), and you are forced to correctly define operators based on the real semantics of the situation. For example, when you first try to apply count() to that attribute in a relation, the system will return an error message stating that count() is not defined for that type, and you can sit down and decide just what count() should do. Perhaps it should return UNK if any tuple has UNK in that attribute. Perhaps you really want a minimum_count() or count_known() function instead. Perhaps you even want "count(residents assuming UNK = average(residents))"

 

Is this any less sensible a way of dealing with this than Darwen's proposal to move that attribute into a couple of separate relations? It certainly makes the constraints easier to define.

 

Should a DBMS offer this sort of facility, make it easily enough available to users, and properly document and promote it, I would think that the demand for support for all of SQL's data types that include NULL would be much, much less.

 

 

From: Fabian Pascal

To: Curt Sampson

 

·        Agreed.

 

·        As far as I know, yes. Check with them for details.

 

·        Disagree. No connection.

 

·        Strong disagreement. See The Final NULL in the Coffin.

 

·        Disagree. See the paper.

 

·         I do not subscribe to Hugh's approach. See On Darwen’s Handling of Missing Information Without Using NULLs

·         Disagree.

 

 

Posted 4/15/05