ON DATA TYPES
with Hugh Darwen

 

 

 

Ed. Note: I numbered the paragraphs for ease of responding without requoting.

 

 

From: JB

To: Editor

Date: 5 Sep 2003

 

1.

After reading THE THIRD MANIFESTO, I have been thinking about possible representations and system defined types. In your writings you often make use of the domains Integer and Character as basic system defined types. Particularly, you present the idea that the Character domain should not be a type generator which accepts a fixed length as a parameter. This seems fairly reasonable to me and useful in practice. I have not noticed any particular mention of your opinion as to whether the Integer domain should be infinite.

Since the Integer domain in mathematics allows integers of arbitrarily large magnitude, I think the Relational Model of Data should allow this as well.

 

2.

You state in your writings that the only domain which must be defined by a system implementing the Relational Model of Data is the truth value or Boolean domain. Further, in several writings you have stated that the two values of the Boolean domain, TRUE and FALSE, may be represented as the two relations TABLE_DEE (degree zero, cardinality one) and TABLE_DUM (degree zero, cardinality zero). Doesn't this mean that a Boolean domain need not be defined by the system, because by defining relations such a system inherently defines TABLE_DUM and TABLE_DEE and thus implicitly defines the concepts of TRUE and

FALSE?

 

3.

Note: I have seen some resources which state the TABLE_DUM has any degree and cardinality zero. I'm going to ignore that (in my opinion bad) definition so that I can be sure that the statement "TABLE_DUM UNION TABLE_DEE" is equivalent to the statement "FALSE OR TRUE." Please let me know, am I missing something? Should TABLE_DUM be a range of relation values, rather than the specific relation value I mentioned above?

 

4.

Let me define DEEDUM as the relation type of TABLE_DEE and TABLE_DUM, with header containing no attributes. Let me define a new relation type called BYTE which has eight relation valued attributes: bit0, bit1, bit2, bit3, bit4, bit5, bit6, and bit7. The relation type (header) of each of these relation-valued attributes is DEEDUM. Thus, each attribute can have the relation value TABLE_DEE or TABLE_DUM. I can then say there are 256 possible tuples which might be in the body of this relation. If I were to define a relvar of type BYTE with the constraint that its cardinality must be exactly one, I have the equivalent of a variable that holds 8 bit bytes. I can define several operations which apply to cardinality one BYTE relation values (addition and subtraction, for instance). Clearly, I need not consider the possible tuples of this relation type as numbers; I could call them characters instead and define some other operators appropriate to that interpretation.

 

5.

I can't figure out how to define a relvar in this manner that could contain an arbitrarily large integer (or an arbitrarily long character string). I can define a multiplication operation which would take two cardinality one BYTE relation values and return a single cardinality one WORD relation value. (Relation type WORD would have 16 attributes, bit0 through bit15, that are relation values of type DEEDUM). But, because attributes are unordered in the header, I have to name the attributes rather than specifying them in some expandable way (such as by position). I can't create some generic operator that is able to multiply any two cardinality one relation values that have headers consisting only of DEEDUM valued attributes because I can't be sure which attribute name would represent the most significant bit. My current naming scheme is good for clarity, but a generic operator would have to be able to handle a relation header with names like Red, Green and Blue equally as well as bit2, bit1 and bit0.

 

6.

This leads me to believe that machine representations of domains must always be finite. In your book TEMPORAL DATA AND THE RELATIONAL MODEL, you state that any interval domain must be defined on a scale which has fixed end points. You make logical arguments about how a time interval must be defined on some range of dates with a definite beginning and ending date.

 

7.

Of course, physical representations will be finite. I'm trying to avoid physical issues because I'm considering the Relation Model of Data, which exists purely on the logical level. My objective is to show that using the Relational Model of Data, all data can be represented just by relation values, and that the attributes of relations may be restricted to being only other relation values. My problem is that in the realm of mathematics, we are able to consider infinite domains, but I have not figured out how to logically represent an infinite domain using only relation values.

 

8.

What you think about finite versus infinite domain definitions?

 

 

Hugh Darwen Responds:

 

1. Do we really say that CHAR should *not* be a type generator?  I don't think I would advocate it, but perhaps it's doable.  And the parameter could just as well be for a maximum length as for a "fixed" one.  Another approach would be to regard CHAR(23), for example, as a subtype of CHAR, where CHAR is all character strings and CHAR(23) is strings no longer than 23, or possibly strings of length 23.  Anyway, how character types are (or the character type is) defined is not an issue in THE THIRD MANIFESTO . I suspect the reference is really to Tutorial D, which is merely a notation we use for illustrative purposes.

 

We do not consider infinite types at all, nor do we consider infinite relations.  Some people think that infinite types should be supported, but we have not seen any rigorous treatment of that concept as an extension to Codd's Relational Model, nor have we been motivated to explore the idea ourselves.  At the very least there would be an implication for the definition of the aggregate operators MAX and MIN on the empty set, but I suspect CJD would point out several other difficulties.

 

2. No, I think TRUE and FALSE come first.  TABLE_DEE and TABLE_DUM are relations, not truth-values.  A relation represents an extension of a predicate.  TABLE_DEE and TABLE_DUM are the only possible extensions of a predicate in which there are no free variables.  "The King of France is bald" is one such predicate.  If, when considered as a proposition, it is true, then the corresponding relation is TABLE_DEE; otherwise, it is TABLE_DUM.  It seems I need the concept of the truth of a proposition before I can encounter the TABLE_ twins.  Another such predicate is "There exists a supplier S# and there exists a part P# such that S# supplies P#".  The corresponding relation would be given by SP{ } (the projection of SP over no attributes).  SP{ } evaluates to either TABLE_DEE or TABLE_DUM, according to the existence or not of at least one tuple in SP.

 

3. It was I who introduced TABLE_DEE and TABLE_DUM to the database community, in the article that became the chapter of that name in RELATIONAL DATABASE WRITINGS, 1985-1989, so I can answer this one with some authority!  TABLE_DEE has one tuple, no attributes; TABLE_DUM is even more impoverished than TABLE_DEE.

 

4. Surely not "the equivalent"?  I mean, the assignment BYTE := B'10101010' and the comparison BYTE = '10101010' are surely illegal?  They would be in a D.

 

You seem to condone overloading arithmetic operators to apply to bit strings as well as numbers.  I wouldn't be in favour of that, personally, though we cannot legislate against overloading.

 

5. I think this is based on assumptions I have already disputed, so I thereby (at least) dispute it.

 

6. Yes, [finite] and therefore "domains" are.  (I'd love to know why you reject our term "type".)

 

Yes, except for interval types defined on cyclic point types, which have a single "origin" point (such as Monday, or midnight) rather than two end points.

 

7. We require every value to be physically representable.  If the number of physical representations is finite, then so must be the type.

 

8. I answered this one earlier.

 

 

Posted 03/26/04