Ed. Note: I
numbered the paragraphs for ease of responding without requoting.
From: JB
To: Editor
Date: 5 Sep 2003
1.
After reading THE THIRD MANIFESTO,
I have been thinking about possible representations and system defined types.
In your writings you often make use of the domains Integer and Character as
basic system defined types. Particularly, you present the idea that the
Character domain should not be a type generator which accepts a fixed length as
a parameter. This seems fairly reasonable to me and useful in practice. I have
not noticed any particular mention of your opinion as to whether the Integer
domain should be infinite.
Since the Integer domain in mathematics allows integers of
arbitrarily large magnitude, I think the Relational Model of Data should allow
this as well.
2.
You state in your writings that the only domain which must be
defined by a system implementing the Relational Model of Data is the truth
value or Boolean domain. Further, in several writings you have stated that the
two values of the Boolean domain, TRUE and FALSE, may be represented as the two
relations TABLE_DEE (degree zero, cardinality one) and TABLE_DUM (degree zero,
cardinality zero). Doesn't this mean that a Boolean domain need not be defined
by the system, because by defining relations such a system inherently defines
TABLE_DUM and TABLE_DEE and thus implicitly defines the concepts of TRUE and
FALSE?
3.
Note: I have seen some resources which state the TABLE_DUM
has any degree and cardinality zero. I'm going to ignore that (in my opinion
bad) definition so that I can be sure that the statement "TABLE_DUM UNION
TABLE_DEE" is equivalent to the statement "FALSE OR TRUE."
Please let me know, am I missing something? Should TABLE_DUM be a range of
relation values, rather than the specific relation value I mentioned above?
4.
Let me define DEEDUM as the relation type of TABLE_DEE and
TABLE_DUM, with header containing no attributes. Let me define a new relation
type called BYTE which has eight relation valued attributes: bit0, bit1, bit2,
bit3, bit4, bit5, bit6, and bit7. The relation type (header) of each of these
relation-valued attributes is DEEDUM. Thus, each attribute can have the
relation value TABLE_DEE or TABLE_DUM. I can then say there are 256 possible
tuples which might be in the body of this relation. If I were to define a
relvar of type BYTE with the constraint that its cardinality must be exactly
one, I have the equivalent of a variable that holds 8 bit bytes. I can define
several operations which apply to cardinality one BYTE relation values
(addition and subtraction, for instance). Clearly, I need not consider the
possible tuples of this relation type as numbers; I could call them characters
instead and define some other operators appropriate to that interpretation.
5.
I can't figure out how to define a relvar in this manner that
could contain an arbitrarily large integer (or an arbitrarily long character
string). I can define a multiplication operation which would take two
cardinality one BYTE relation values and return a single cardinality one WORD
relation value. (Relation type WORD would have 16 attributes, bit0 through
bit15, that are relation values of type DEEDUM). But, because attributes are
unordered in the header, I have to name the attributes rather than specifying
them in some expandable way (such as by position). I can't create some generic
operator that is able to multiply any two cardinality one relation values that
have headers consisting only of DEEDUM valued attributes because I can't be
sure which attribute name would represent the most significant bit. My current
naming scheme is good for clarity, but a generic operator would have to be able
to handle a relation header with names like Red, Green and Blue equally as well
as bit2, bit1 and bit0.
6.
This leads me to believe that machine representations of
domains must always be finite. In your book TEMPORAL DATA AND THE
RELATIONAL MODEL, you state that any interval domain must be defined on
a scale which has fixed end points. You make logical arguments about how a time
interval must be defined on some range of dates with a definite beginning and
ending date.
7.
Of course, physical representations will be finite. I'm
trying to avoid physical issues because I'm considering the Relation Model of
Data, which exists purely on the logical level. My objective is to show that
using the Relational Model of Data, all data can be represented just by
relation values, and that the attributes of relations may be restricted to
being only other relation values. My problem is that in the realm of
mathematics, we are able to consider infinite domains, but I have not figured
out how to logically represent an infinite domain using only relation values.
8.
What you think about finite versus infinite domain
definitions?
Hugh Darwen Responds:
1. Do we
really say that CHAR should *not* be a type generator? I don't think I
would advocate it, but perhaps it's doable. And the parameter could just
as well be for a maximum length as for a "fixed" one. Another
approach would be to regard CHAR(23), for example, as a subtype of CHAR, where
CHAR is all character strings and CHAR(23) is strings no longer than 23, or
possibly strings of length 23. Anyway, how character types are (or the
character type is) defined is not an issue in
THE THIRD MANIFESTO
. I suspect the reference is really to Tutorial D, which is merely a notation we
use for illustrative purposes.
We do not consider infinite types
at all, nor do we consider infinite relations. Some people think that
infinite types should be supported, but we have not seen any rigorous treatment
of that concept as an extension to Codd's Relational Model, nor have we been
motivated to explore the idea ourselves. At the very least there would be
an implication for the definition of the aggregate operators MAX and MIN on the
empty set, but I suspect CJD would point out several other difficulties.
2. No, I think
TRUE and FALSE come first. TABLE_DEE and TABLE_DUM are relations, not
truth-values. A relation represents an extension of a predicate.
TABLE_DEE and TABLE_DUM are the only possible extensions of a predicate in
which there are no free variables. "The King of France is bald"
is one such predicate. If, when considered as a proposition, it is true,
then the corresponding relation is TABLE_DEE; otherwise, it is TABLE_DUM.
It seems I need the concept of the truth of a proposition before I can
encounter the TABLE_ twins. Another such predicate is "There exists
a supplier S# and there exists a part P# such that S# supplies P#".
The corresponding relation would be given by SP{ } (the projection of SP over
no attributes). SP{ } evaluates to either TABLE_DEE or TABLE_DUM,
according to the existence or not of at least one tuple in SP.
3. It was I
who introduced TABLE_DEE and TABLE_DUM to the database community, in the
article that became the chapter of that name in RELATIONAL DATABASE
WRITINGS, 1985-1989, so I can answer this one with some
authority! TABLE_DEE has one tuple, no attributes; TABLE_DUM is even more
impoverished than TABLE_DEE.
4. Surely not
"the equivalent"? I mean, the assignment BYTE := B'10101010'
and the comparison BYTE = '10101010' are surely illegal? They would be in
a D.
You seem to condone overloading
arithmetic operators to apply to bit strings as well as numbers. I
wouldn't be in favour of that, personally, though we cannot legislate against
overloading.
5. I think
this is based on assumptions I have already disputed, so I thereby (at least)
dispute it.
6. Yes, [finite]
and therefore "domains" are. (I'd love to know why you reject
our term "type".)
Yes, except for interval types
defined on cyclic point types, which have a single "origin" point
(such as Monday, or midnight) rather than two end points.
7. We require
every value to be physically representable. If the number of physical
representations is finite, then so must be the type.
8. I answered
this one earlier.
Posted 03/26/04