From: DW
To: Editor
I am surprised by the implications of Chris Date's rather
radical departure from the usual definition of a relation (as described in his INTRODUCTION
TO DATABASE SYSTEMS, 7th ed. sec. 5.3 p. 123).
The usual understanding of the relational data model (and the
one physically represented by tables) goes something like this: A proposition
of an n-ary relation in the predicate calculus is an ordered n-tuple. A
straightforward mapping into the relational model would likewise identify
attributes by order. A relation would consist of a heading and a body, where
the heading is an ordered n-tuple of domains and the body is a set of
propositions (each an ordered n-tuple of values). Attributes are identified by
order in the relation (or table), and attribute names are not needed, although
for convenience they could be provided as aliases. The version of relational
data model just described is (I believe) the model most widely (mis)understood
by database administrators.
Chris Date takes a different tack in defining the relational
model, identifying attributes by (name,domain) instead of by position. Instead
of ordered n-tuples, he proposes a set of ordered pairs (name,domain). He
leaves two important ramifications unstated:
1.
It follows that the heading thus defined is a binary
relation mapping an attribute name to either (1) a domain (the two domains
of this binary relation being character string and set of all domains), or (2)
a domain name (where both domains are character strings, with a further mapping
of domain name to domain implied elsewhere). As such, the definition of a
relation is itself a perfectly ordinary relation that can be stored in the
database and exploited. It should be possible to exploit this symmetry of
representation, e.g. to define a table replacing domain names with SELECT
expressions:
Given the following relation R:
attname | attdomain
-----------------------
a1 | integer
a2 | char(2)
that defines the relation:
CREATE TABLE first (a1 INTEGER,a2 CHAR(2));
we should instead be able to say
something like (using pseudo-SQL)
CREATE TABLE first
((SELECT
attname,attdomain
FROM r));
Or if we want to say
CREATE TABLE second (b1 INTEGER,b2 DATE b3 CHAR(2));
we should be able to say in its
place:
CREATE TABLE second
(b2 DATE, b1
(SELECT attdomain
FROM r
WHERE attname = a1),
b3
(SELECT attdomain
FROM r
WHERE attname = a2));
Obviously SQL is the wrong language
to exploit this equivalence of data and metadata.
2.
It also follows that since his heading is a set of ordered
pairs (not attribute names), only the ordered pair need be unique (composite
primary key of attribute name and domain), so that more than one attribute may
share a name in one relation if they have different domains. Chris Date seems
to arbitrarily require unique attribute names. Are there profound advantages in
the relational closure implicit in his definition that we can use (e.g.
data-driven definition of relations)? This seems to open up a lot of
possibilities in the relational model that I have never seen before in the
literature. Chris Date must have had this in mind to justify the departure from
the usual analogy to predicate calculus in using a set of ordered pairs instead
of an ordered n-tuple to define a relation, a construct that does not exist in
any physical or logical model I'm aware of.
If you think we should all change our understanding of the
relational data model, I would love to see you or Chris pursue the matter.
Chris Date Responds: Well, Dan Weston's comments
and questions are certainly in a different league from most of the ones I get;
in fact, they're rather more interesting than most (and thank you, Dan, for
that). Sadly, however, I do find a certain amount of confusion in them. Let's
see if clarification is possible.
First of all, Weston says he's "surprised by... [my]
rather radical definition of a relation." Frankly, I'm surprised by his
surprise. My definition follows the spirit if not the letter of Codd's
definition as given in his 1970 paper in Communications of the ACM--the
paper that's usually credited with getting the whole relational effort rolling,
of course, though that credit should strictly speaking go to that paper's 1969
predecessor.
Weston says the "usual understanding of the relational
data model" involves a left-to-right ordering to the attributes of a
relation. Now, I don't know what Weston means by that "usual
understanding," but I'm here to say that that "usual
understanding" is dead wrong (in other words, it's not an
"understanding" at all!). He criticizes the 7th edition of my book AN INTRODUCTION TO
DATABASE SYSTEMS for saying there isn't any such left-to-right
ordering; well, all I can say is that every edition of that book (all the way
back to the first edition, which appeared in 1975) has said the exact same
thing. (I'm certainly guilty of changing my mind over the years on a lot of
things, but not on this one. I change my mind when there's some rational
justification for doing so. I try not to do so otherwise.)
Weston also refers to "the [understanding] of the
relational data model physically represented by tables." Now, I'm not
really sure what he means here, but of course a large part of the point about
the relational model is precisely that it has absolutely nothing to say
about physical representations. Arguing from the physical to the logical in
the context at hand is 100% backward.
Now, Weston also says I leave "two important
ramifications" of my "radical departure from the usual definition of
a relation... unstated." The first of those "ramifications" is
essentially that the heading of a relation is itself a relation. Well, yes, of
course it is; that's one reason why the catalog in a relational system is
relational. Metadata is data: Right?
Note: The fact that
metadata is just data is so obvious now--I mean, it's been obvious ever since
the relational model came along--that it's probably hard for most modern
database practitioners to understand that things were ever otherwise. But they
were (e.g., IMS). In fact, when the relational model did first come along, I
almost wrote a paper pointing out this very fact (that metadata was data), but
decided not to because the point seemed so obvious. But I've had many occasions
since then to wish I'd written that paper after all; it would have been useful
to have been able to reference it in many discussions since that time (the
present one included).
By the way, Weston's pseudoSQL example "CREATE TABLE
second..." with nested SELECT expressions doesn't do what he wants it to
do. Those nested SELECT expressions return relations, so he's apparently
defining B1 and B3 as relation-valued attributes. Not that there's necessarily
anything wrong with that, I hasten to add, but it's clear from his previous
definition of "second" that it's not what he meant.
The "second unstated important ramification" is
that it might be possible (under "my" definition) for two attributes
of the same relation to have the same name, so long as the attributes in
question are of different types (= are defined on different domains). Pace
Weston's assertion here, my INTRODUCTION book explicitly states on page
123 that such is not the case (i.e., two attributes of the same relation
can’t have the same name). Now, presumably Weston noticed that statement
on my part, because he goes on to say that "Date seems to arbitrarily
require unique attribute names." I do object most vehemently to that
"arbitrarily"! Let me assure Weston that there was nothing arbitrary
about the decision. The justification for requiring unique attribute names has
to do with the way--I want to say, the extremely elegant way -- we
define the operators of the relational algebra, including in particular (as
part of that definition) the definition of the all-important relation type
inference rules. I don't want to spell out those definitions and rules in
detail here (they're in the book, and they're part of the reason I wrote the
book in the first place--and they're part of the "profound advantages in
the relational closure implicit in [my] definition," to quote Weston once
again). Oh: When I say "we" above, I mean, primarily, Hugh Darwen and
myself--see our book FOUNDATION FOR
FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO.
Weston says I must have had something in mind (something he
elaborates on, but I don't fully understand his elaboration) "to justify
[my] departure from the usual analogy to predicate calculus ... to define a
relation." But I do appeal, very strongly, to that analogy, all through
the book already mentioned a couple of times above; indeed, it's a very strong
feature of the relational model that the analogy exists (see my recent six-part
article Constraints
and Predicates: A Brief Tutorial).
Weston also accuses me of espousing "a construct that
does not exist in any physical or logical model that [he's] aware of." But
the construct in question (= no left-to-right attribute ordering in a relation)
categorically exists in the relational model, which is a logical
model, of course. (I'm not interested in this context in any so-called
"physical model," and of course I have no idea as to what
"physical or logical" models Weston himself might be aware of. What I
am interested in is the relational model.)
Finally, Weston suggests that I "think we should all
change our understanding of the relational model." Well, maybe. What it
means to change one's understanding depends in part on what one's
understanding currently is. Certainly I think the database community
should try to understand the relational model for what it actually is, and not
for what it isn't -- and my own experience in trying to teach this material
tells me that that would indeed be a change for a lot of people.
PS: Of course, SQL is the prize example of what the
relational model isn’t. In particular, SQL does think there's a left-to-right
order to the attributes of a relation. But then again, SQL doesn't really
support relations anyway.
Editor Comments: A
good indicator of the sad state of knowledge in the database field is that even
those "whose heart is in the right place" (as Chris says) do not have
a good grasp of the fundamentals. So what can we expect of the remaining vast
majority? Here’s one of our recent weekly quotes, by two leading Oracle
practitioners, authors and instructors:
"Even though the tables are independent, you can easily see
that they are related. The city name in one table is related to the city name
in the other. This relationship is the basis for the name relational
database." --G. Koch and K. Loney, ORACLE: THE COMPLETE REFERENCE
Posted:
06/28/02
[ABOUT]
[QUOTES]
[LINKS]