ON WHAT IS A RELATION
with Chris Date

 

 

 

From: DW
To: Editor

 

I am surprised by the implications of Chris Date's rather radical departure from the usual definition of a relation (as described in his INTRODUCTION TO DATABASE SYSTEMS, 7th ed. sec. 5.3 p. 123).

 

The usual understanding of the relational data model (and the one physically represented by tables) goes something like this: A proposition of an n-ary relation in the predicate calculus is an ordered n-tuple. A straightforward mapping into the relational model would likewise identify attributes by order. A relation would consist of a heading and a body, where the heading is an ordered n-tuple of domains and the body is a set of propositions (each an ordered n-tuple of values). Attributes are identified by order in the relation (or table), and attribute names are not needed, although for convenience they could be provided as aliases. The version of relational data model just described is (I believe) the model most widely (mis)understood by database administrators.

 

Chris Date takes a different tack in defining the relational model, identifying attributes by (name,domain) instead of by position. Instead of ordered n-tuples, he proposes a set of ordered pairs (name,domain). He leaves two important ramifications unstated:

 

1.      It follows that the heading thus defined is a binary relation mapping an attribute name to either (1) a domain (the two domains of this binary relation being character string and set of all domains), or (2) a domain name (where both domains are character strings, with a further mapping of domain name to domain implied elsewhere). As such, the definition of a relation is itself a perfectly ordinary relation that can be stored in the database and exploited. It should be possible to exploit this symmetry of representation, e.g. to define a table replacing domain names with SELECT expressions:

 

Given the following relation R:

 

attname | attdomain
-----------------------
a1      | integer
a2      | char(2)

 

that defines the relation:

 

CREATE TABLE first (a1 INTEGER,a2 CHAR(2));

 

we should instead be able to say something like (using pseudo-SQL)

 

CREATE TABLE first

 ((SELECT attname,attdomain 
   FROM r));

 

Or if we want to say

 

CREATE TABLE second (b1 INTEGER,b2 DATE b3 CHAR(2));

 

we should be able to say in its place:

 

CREATE TABLE second

 (b2 DATE, b1

  (SELECT attdomain 
  FROM r 
  WHERE attname = a1),
  b3 (SELECT attdomain
      FROM r 
      WHERE attname = a2));

 

Obviously SQL is the wrong language to exploit this equivalence of data and metadata.

 

2.      It also follows that since his heading is a set of ordered pairs (not attribute names), only the ordered pair need be unique (composite primary key of attribute name and domain), so that more than one attribute may share a name in one relation if they have different domains. Chris Date seems to arbitrarily require unique attribute names. Are there profound advantages in the relational closure implicit in his definition that we can use (e.g. data-driven definition of relations)? This seems to open up a lot of possibilities in the relational model that I have never seen before in the literature. Chris Date must have had this in mind to justify the departure from the usual analogy to predicate calculus in using a set of ordered pairs instead of an ordered n-tuple to define a relation, a construct that does not exist in any physical or logical model I'm aware of.

 

If you think we should all change our understanding of the relational data model, I would love to see you or Chris pursue the matter.
 

 

Chris Date Responds:  Well, Dan Weston's comments and questions are certainly in a different league from most of the ones I get; in fact, they're rather more interesting than most (and thank you, Dan, for that). Sadly, however, I do find a certain amount of confusion in them. Let's see if clarification is possible.

 

First of all, Weston says he's "surprised by... [my] rather radical definition of a relation." Frankly, I'm surprised by his surprise. My definition follows the spirit if not the letter of Codd's definition as given in his 1970 paper in Communications of the ACM--the paper that's usually credited with getting the whole relational effort rolling, of course, though that credit should strictly speaking go to that paper's 1969 predecessor.

 

Weston says the "usual understanding of the relational data model" involves a left-to-right ordering to the attributes of a relation. Now, I don't know what Weston means by that "usual understanding," but I'm here to say that that "usual understanding" is dead wrong (in other words, it's not an "understanding" at all!). He criticizes the 7th edition of my book AN INTRODUCTION TO DATABASE SYSTEMS for saying there isn't any such left-to-right ordering; well, all I can say is that every edition of that book (all the way back to the first edition, which appeared in 1975) has said the exact same thing. (I'm certainly guilty of changing my mind over the years on a lot of things, but not on this one. I change my mind when there's some rational justification for doing so. I try not to do so otherwise.)

 

Weston also refers to "the [understanding] of the relational data model physically represented by tables." Now, I'm not really sure what he means here, but of course a large part of the point about the relational model is precisely that it has absolutely nothing to say about physical representations. Arguing from the physical to the logical in the context at hand is 100% backward.

 

Now, Weston also says I leave "two important ramifications" of my "radical departure from the usual definition of a relation... unstated." The first of those "ramifications" is essentially that the heading of a relation is itself a relation. Well, yes, of course it is; that's one reason why the catalog in a relational system is relational. Metadata is data: Right? 

 

Note: The fact that metadata is just data is so obvious now--I mean, it's been obvious ever since the relational model came along--that it's probably hard for most modern database practitioners to understand that things were ever otherwise. But they were (e.g., IMS). In fact, when the relational model did first come along, I almost wrote a paper pointing out this very fact (that metadata was data), but decided not to because the point seemed so obvious. But I've had many occasions since then to wish I'd written that paper after all; it would have been useful to have been able to reference it in many discussions since that time (the present one included).

 

By the way, Weston's pseudoSQL example "CREATE TABLE second..." with nested SELECT expressions doesn't do what he wants it to do. Those nested SELECT expressions return relations, so he's apparently defining B1 and B3 as relation-valued attributes. Not that there's necessarily anything wrong with that, I hasten to add, but it's clear from his previous definition of "second" that it's not what he meant.

 

The "second unstated important ramification" is that it might be possible (under "my" definition) for two attributes of the same relation to have the same name, so long as the attributes in question are of different types (= are defined on different domains). Pace Weston's assertion here, my INTRODUCTION book explicitly states on page 123 that such is not the case (i.e., two attributes of the same relation can’t have the same name). Now, presumably Weston noticed that statement on my part, because he goes on to say that "Date seems to arbitrarily require unique attribute names." I do object most vehemently to that "arbitrarily"! Let me assure Weston that there was nothing arbitrary about the decision. The justification for requiring unique attribute names has to do with the way--I want to say, the extremely elegant way -- we define the operators of the relational algebra, including in particular (as part of that definition) the definition of the all-important relation type inference rules. I don't want to spell out those definitions and rules in detail here (they're in the book, and they're part of the reason I wrote the book in the first place--and they're part of the "profound advantages in the relational closure implicit in [my] definition," to quote Weston once again). Oh: When I say "we" above, I mean, primarily, Hugh Darwen and myself--see our book FOUNDATION FOR FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO.

 

Weston says I must have had something in mind (something he elaborates on, but I don't fully understand his elaboration) "to justify [my] departure from the usual analogy to predicate calculus ... to define a relation." But I do appeal, very strongly, to that analogy, all through the book already mentioned a couple of times above; indeed, it's a very strong feature of the relational model that the analogy exists (see my recent six-part article Constraints and Predicates: A Brief Tutorial).

 

Weston also accuses me of espousing "a construct that does not exist in any physical or logical model that [he's] aware of." But the construct in question (= no left-to-right attribute ordering in a relation) categorically exists in the relational model, which is a logical model, of course. (I'm not interested in this context in any so-called "physical model," and of course I have no idea as to what "physical or logical" models Weston himself might be aware of. What I am interested in is the relational model.) 

Finally, Weston suggests that I "think we should all change our understanding of the relational model." Well, maybe. What it means to change one's understanding depends in part on what one's understanding currently is. Certainly I think the database community should try to understand the relational model for what it actually is, and not for what it isn't -- and my own experience in trying to teach this material tells me that that would indeed be a change for a lot of people.

 

PS: Of course, SQL is the prize example of what the relational model isn’t. In particular, SQL does think there's a left-to-right order to the attributes of a relation. But then again, SQL doesn't really support relations anyway. 
 
 

Editor Comments: A good indicator of the sad state of knowledge in the database field is that even those "whose heart is in the right place" (as Chris says) do not have a good grasp of the fundamentals. So what can we expect of the remaining vast majority? Here’s one of our recent weekly quotes, by two leading Oracle practitioners, authors and instructors:

 

"Even though the tables are independent, you can easily see that they are related. The city name in one table is related to the city name in the other. This relationship is the basis for the name relational database." --G. Koch and K. Loney, ORACLE: THE COMPLETE REFERENCE

 

 

Posted: 06/28/02

 

 

 

[ABOUT] [QUOTES] [LINKS]