ON TEMPORAL DATA AND THE RELATIONAL MODEL
with C. J. Date and H. Darwen

 

 

 

From: AB

To: Editor

Date: 17 Jan 2005

 

I'm reading TEMPORAL DATA AND THE RELATIONAL MODEL (I'm only as far as Chapter 5 now). At the end of section 1.4 (Relation Values) there's a section Relations and their meaning, where the idea of a relation

predicate is presented. In this section, an example of a projection is given:

 

S { S#, SNAME, STATUS } i.e. S { ALL BUT CITY }

 

You claim the predicate statement for the resulting relation is:

 

Supplier S# is under contract, is named SNAME,  has status STATUS, and is located in some city.

 

I don't see how it is correct to make any mention of the city in the predicate statement, as it is not part of the resulting relation. Indeed, if we may mention things we are not in the relation, why not add arbitrary

statements, such as: "... and is located at some address, and has some person as the CEO, and employs some number of staff, ..."

 

Imagine we had a suppliers relvar, S2, which is the same as S but with no CITY attribute e.g.

 

S#  SNAME  STATUS

=================

S1  Smith  20

S2  Jones  10

S3  Blake  30

S4  Clark  20

S5  Adams  30

 

This is the same as the result of the projection S { S#, SNAME, STATUS }. Yet I believe that the correct relation predicate for this relation is:

 

Supplier S# is under contract, is named SNAME, and has status STATUS.

 

We can imagine that there are many other facts about the supplier, but as they are not captured in this relation, we do not mention them in the predicate. In other examples in the same section, facts about the supplier that are not part of the relation are not mentioned in the relation predicate. The relation predicate only has something to say about the headings in the relation, and nothing else.

 

Do you mention the city only because the projection is derived from a relvar which contains a CITY attribute? I don't see any point in this (what information do you have by knowing that the supplier is in _some_ city?). Also, that seems to be inconsistent with the closure property of the relational operators i.e. that the result of a relational operator is another relation. I view this as treating "derived" relations the same as the base relation i.e. you shouldn't be able to tell the difference between them. In your example, it looks as thought the meanings of two apparently identical relations differs if one of them is a "base" relation and the other is a "derived" relation.

 

 

Hugh Darwen Responds: AB's predicate is not incorrect but it is incomplete because it loses the meaning of the projection operation applied to S.  Projection is the counterpart in relational algebra of the logical operator known as existential quantification.  Hence, "and is located in *some* city" (my emphasis).  It would not be correct to add "and has some person as CEO", because S does not have any attribute that, having been discarded under the projection, would allow us to reach such a conclusion.

 

I remark that other operators of the relational algebra also have counterparts in logic.  For example, JOIN means "and", so a correct predicate for S JOIN SP (for example) can be formed simply by inserting the word "and" between the predicates for S and SP.  Similarly, UNION means "or", and MINUS means "and not".

 

 

C. J. Date responds: Your questions are important—in some ways, in fact, they get to the heart of what databases are all about—so I'll do my best to answer them as carefully as I can.  That said, I think I need to say too that there are some tricky issues and subtleties involved.  You might need to read what follows more than once in order to absorb it fully (if so, I apologize, but I believe it's in the nature of the subject). 

 

For simplicity, I'll assume throughout that all relvars are base or real ones specifically (no virtual relvars or views). 

 

Essentially, your questions all have to do with what the symbols or combinations of symbols that happen to appear in some database mean.  Now, I hope you agree that the kinds of symbols we're talking about don't, in general, have any absolute or fixed meaning.  Even a familiar symbol like "3" doesn't mean much in the absence of appropriate additional information; three “whats”?  Of course, if I'm a DBA, I might design a certain database in such a way that the appearance of that symbol "3" in a certain position is to be interpreted to mean that (for example) employee Joe has three weeks of vacation.  And then, if you're a user of my database, I would have to explain that interpretation to you in order for you to be able to understand and use that piece of data correctly and effectively.  Naturally (?) I'd try to choose relvar names, attribute names, and so forth that make that explanation as intuitively obvious as possible.  For example, I might use names as indicated in the following picture: 

 

VACATIONS

+--------------------+

¦ NAME ¦ NO_OF_WEEKS ¦

+------+-------------¦

¦ Joe  ¦           3 ¦

¦ ...  ¦          .. ¦

 

But the fact remains that I would still have to explain the interpretation to you, even when it's "intuitively obvious" as in this example.  (In any case, I'm sure we've all seen real-world relvars where the relvar and attribute names make the intended interpretation very far from obvious indeed.) 

 

Now, the way to explain the interpretation of a given relvar—equivalently, the way to explain what that relvar means—is by stating the corresponding predicate.  In the case at hand, the predicate looks something like this: 

 

The employee called NAME has NO_OF_WEEKS weeks of vacation. 

 

It's important to understand that every relvar has a corresponding predicate; the predicate is the intended interpretation.  It's also important to understand that users must always be told the predicate for every relvar they want to use.  (In practice, we don't often talk in such high-flown terms, of course—that is, we don't often use the term predicate in the sense in which I'm using it here—but we do have to say what relvars mean; and however we choose to carry out that necessary task, whatever we do is logically equivalent to stating the predicate.) 

 

Note very carefully too that the predicate for a given relvar is not innate (if it were, we wouldn't have to spell it out).  Even the VACATIONS example might conceivably have a very different interpretation—for example: 

 

The dog called NAME has had NO_OF_WEEKS visits to the vet. 

 

Of course, if this latter is the intended interpretation, the names VACATIONS and NO_OF_WEEKS aren't very well chosen from an intuitive point of view, but so what?  There's nothing logically wrong with them.  Names are arbitrary—right? 

 

Next, given some relvar, each tuple appearing in that relvar at some given time t represents some proposition that evaluates to true at that time t (more accurately, some proposition that we believe evaluates to true at that time t, but I'm going to ignore this latter nicety here).  The proposition in question is derived from the relvar predicate by substituting attribute values from the tuple in question for the parameters appearing in the predicate.  In the case of relvar VACATIONS, for example, with its predicate

 

The employee called NAME has NO_OF_WEEKS weeks of vacation

 

the tuple for Joe represents the proposition

 

The employee called Joe has 3 weeks of vacation

 

(and that proposition is supposed to evaluate to true). 

 

I turn now to the suppliers-and-shipments database that your questions are based on.  Suppose I'm the DBA and you're the user.  So I tell you—and you have no choice but to believe me!—that the predicate for relvar S (suppliers) is: 

 

Supplier S# is under contract, is named SNAME, has status STATUS, and is located in city CITY. 

 

And the predicate for relvar SP (shipments) is: 

 

Supplier S# is able to supply part P#. 

 

Armed with this information, you now know that, given any particular "database state"—i.e., the overall value of the database at any particular time—the tuples in the relation that's the current value of relvar S represent true propositions of a certain form (and likewise for relvar SP).  And you rely on that knowledge in a variety of ways—in particular, when you do queries.  As a simple example, suppose you form the join of suppliers and shipments: 

 

S JOIN SP

 

Then you know that every tuple in the relation that's the output from that join represents a true proposition of the following form: 

 

Supplier S# is under contract, is named SNAME, has status STATUS, is located in city CITY, and is able to supply part P#. 

 

In other words, this latter statement is (by definition) the predicate for the output from the join, and it effectively dictates how you interpret the result of the query.  In fact, it's precisely because you understand that the result of the join is to be interpreted in this way that you formulated the query the way you did.  The natural-language formulation of the query is "Get supplier number, name, status, city, and part number for each supplier under contract and each part supplied by the supplier in question." 

 

By way of another example, suppose you restrict suppliers to just the ones in Paris ("Get supplier number, name, status, and city for suppliers under contract and located in Paris"): 

 

S WHERE CITY = 'Paris'

 

The predicate for the output is: 

 

Supplier S# is under contract, is named SNAME, has status STATUS, is located in city CITY, and the CITY is Paris. 

 

Of course, there's little point in retaining the CITY attribute in the output in this example, because we know its value is Paris in every tuple.  So a more reasonable form of the query is the following, which projects away the CITY attribute ("Get supplier number, name, and status—but not city—for suppliers under contract and located in Paris"): 

 

( S WHERE CITY = 'Paris' ) { ALL BUT CITY } 

 

And the predicate for the output is, obviously enough: 

 

Supplier S# is under contract, is named SNAME, has status STATUS, and is located in Paris. 

 

What I'm trying to show by these examples is that for every relational operation (join, restrict, project, and so on), given the predicate(s) satisfied by the tuples of the input relation(s), there's a well-defined predicate that's satisfied by the tuples of the output relation—and that "output predicate" is determined from the "input predicate(s)" by the semantics of the operation in question, as the examples illustrate. 

 

Now, the particular case you ask about is the projection of suppliers on all but CITY ("Get supplier number, name, and status for suppliers under contract"): 

 

S { ALL BUT CITY }

 

Adopting an obvious simplified notation for tuples, it's clear that the tuple <s#,sn,st> appears in the output here if and only if the tuple <s#,sn,st,sc> appears in the input for some city sc.  So the output predicate is: 

 

Supplier S# is under contract, is named SNAME, has status STATUS, and is located in some (unspecified) city sc. 

 

Now, you object that you "don't see how it is correct to make any mention of the city in [this predicate], as it is not part of the resulting relation."  You further ask why we don't include further arbitrary terms such as "... and is located at some address, and has some person as the CEO, and employs some number of staff" (and so on).  However, later you say:  "Do [we] mention the city only because the projection is derived from a [relation that] contains a CITY attribute?"  I hope you now see that the answer to this latter question is yesThe output predicate depends on the input predicate (or input predicates, plural, but there's only one in the example).  If the input predicate had mentioned the CEO (for example), the output predicate would have had to do so too. 

 

You also say:  "Imagine we had a suppliers relvar [that] is the same as [relvar] S but with no CITY attribute ... This is the same as the result of the projection of S on all but CITY, yet I believe the correct ... predicate is:  Supplier S# is under contract, is named SNAME, and has status STATUS" (forgive my minor paraphrasing of your original text).  I'm afraid you're under a serious misconception here.  This latter suppliers relvar is NOT "the same as the projection of relvar S on all but CITY."  I agree it has the same attributes; it might even have the same tuples (at the time under discussion, possibly even at all times); but it does NOT follow that the two are the same.  It depends on the predicates; in fact, they would be the same if and only if they had the same predicates.  But we've already agreed (I hope!—see the VACATIONS example earlier) that the predicate for a given relvar is whatever the definer says it is—it is not innate.  Thus, you're at liberty to say the predicate for your revised suppliers relvar is what you say it is ("Supplier S# is under contract, is named SNAME, and has status STATUS"); but even if you do, there's no contradiction or inconsistency with my claim to the effect that the predicate for the projection of my suppliers relvar on all but CITY is what I say it is ("Supplier S# is under contract, is named SNAME, has status STATUS, and is located somewhere"). 

 

Of course, you're also at liberty to say the predicate for your revised suppliers relvar is something entirely different—perhaps "Supplier S# is a friend of mine, has a cat called SNAME, and lives STATUS miles outside town."  Still no contradiction.  (Though it's probably bad design—but this latter objection is likely to be the case no matter what the intended interpretation is for your revised suppliers relvar.*  However, that's a topic for another day.) 

 

----------

*  I'm assuming here that we're talking about a design that includes both your revised suppliers relvar and my original one.  If it includes just one of the two, there's no problem. 

----------

 

I very much hope the foregoing discussion clarifies matters; but if not, then I encourage you to keep asking questions.  As I said at the outset, the issue is important, and it's worth the effort on both our parts to get things straight.  Thanks again for your original questions. 

 

PS:  You say "In other examples in the same section [i.e., Section 1.4 of our book TEMPORAL DATA AND THE RELATIONAL MODEL], facts about the supplier that are not part of the relation are not mentioned in the relation predicate."  I can't see where we do this, and I hope we don't.  Can you be more specific?

 

 

Posted 3/25/05