From: AB
To: Editor
Date: 17 Jan 2005
I'm reading TEMPORAL DATA AND THE RELATIONAL MODEL
(I'm only as far as Chapter 5 now). At the end of section 1.4 (Relation Values)
there's a section Relations and their meaning, where the idea of a
relation
predicate is presented. In this section, an example of a
projection is given:
S { S#, SNAME, STATUS } i.e. S { ALL BUT CITY }
You claim the predicate statement for the resulting relation
is:
Supplier S# is
under contract, is named SNAME, has
status STATUS, and is located in some city.
I don't see how it is correct to make any mention of the city
in the predicate statement, as it is not part of the resulting relation.
Indeed, if we may mention things we are not in the relation, why not add
arbitrary
statements, such as: "... and is located at some
address, and has some person as the CEO, and employs some number of staff,
..."
Imagine we had a suppliers relvar, S2, which is the same as S
but with no CITY attribute e.g.
S# SNAME STATUS
=================
S1 Smith 20
S2 Jones 10
S3 Blake 30
S4 Clark 20
S5 Adams 30
This is the same as the result of the projection S { S#,
SNAME, STATUS }. Yet I believe that the correct relation predicate for this
relation is:
Supplier S# is
under contract, is named SNAME, and has status STATUS.
We can imagine that there are many other facts about the
supplier, but as they are not captured in this relation, we do not mention them
in the predicate. In other examples in the same section, facts about the
supplier that are not part of the relation are not mentioned in the relation
predicate. The relation predicate only has something to say about the headings
in the relation, and nothing else.
Do you mention the city only because the projection is
derived from a relvar which contains a CITY attribute? I don't see any point in
this (what information do you have by knowing that the supplier is in _some_
city?). Also, that seems to be inconsistent with the closure property of the
relational operators i.e. that the result of a relational operator is another
relation. I view this as treating "derived" relations the same as the
base relation i.e. you shouldn't be able to tell the difference between them.
In your example, it looks as thought the meanings of two apparently identical
relations differs if one of them is a "base" relation and the other
is a "derived" relation.
Hugh Darwen Responds: AB's predicate is not incorrect
but it is incomplete because it loses the meaning of the projection operation
applied to S. Projection is the
counterpart in relational algebra of the logical operator known as existential
quantification. Hence, "and is
located in *some* city" (my emphasis).
It would not be correct to add "and has some person as CEO",
because S does not have any attribute that, having been discarded under the
projection, would allow us to reach such a conclusion.
I remark that other operators of the relational algebra also
have counterparts in logic. For
example, JOIN means "and", so a correct predicate for S JOIN SP (for
example) can be formed simply by inserting the word "and" between the
predicates for S and SP. Similarly,
UNION means "or", and MINUS means "and not".
C. J. Date responds: Your questions are important—in
some ways, in fact, they get to the heart of what databases are all about—so
I'll do my best to answer them as carefully as I can. That said, I think I need to say too that there are some tricky issues and subtleties involved. You
might need to read what follows more than once in order to absorb it fully (if
so, I apologize, but I believe it's in the nature of the subject).
For simplicity, I'll assume throughout that all relvars are
base or real ones specifically (no virtual relvars or views).
Essentially, your questions all have to do with what the
symbols or combinations of symbols that happen to appear in some database mean. Now, I hope you agree that the kinds of
symbols we're talking about don't, in general, have any absolute or fixed
meaning. Even a familiar symbol like
"3" doesn't mean much in the absence of appropriate additional
information; three “whats”? Of course,
if I'm a DBA, I might design a certain database in such a way that the
appearance of that symbol "3" in a certain position is to be
interpreted to mean that (for example) employee Joe has three weeks of
vacation. And then, if you're a user of
my database, I would have to explain that interpretation to you in order for
you to be able to understand and use that piece of data correctly and
effectively. Naturally (?) I'd try to
choose relvar names, attribute names, and so forth that make that explanation
as intuitively obvious as possible. For
example, I might use names as indicated in the following picture:
VACATIONS
+--------------------+
¦ NAME ¦ NO_OF_WEEKS ¦
+------+-------------¦
¦ Joe
¦ 3 ¦
¦ ...
¦ .. ¦
But the fact remains that I would still have to explain the
interpretation to you, even when it's "intuitively obvious" as in
this example. (In any case, I'm sure
we've all seen real-world relvars where the relvar and attribute names make the
intended interpretation very far from obvious indeed.)
Now, the way to explain the interpretation of a given
relvar—equivalently, the way to explain what that relvar means—is by stating
the corresponding predicate. In
the case at hand, the predicate looks something like this:
The employee called
NAME has NO_OF_WEEKS weeks of vacation.
It's important to understand that every relvar has a
corresponding predicate; the predicate is the intended
interpretation. It's also important to
understand that users must always be told the predicate for every relvar
they want to use. (In practice, we
don't often talk in such high-flown terms, of course—that is, we don't often
use the term predicate in the sense in which I'm using it here—but we do
have to say what relvars mean; and however we choose to carry out that
necessary task, whatever we do is logically equivalent to stating the predicate.)
Note very carefully too that the predicate for a given relvar
is not innate (if it were, we wouldn't have to spell it out). Even the VACATIONS example might conceivably
have a very different interpretation—for example:
The dog called NAME
has had NO_OF_WEEKS visits to the vet.
Of course, if this latter is the intended interpretation, the
names VACATIONS and NO_OF_WEEKS aren't very well chosen from an intuitive point
of view, but so what? There's nothing logically
wrong with them. Names are arbitrary—right?
Next, given some relvar, each tuple appearing in that relvar
at some given time t represents some proposition that evaluates to true
at that time t (more accurately, some proposition that we believe
evaluates to true at that time t, but I'm going to ignore this latter
nicety here). The proposition in
question is derived from the relvar predicate by substituting attribute values
from the tuple in question for the parameters appearing in the predicate. In the case of relvar VACATIONS, for example,
with its predicate
The employee called
NAME has NO_OF_WEEKS weeks of vacation
the tuple for Joe represents the proposition
The employee called Joe has 3 weeks of vacation
(and that proposition is supposed to evaluate to true).
I turn now to the suppliers-and-shipments database that your
questions are based on. Suppose I'm the
DBA and you're the user. So I tell
you—and you have no choice but to believe me!—that the predicate for relvar S
(suppliers) is:
Supplier S# is
under contract, is named SNAME, has status STATUS, and is located in city
CITY.
And the predicate for relvar SP (shipments) is:
Supplier S# is able
to supply part P#.
Armed with this information, you now know that, given any
particular "database state"—i.e., the overall value of the database
at any particular time—the tuples in the relation that's the current value of
relvar S represent true propositions of a certain form (and likewise for relvar
SP). And you rely on that knowledge
in a variety of ways—in particular, when you do queries. As a simple example, suppose you form the
join of suppliers and shipments:
S JOIN SP
Then you know that every tuple in the relation that's the
output from that join represents a true proposition of the following form:
Supplier S# is under contract, is named SNAME, has status
STATUS, is located in city CITY, and is able to supply part P#.
In other words, this latter statement is (by definition) the
predicate for the output from the join, and it effectively dictates how you
interpret the result of the query. In
fact, it's precisely because you understand that the result of the join is to
be interpreted in this way that you formulated the query the way you did. The natural-language formulation of the
query is "Get supplier number, name, status, city, and part number for
each supplier under contract and each part supplied by the supplier in
question."
By way of another example, suppose you restrict suppliers to
just the ones in Paris ("Get supplier number, name, status, and city for
suppliers under contract and located in Paris"):
S WHERE CITY = 'Paris'
The predicate for the output is:
Supplier S# is
under contract, is named SNAME, has status STATUS, is located in city CITY, and
the CITY is Paris.
Of course, there's little point in retaining the CITY
attribute in the output in this example, because we know its value is Paris in
every tuple. So a more reasonable form
of the query is the following, which projects away the CITY attribute
("Get supplier number, name, and status—but not city—for suppliers under
contract and located in Paris"):
( S WHERE CITY = 'Paris' ) { ALL BUT CITY }
And the predicate for the output is, obviously enough:
Supplier S# is
under contract, is named SNAME, has status STATUS, and is located in
Paris.
What I'm trying to show by these examples is that for every
relational operation (join, restrict, project, and so on), given the
predicate(s) satisfied by the tuples of the input relation(s), there's a
well-defined predicate that's satisfied by the tuples of the output relation—and
that "output predicate" is determined from the "input
predicate(s)" by the semantics of the operation in question, as the
examples illustrate.
Now, the particular case you ask about is the projection of
suppliers on all but CITY ("Get supplier number, name, and status for
suppliers under contract"):
S { ALL BUT CITY }
Adopting an obvious simplified notation for tuples, it's
clear that the tuple <s#,sn,st> appears in the output here if and
only if the tuple <s#,sn,st,sc> appears in the input for some city
sc. So the output predicate
is:
Supplier S# is
under contract, is named SNAME, has status STATUS, and is located in some
(unspecified) city sc.
Now, you object that you "don't see how it is correct to
make any mention of the city in [this predicate], as it is not part of the
resulting relation." You further
ask why we don't include further arbitrary terms such as "... and is
located at some address, and has some person as the CEO, and employs some
number of staff" (and so on).
However, later you say: "Do
[we] mention the city only because the projection is derived from a [relation
that] contains a CITY attribute?"
I hope you now see that the answer to this latter question is yes. The output predicate depends on the input
predicate (or input predicates, plural, but there's only one in the
example). If the input predicate had
mentioned the CEO (for example), the output predicate would have had to do so
too.
You also say:
"Imagine we had a suppliers relvar [that] is the same as [relvar] S
but with no CITY attribute ... This is the same as the result of the projection
of S on all but CITY, yet I believe the correct ... predicate is: Supplier S# is under contract, is named
SNAME, and has status STATUS" (forgive my minor paraphrasing of your
original text). I'm afraid you're under
a serious misconception here. This
latter suppliers relvar is NOT "the same as the projection of
relvar S on all but CITY." I agree
it has the same attributes; it might even have the same tuples (at the time
under discussion, possibly even at all times); but it does NOT follow
that the two are the same. It
depends on the predicates; in fact, they would be the same if and only if
they had the same predicates. But we've
already agreed (I hope!—see the VACATIONS example earlier) that the predicate
for a given relvar is whatever the definer says it is—it is not
innate. Thus, you're at liberty to say
the predicate for your revised suppliers relvar is what you say it is ("Supplier S# is under contract, is
named SNAME, and has status STATUS"); but even if you do, there's
no contradiction or inconsistency with my claim to the effect that the
predicate for the projection of my suppliers relvar on all but CITY is
what I say it is ("Supplier
S# is under contract, is named SNAME, has status STATUS, and is located
somewhere").
Of course, you're also at liberty to say the predicate for
your revised suppliers relvar is something entirely different—perhaps "Supplier S# is a friend of mine, has
a cat called SNAME, and lives STATUS miles outside town." Still no contradiction. (Though it's probably bad design—but this
latter objection is likely to be the case no matter what the intended
interpretation is for your revised suppliers relvar.* However, that's a topic for another
day.)
----------
*
I'm assuming here that we're talking about a design that includes both
your revised suppliers relvar and my original one. If it includes just one of the two, there's no problem.
----------
I very much hope the foregoing discussion clarifies matters;
but if not, then I encourage you to keep asking questions. As I said at the outset, the issue is
important, and it's worth the effort on both our parts to get things
straight. Thanks again for your
original questions.
PS: You say "In
other examples in the same section [i.e., Section 1.4 of our book TEMPORAL DATA AND THE RELATIONAL
MODEL], facts about the supplier that are not part of the relation are
not mentioned in the relation predicate."
I can't see where we do this, and I hope we don't. Can you be more specific?
Posted 3/25/05