From: LD
To: Editor
Date: 09/27/2003
You stated something that I was not so sure about when it
came to distinguishing between two objects:
… either you want to track them
individually, in which case you must distinguish between them, so you use a
surrogate key;
Are you stating here that you would advocate ever
distinguishing between two "real" things with a surrogate key? I would have stated my answer to the
question as:
If you want/need to track individually,
then you must have some piece of data to distinguish them. In the case of a can of cat food, you need
something like the stock number, serial number, etc. If you cannot find some piece of data, you cannot distinguish
between them.
The only case I could see for a surrogate key would be if you
were going to label the physical thing with the surrogate key value. Otherwise
you are simply tracking multiple instances of a type of object. In this case a can of cat food. The can of cat food being the thing, and
finally you can note that you have multiple instances of the cat food, each
being the same as far as we can tell.
It would seem that the only reasonable solution to this particular type
of case would be to have the second solution:
… or you don't need to track them
individually, then you record the lot with an attribute QUANTITY=2;
My question is simple, is it a good idea to model items that
cannot be distinguished from one another by any of their "real"
attributes as different in the database?
From: Fabian Pascal
To: LD
I don't understand your argument, which sounds too
complicated.
To track individual items you need a unique identifier.
Either there is a natural one--defined in my book as preceding the database--or
there isn't, in which case you have to invent one.
From: LD
As for making things too complicated, I often have that
effect :)
My argument was simply that using an artificial or surrogate
key as the only method of providing uniqueness is a very bad idea. To represent something in a database we need
to have some way to uniquely identify it.
Take the example of the cat food. If we want to delete one of the records because we sell a can of
food, which one do we credit as being sold, or delete? Unless the surrogate key is somehow assigned
to a particular can, it makes for more work than it should.
From: Fabian Pascal
I wish there was another way to say this, but I think you're
confused in your mind.
A surrogate key is not for identification, only for distinction.
In the case of the cat food, it does not matter which you delete, you
only need to know that there are two and you can delete only one. You cannot
tell how many you have without distinction.
If it is important which can is which, that means that
there is a natural key, so there's no need for a surrogate key.
From: LD
I am not confused, and I understand completely your point of
view (a point of view that I do respect.) However, I disagree that it is
a good idea to have multiple rows in a database where the only thing that
differentiates them is a key that is made up when there is an obvious way to
denote that we have multiple "things" that we are unable to
differentiate between otherwise. The surrogate key is used as part of
identification if it is part of the only unique key, which in the case of your
example it did.
The thing that bothers me the most is the idea that it is
this idea that we can modify either of the rows to get the same result.
It does mirror reality in that two cans of cat food are obviously indistinguishable
on a shelf, but in every case I have ever seen an inventory of a shelf
like that I have never seen it in any format other than Cat food (2
cans). I will see if I can get a copy of your book and read your entire
thoughts on the idea. Perhaps I am confused in my mind :)
Sorry to be such a bother, but I write books and give opinion
on newsgroups, and the idea of using a surrogate as part of the only unique key
has in my mind been a bad idea. I hate to give out improper opinions
where design is involved.
From: Fabian Pascal
Sorry, you do not. Read the chapter on keys in PRACTICAL ISSUES IN
DATABASE MANAGEMENT. I differentiate between identifiability and
distinguishability. You are confusing the two.
Posted
11/21/03
[ABOUT]
[QUOTES]
[LINKS]