MORE ON KEYS
with Fabian Pascal

 

 

 

From: LD

To: Editor

Date: 09/27/2003

 

You stated something that I was not so sure about when it came to distinguishing between two objects:

 

… either you want to track them individually, in which case you must distinguish between them, so you use a surrogate key;

 

Are you stating here that you would advocate ever distinguishing between two "real" things with a surrogate key?  I would have stated my answer to the question as:

 

If you want/need to track individually, then you must have some piece of data to distinguish them.  In the case of a can of cat food, you need something like the stock number, serial number, etc.  If you cannot find some piece of data, you cannot distinguish between them. 

 

The only case I could see for a surrogate key would be if you were going to label the physical thing with the surrogate key value. Otherwise you are simply tracking multiple instances of a type of object.  In this case a can of cat food.  The can of cat food being the thing, and finally you can note that you have multiple instances of the cat food, each being the same as far as we can tell.  It would seem that the only reasonable solution to this particular type of case would be to have the second solution:

 

… or you don't need to track them individually, then you record the lot with an attribute QUANTITY=2;

 

My question is simple, is it a good idea to model items that cannot be distinguished from one another by any of their "real" attributes as different in the database?

 

 

From: Fabian Pascal

To: LD

 

I don't understand your argument, which sounds too complicated.

 

To track individual items you need a unique identifier. Either there is a natural one--defined in my book as preceding the database--or there isn't, in which case you have to invent one.

 

 

From: LD

 

As for making things too complicated, I often have that effect :)

 

My argument was simply that using an artificial or surrogate key as the only method of providing uniqueness is a very bad idea.  To represent something in a database we need to have some way to uniquely identify it.

 

Take the example of the cat food.  If we want to delete one of the records because we sell a can of food, which one do we credit as being sold, or delete?  Unless the surrogate key is somehow assigned to a particular can, it makes for more work than it should. 

 

 

From: Fabian Pascal

 

I wish there was another way to say this, but I think you're confused in your mind.

 

A surrogate key is not for identification, only for distinction. In the case of the cat food, it does not matter which you delete, you only need to know that there are two and you can delete only one. You cannot tell how many you have without distinction.

 

If it is important which can is which, that means that there is a natural key, so there's no need for a surrogate key.

 

 

From: LD

 

I am not confused, and I understand completely your point of view (a point of view that I do respect.)  However, I disagree that it is a good idea to have multiple rows in a database where the only thing that differentiates them is a key that is made up when there is an obvious way to denote that we have multiple "things" that we are unable to differentiate between otherwise.  The surrogate key is used as part of identification if it is part of the only unique key, which in the case of your example it did.

 

The thing that bothers me the most is the idea that it is this idea that we can modify either of the rows to get the same result.  It does mirror reality in that two cans of cat food are obviously indistinguishable on a shelf, but in every case I have ever seen an inventory of a shelf like that I have never seen it in any format other than Cat food (2 cans).  I will see if I can get a copy of your book and read your entire thoughts on the idea.  Perhaps I am confused in my mind :)

 

Sorry to be such a bother, but I write books and give opinion on newsgroups, and the idea of using a surrogate as part of the only unique key has in my mind been a bad idea.  I hate to give out improper opinions where design is involved.

 

 

From: Fabian Pascal

 

Sorry, you do not. Read the chapter on keys in PRACTICAL ISSUES IN DATABASE MANAGEMENT. I differentiate between identifiability and distinguishability. You are confusing the two.

 

 

Posted 11/21/03

 

 

 

[ABOUT] [QUOTES] [LINKS]