ON DUPLICATES
with Chris Date

 

 

 

From: AS
To: Editor

 

Date issues a challenge in Double Trouble, Double Trouble Part 2:

 

"[...] tell me exactly--exactly, please!--how you propose to count those 100 'duplicate' pennies. I do think that anyone who advocates the position that duplicates are a good idea needs to provide a good, convincing answer to this question. It's nearly ten years since I first issued this challenge, and nobody has yet come up with a cogent response to it".

 

1. Weigh a penny.

2. Weigh the bag.

3. Weigh the bag of pennies.

 

I haven't seen this challenge as originally issued, but obviously it must have actually referred to "duplicates" of nonphysical things.
 

Chris Date Responds: First, it was good of AS to give me an out in his final sentence (that my challenge must have referred to "duplicates" of nonphysical things, or in other words, to things for which the weighing trick does not work). In fact, however, I don't believe I need to appeal to this particular out in order to defend my position. The fact is, I don't find the weighing algorithm convincing at all. Consider the first step: weigh a penny. What does AS mean by "a penny"? Suppose I give him two pennies but assert I am giving him just one. How does he know I am wrong? The answer has to be: by counting! Thus, I submit that he has to be able to count pennies in order to be able to execute the first step of his algorithm.
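The dependence of the weighing method on its first step can be made concrete with a little arithmetic. The following Python sketch is only an illustration: the weights (in milligrams) and the helper name count_by_weight are hypothetical and do not come from AS's letter.

```python
def count_by_weight(unit_mg: int, bag_mg: int, full_bag_mg: int) -> int:
    """Estimate a penny count from three weighings, all in milligrams."""
    return (full_bag_mg - bag_mg) // unit_mg


# Hypothetical figures, chosen only for illustration.
PENNY_MG = 2_500
BAG_MG = 10_000
FULL_BAG_MG = BAG_MG + 100 * PENNY_MG  # a bag that really holds 100 pennies

# Step 1 performed on exactly one penny.
count_one = count_by_weight(PENNY_MG, BAG_MG, FULL_BAG_MG)

# Step 1 performed on two pennies handed over as "a penny".
count_two = count_by_weight(2 * PENNY_MG, BAG_MG, FULL_BAG_MG)

print(count_one, count_two)
```

Under these assumed figures the first call yields 100 and the second 50. The arithmetic is only as good as the assurance that the thing weighed in step 1 is exactly one penny, and that assurance is itself a count.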
 

Editor Comment: In Chris's article referred to above, I provided a formulation of the counting problem that conveys it better: "Try to count a pile of pennies by throwing each back into the pile after you count it; this is the equivalent of what duplicate proponents suggest, without realizing it."

 

In Chapter 4 of PRACTICAL ISSUES IN DATABASE MANAGEMENT I provide a more verbose version of Chris' response. What is the distinguishing attribute of otherwise identical entities, such as, say, cake mix boxes? In the real world, we distinguish between such entities visually, by their distinct locations in physical space. The lack of such distinction means there is only one entity! Entities are countable only if they are distinguishable!

 

Since in the real world all entities are so distinguishable, duplicates in the database represent "indistinguishable multiple entities" and are, therefore, an inaccurate representation of reality.

 

In a correct representation, propositions about individual boxes would, therefore, have to include a box identifier, say, a box number, the representative in the database of the visual "this vs. that" distinction in the real world. Such identifiers are represented in the database by surrogate keys.
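As a minimal sketch of this point, the following Python snippet uses the standard sqlite3 module with an in-memory database. The table name box and the columns box_no and product are hypothetical, introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# box_no plays the role of the surrogate key: the database counterpart
# of the visual "this box vs. that box" distinction.
conn.execute(
    "CREATE TABLE box (box_no INTEGER PRIMARY KEY, product TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO box (box_no, product) VALUES (?, ?)",
    [(1, "cake mix"), (2, "cake mix"), (3, "cake mix")],
)

# The boxes can be counted because every row is distinguishable...
box_count = conn.execute(
    "SELECT COUNT(*) FROM box WHERE product = 'cake mix'"
).fetchone()[0]

# ...and any individual box can be addressed on its own.
second_box = conn.execute(
    "SELECT box_no, product FROM box WHERE box_no = 2"
).fetchall()

print(box_count, second_box)
```

Here the three rows differ only in box_no, yet each one is a distinct proposition about a distinct box, so no duplicates are involved.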

 

But note carefully that AS's method implies no interest in the individual pennies, only in their count. And as I argue in the mentioned chapter, if individual boxes are of no interest, there should not be rows representing them in the database. One database row for the entity type box, with the count made explicit in a column, is the proper representation. Thus, whether there is interest in individual entities, or only in their count, there is no justification for duplicates in either case.
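The count-only representation might look like the following Python sketch, again using sqlite3 with an in-memory database. The table product_stock and its columns are hypothetical names chosen for this example, as is the starting quantity of 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per entity type; the count is an explicit column value,
# not something inferred from the number of identical rows.
conn.execute(
    "CREATE TABLE product_stock ("
    " product TEXT PRIMARY KEY,"
    " quantity INTEGER NOT NULL CHECK (quantity >= 0))"
)
conn.execute("INSERT INTO product_stock VALUES ('cake mix', 100)")

# Removing one box from stock is an update to the count,
# not the deletion of one of many indistinguishable rows.
conn.execute(
    "UPDATE product_stock SET quantity = quantity - 1 WHERE product = 'cake mix'"
)

quantity = conn.execute(
    "SELECT quantity FROM product_stock WHERE product = 'cake mix'"
).fetchone()[0]
print(quantity)
```

With duplicates, "remove one box" would have no well-defined target row; with an explicit count the operation is unambiguous.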

 

Note also that AS's reference to "nonphysical things" (which cannot be weighed) is particularly pertinent to database rows. To quote from Chris's Part 1 article:
 

"The second point is this. Suppose a given table T does permit duplicates. Then we can't tell the difference between "genuine" duplicates in T and duplicates that arise from errors in data entry operations on T! For example, what happens if the person responsible for data entry unintentionally -- that is, by mistake -- enters the very same row into T twice? (Thanks to Fabian Pascal again for drawing my attention to this problem.)"
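The data entry problem described in this quotation can be reproduced in a few lines. The sketch below uses Python's sqlite3 module; the tables t_no_key and t_keyed and their columns are hypothetical, and the same row is deliberately entered twice into each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_no_key (product TEXT, price_cents INTEGER)")
conn.execute("CREATE TABLE t_keyed (product TEXT PRIMARY KEY, price_cents INTEGER)")

row = ("cake mix", 299)

# Without a key, the repeated entry is accepted silently, and nothing in
# the table says whether the second copy is "genuine" or a mistake.
conn.execute("INSERT INTO t_no_key VALUES (?, ?)", row)
conn.execute("INSERT INTO t_no_key VALUES (?, ?)", row)
copies = conn.execute("SELECT COUNT(*) FROM t_no_key").fetchone()[0]

# With a key, the DBMS rejects the repeated entry.
conn.execute("INSERT INTO t_keyed VALUES (?, ?)", row)
try:
    conn.execute("INSERT INTO t_keyed VALUES (?, ?)", row)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(copies, rejected)
```

The keyless table ends up with two identical rows, while the keyed table raises an integrity error on the second insert, which is how a declared key lets the system catch this class of error.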

 

 

Posted 05/24/02

 

 

 
