Friday, March 25, 2016

Not Worth Repeating: Duplicates

My March post @All Analytics.

Frequent hits are driven by the question “Are keys mandatory?” Puzzlingly, many data professionals do not seem to understand why duplicates should be prohibited. This should worry analysts. “Stating the same fact more than once, does not make it truer, only redundant,” as E. F. Codd used to say. Without an identifier, individual entities are not meaningfully distinguishable, so the representation contradicts the real world. Contradictions produce problems. First, a DBMS is incapable of “visually” discerning a data entry duplication error from “valid” duplicates, which means a high risk of inconsistent databases, wrong counts, and other incorrect query results.
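The wrong-count risk can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (sales_nokey, sales_keyed), the columns, and the sample rows are hypothetical, made up purely for illustration; the point is only that a keyless table silently accepts an accidental re-entry, while a declared key lets the DBMS reject it.

```python
import sqlite3

def count_rows(conn, table):
    """Return the number of rows in the given (hypothetical) table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")

# Hypothetical tables: same columns, but only one declares a key.
conn.execute("CREATE TABLE sales_nokey (customer TEXT, item TEXT)")
conn.execute(
    "CREATE TABLE sales_keyed ("
    "customer TEXT, item TEXT, PRIMARY KEY (customer, item))"
)

# The same fact entered twice, e.g. by a data entry mistake.
rows = [("Smith", "widget"), ("Smith", "widget")]

# Without a key, the DBMS cannot tell the error from a "valid" duplicate.
conn.executemany("INSERT INTO sales_nokey VALUES (?, ?)", rows)
nokey_count = count_rows(conn, "sales_nokey")

# With a key, the second insert violates uniqueness and is rejected.
rejected = 0
for row in rows:
    try:
        conn.execute("INSERT INTO sales_keyed VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected += 1
keyed_count = count_rows(conn, "sales_keyed")

print(nokey_count, keyed_count, rejected)
```

In the keyless table the count reflects the mistake as if it were a second fact; in the keyed table the fact is recorded once and the re-entry surfaces as an error rather than as a wrong query result.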

