From: Karen Simmons
Date: Sep 15 2005
I came across this paragraph in an online article, "12 Tips for
Generating Rich Data," at destinationcrm.com (emphasis mine):
3) Clean your data regularly.
There are many kinds of dirty data. Some of the most basic--having
multiple entries for the same customer or misspellings--can be the most
labor-intensive to remove. Other cleansing issues stem from organizational
problems. Your marketing department might classify data one way with one naming
convention, while your sales department uses another. But it all goes back
to policies: Require all users to input data the same way, and clean data
often, deleting mistakes and duplicates.
Am I wrong in thinking that if your database is set up
properly, your users can't classify data willy-nilly, and
therefore you won't have to clean your data by
deleting mistakes and duplicates? From
what I've seen, policies (in and of themselves) don't ensure data integrity.
There's more, but the above blurb was what jumped out at me.
From: Fabian Pascal
But of course. But that requires thinking upfront and
doing design. And that's hard for the ignorami who can't think [and for whom it is
easier to believe that you can do away with design].
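To make the point of this exchange concrete, here is a minimal sketch of integrity declared in the database rather than left to policy. It uses Python's built-in sqlite3 module only as a convenient stand-in for a DBMS; the table and column names (segment, customer, email, segment_code) and the helper functions are hypothetical, invented for illustration, and are not taken from the article or from any particular product.

```python
import sqlite3


def build_schema(conn):
    """Create a small schema whose rules are declared to the DBMS itself."""
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- The one agreed list of classifications, shared by all departments.
        CREATE TABLE segment (
            code TEXT PRIMARY KEY
        );
        CREATE TABLE customer (
            customer_id  INTEGER PRIMARY KEY,
            email        TEXT NOT NULL UNIQUE,   -- no second entry for the same key
            segment_code TEXT NOT NULL REFERENCES segment(code)
        );
        INSERT INTO segment (code) VALUES ('RETAIL');
        INSERT INTO segment (code) VALUES ('WHOLESALE');
    """)


def try_insert(conn, email, segment_code):
    """Return None if the row is accepted, else the DBMS's rejection message."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO customer (email, segment_code) VALUES (?, ?)",
                (email, segment_code),
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
build_schema(conn)
print(try_insert(conn, "a@example.com", "RETAIL"))      # accepted
print(try_insert(conn, "a@example.com", "WHOLESALE"))   # rejected: duplicate key
print(try_insert(conn, "b@example.com", "retail-ish"))  # rejected: unknown segment
```

With the constraints declared, the duplicate entry and the ad hoc classification never get into the database, so there is nothing to delete later. Constraints of this kind cannot catch every misspelling, of course; they only enforce the rules that the design actually states.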
Ed. Note: In
practice, there is more to it. First, most SQL products have poor integrity
support, which means that many constraints must be implemented in application
code. This is error-prone and often prohibitively costly. Second, the problem
is exacerbated by poor (undernormalized) design, which introduces redundancy:
every redundantly stored fact needs additional constraints to keep its copies
consistent, which makes the integrity burden far heavier. In fact, most
practitioners are oblivious to the increased integrity risks of denormalized
designs, and don't bother to address them.
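The redundancy risk described in the note above can be sketched as follows. This is an illustration under stated assumptions, not a description of any real system: the tables (order_flat, customer, customer_order), the sample rows, and the function names are all hypothetical, and SQLite (through Python's sqlite3 module) again stands in for a DBMS.

```python
import sqlite3


def cities_in_denormalized_design():
    """The customer's city is repeated on every order row (redundant)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE order_flat (
            order_id      INTEGER PRIMARY KEY,
            customer_id   INTEGER NOT NULL,
            customer_city TEXT NOT NULL
        );
        INSERT INTO order_flat VALUES (1, 42, 'Boston');
        INSERT INTO order_flat VALUES (2, 42, 'Boston');
    """)
    # The customer moves, but the application updates only one of the copies.
    # No declared constraint ties the copies together, so the DBMS accepts it.
    conn.execute(
        "UPDATE order_flat SET customer_city = 'Austin' WHERE order_id = 1"
    )
    rows = conn.execute(
        "SELECT DISTINCT customer_city FROM order_flat "
        "WHERE customer_id = 42 ORDER BY customer_city"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


def cities_in_normalized_design():
    """The city is recorded once, so no contradictory copy can exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            city        TEXT NOT NULL
        );
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
        );
        INSERT INTO customer VALUES (42, 'Boston');
        INSERT INTO customer_order VALUES (1, 42);
        INSERT INTO customer_order VALUES (2, 42);
    """)
    conn.execute("UPDATE customer SET city = 'Austin' WHERE customer_id = 42")
    rows = conn.execute(
        "SELECT DISTINCT c.city FROM customer_order o "
        "JOIN customer c ON c.customer_id = o.customer_id "
        "WHERE o.customer_id = 42 ORDER BY c.city"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(cities_in_denormalized_design())  # two contradictory cities for one customer
print(cities_in_normalized_design())    # a single city
```

In the first design, keeping the copies consistent requires an extra constraint that has to be written and maintained by someone; in the second, the inconsistency cannot be represented at all.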
Posted 10/28/05