Friday, June 30, 2017

Data Meaning: Analytics vs. Data Mining



Years back a client consulted me on a problem: they had an important database the meaning of which they did not know. Could the meaning be reconstructed? The question reflected common poor practices due to poor grasp of data fundamentals.

Data mining is distinct from analytics. The former is aimed at 'finding' meaningful data patterns -- i.e., knowledge 'discovery' -- while the latter derives new knowledge from 'existing' knowledge -- i.e., deduction (see Data, Information, Knowledge Discovery, and Knowledge Representation). 'Sensible' querying of databases to retrieve data for analytic applications and correct interpretation of results without a good grasp of data meaning is a fool's errand. Yet current database practices are extremely deficient in this respect.
Data professionals commonly believe they can infer meaning from sheer inspection of the tabular display of database relations, which is an illusion even for simple ones, particularly when, as is so often the case, they are poorly designed. 

Sunday, June 25, 2017

Relations & Relationships Part I

See Relations and the Relational Model

Sunday, June 18, 2017

This Week

1. Database Truth of the Week

"The RDM is a formal system. It has two parts.
  • The Deductive Subsystem: the formal language
  • The Interpretation Subsystem i.e., the application--of that language
Without an interpretation subsystem there is no possibility of applying the formal system and it remains an abstract game of symbols.
Semantics is about applying the RDM to some subject. In effect, what you do is restrict the power of the abstract formalism so that it is more closely aligned with your intended use. That means you are using constraints to limit the vocabulary to the subject matter (and making it finite and usually fairly small) and restricting the possible interpretations that can be used consistently with the resulting subset of the formalism." --David McGoveran

Sunday, June 11, 2017

What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency


Note: This is a 10/23/17 rewrite of a 7/29/12 post to bring it in line with the McGoveran interpretation [1] of Codd's true RDM.

To understand what's wrong with the picture of two weeks ago, namely:
"If we step back and look at what RDBMS is, we’ll no doubt be able to conclude that, as its name suggests (i.e., Relational Database Management System), it is a system that specializes in managing the data in a relational fashion. Nothing more. Folks, it’s important to keep in mind that it manages the data, not the MEANING of the data! And if you really need a parallel, RDBMS is much more akin to a word processor than to an operating system. A word processor (such as the much maligned MS Word, or a much nicer WordPress, for example) specializes in managing words. It does not specialize in managing the meaning of the words ... So who is then responsible for managing the meaning of the words? It’s the author, who else? Why should we tolerate RDBMS opinions on our data? We’re the masters, RDBMS is the servant, it should shut up and serve. End of discussion." --Alex Bunardzic, Should Database Manage The Meaning?
it helps to consider the quote in the context of another article by the author, "The Myth of Data Integrity", where he reveals that those "DBMS opinions" are constraints (the article has been deleted, but a few comments remain online and are highly recommended for a feel  of the consequences of lack of foundation knowledge).

Thursday, June 1, 2017

Redundancy, Consistency, and Integrity: Derivable Data

Database redundancy can wreak havoc with interpretation of analytics results, but it also poses consistency risks that can affect the correctness of the results themselves. The risks are too underappreciated for effective prevention. Given industry practices, analysts who use databases they did not design, or designed without sufficient foundation knowledge, should be on the alert.