Friday, June 30, 2017

Data Meaning: Analytics vs. Data Mining


My July post @All Analytics

Data mining is distinct from analytics. The former is aimed at ‘finding’ meaningful data patterns—i.e., knowledge ‘discovery’—while the latter derives new knowledge from ‘existing’ knowledge—i.e., deduction (see Data, Information, Knowledge Discovery, and Knowledge Representation). ‘Sensible’ querying of databases to retrieve data for analytic applications and correct interpretation of results without a good grasp of data meaning is a fool's errand. Yet current database practices are extremely deficient in this respect.

Sunday, June 25, 2017

Relations & Relationships Part I


This is a 06/18/17 rewrite of a 04/14/13 to bring it in line with the McGoveran interpretation of Codd's RDM[1].

Similarly, as we have explained, some object group properties arise from relationships among individual properties of its members and/or among the members themselves[4]. And a relationship can be modeled as an object, with properties of its own. For example, the relationship between a supplier and a part can be modeled as an object--a relationship object, with quantity as a property. On the other hand, a relationship can also be perceived as a property. For example, the relationships between supplies and suppliers and parts can be modeled with referencing properties. The distinction that the Entity/Relationship Model (E/RM) approach makes is unnecessary and contributes to the confusion[5]. Therefore, Kent should not be interpreted to mean that modeling choices, once made, are confusing, but rather that they are "in the eye of the beholder", so to speak--there is usefulness, not correctness. (Kent's book is a must-read Recommended Book). Otherwise put, make the most useful choice and strive for parsimony, explicitness, well-definedness, and consistency.

Read it all 



Sunday, June 18, 2017

This Week

1. Database Truth of the Week

"The RDM is a formal system. It has two parts.
  • The Deductive Subsystem: the formal language
  • The Interpretation Subsystem i.e., the application--of that language
Without an interpretation subsystem there is no possibility of applying the formal system and it remains an abstract game of symbols.
Semantics is about applying the RDM to some subject. In effect, what you do is restrict the power of the abstract formalism so that it is more closely aligned with your intended use. That means you are using constraints to limit the vocabulary to the subject matter (and making it finite and usually fairly small) and restricting the possible interpretations that can be used consistently with the resulting subset of the formalism." --David McGoveran

Sunday, June 11, 2017

What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency

Note: After the initial posting I corrected some errors. Readers of the initial post early Sunday should re-read.

Whether we like it or not,
"All semantics that can be formalized in FOPL--including verbs such as ‘supplies’—formalize as constraints. We use constraints to reduce the number of permissible interpretations—meanings—of the purely abstract FOPL deductive system. Codd did bring integrity to the forefront by making it an explicit component of a formal data model, but that does not mean that all the semantics that can be captured formally, have been captured in the standard description of RDM, let alone supported in products. In 1979 Codd described a way to "capture" semantics using the relational formalism beyond the then current understanding. That formalism doesn't tell you how to discover semantics, but if you have them, he shows (at least to some degree) how to express those semantics relationally." --David McGoveran
Read it all. 




Thursday, June 1, 2017

Redundancy, Consistency, and Integrity: Derivable Data

My May post @All Analytics

Database redundancy can wreak havoc with interpretation of analytics results, but it also poses consistency risks that can affect the correctness of the results themselves. The risks are too underappreciated for effective prevention. Given industry practices, analysts who use databases they did not design, or designed without sufficient foundation knowledge, should be on the alert.


Read it all (and please comment there, not here).