Friday, January 24, 2014

Causality, Uncertainty & Actionability in Analytics

My January Column @All Analytics.

In analytics, it's much easier to "fish" for correlations and try to explain them post-hoc than to develop mutually exclusive hypotheses up-front and test empirically which one holds.Only the second approach is scientific, though, hence my skepticism about the hype of business analytics as "data science."
Read it all. (And please comment there, not here).



Thursday, January 16, 2014

Weekly Update


1.  Quote of the Week
I do not understand your first point. Even a database designed with only 1NF can have integrity if other methods are used to ensure that integrity. [Higher n]ormal forms can guarantee the absence of various integrity issues, but the lack of a normal form does not guarantee the presence of the integrity issue. I remember writing pages of code to do just that in the early days of RDBMS products before normal forms (i.e., referential integrity) was strictly enforced by the DBMS. --LinkedIn.com
Note: My (first) point was that the minimal relational mandate is 1NF, but that full normalization (5NF) is desirable for practical reasons.


2. To Laugh or Cry?
What is difference between storing data in traditional and modern way in database?

3. Online
What is Surrogate Key, why it is used, is it Primary Key?
Apropos my just published paper on keys.


4. Data management is important, after all.
Point-of-sale malware infecting Target found hiding in plain sight
(Note the last sentence in the article).

So is database design
Poor Data Management Blinded Chase to Madoff Fraud

5. And now for something completely different
Looks like a pattern.



Wednesday, January 15, 2014

New: Paper #4 Published


NEW!! THE KEY TO KEYS: A MATTER OF IDENTITY NEW!!
v.1 (January 2014)


Note: This paper assumes familiarity with the concepts and terminology introduced in papers #1, Business Modeling For Database Design and #2, The Costly Illusion: Normalization, Integrity and Performance, in this series, which are both recommended as preamble. 

If entities in the real world did not have identifiers—attributes that capture their identity and uniquely identify them—we would not be able to tell them apart. It follows that an accurate database representation of a business reality must include keys, R-table columns that formally represent the real world identifiers in the database.

Keys and kinds thereof, their necessity, selection, function and properties are too often not well known and understood. This paper
  • Defines and explains the key concept;
  • Explains the function and properties of the various types of key;
  • Describes the criteria for key selection;
  • Specifies what is proper DBMS key support;
  • Assesses SQL's key support;
  • Debunks some common misconceptions about keys.

Table of Contents

Introduction
1. R-tables and Integrity Constraints
2. Keys and Key Constraints
3. Kinds of Keys
3.1. Natural Keys
3.1.1. Candidate and Primary Keys
3.1.2. Simple and Composite Keys
3.2. Surrogate Keys
4. Key Functions
4.1. Duplicate prevention
4.2. Integrity Burden Reduction
4. 3. View Updatability
5. Foreign Keys and Referential Constraints
6. DBMS Key Support
7. Keys in SQL
7.1. SQL and Duplicates
Appendix A: Duplicate removal in SQL
Appendix B: Duplicates and Language Redundancy
Conclusion
References

Sunday, January 12, 2014

Data Fundamentals and Education


Conveying the theoretical foundation of database management to practitioners without losing either precision/rigor, or the audience is a non-trivial endeavor, the skill for, or interest in which very few possess. More often than not one or the other are lost.

In Relations and Relationships I referred to the following LinkedIn exchange:
AT: ... William Kent and his book "Data and Reality" [1978] ... is devoted to "attributes", and (in my words) William confesses that he can not distinct [sic] between "relationships" and "attributes". Thus, the later might be completely redundant. 
GE: This confusion of entities vs. attributes has been with us a long time. Several years ago [a student] ... discovered a paper ... that discussed this dilemma. He proposed calling the thing, which we could not determine whether it was an entity or an attribute, an "entribute." Beautiful, eh? That was in 1939.

Sunday, January 5, 2014

Weekly Update


1.  Quote of the Week
As you might have gathered, I think NoSQL is technology to be taken very seriously. If you have an application problem that maps well to a NoSQL data model - such as aggregates or graphs - then you can avoid the nastiness of mapping completely. Indeed this is often a reason I've heard teams go with a NoSQL solution. This is, I think, a viable route to go - hence my interest in increasing our understanding of NoSQL systems. But even so it only works when the fit between the application model and the NoSQL data model is good ... And of course there are many situations where you're stuck with a relational model anyway. Maybe it's a corporate standard that you can't jump over, maybe you can't persuade your colleagues to accept the risks of an immature technology. In this case you can't avoid the mapping problem.
--martinfowler.com

2. To Laugh or Cry?
Which database type to use (one big database or many smaller)?

3. Online

Check out the last update's exchange--more comments have been posted since:
RELATIONAL DATABASE

4. Elsewhere
Set In Stone

5. And now for something completely different
Multivitamins Doctors Say Stop Taking Them
Half of U.S. adults take vitamins, supplements routinely
Draw your own conclusion.