Friday, January 24, 2014

Causality, Uncertainty & Actionability in Analytics

In analytics, it's much easier to "fish" for correlations and try to explain them post hoc than to develop mutually exclusive hypotheses up front and test empirically which one holds. Only the second approach is scientific, though, hence my skepticism about the hype of business analytics as "data science."


Here's a comment from an exchange on LinkedIn (emphasis mine): 
"Aside from those people involved in analytics … pretty much nobody cares about the methods used to create the analysis. Especially not at the top of the tree. What they care about is the result itself -- how does this improve my ROI? What insight does this give me that I didn't have before? How can I use this analysis to add value to my organisation? They certainly care that the methods employed produce an accurate result, but the detail? Nope. Not one jot. Sorry."
How can management ensure that the results are accurate without knowing the methods? The only way, of course, is to hire analysts who guarantee accuracy. This burdens the analysts with the responsibility to apply the scientific method and to find ways to convey uncertainties, if any, to management.

Consider the following scenario, presented by Morton Kamp on his Human Capital blog. An analyst works very hard to correlate training programs with increased profits, and one particular program correlates strongly: 
"You present your findings to the management board. You say, "I have identified this strong correlation between this particular program and bottom line profits BUT as you all know, correlation does not mean causality." 
"So what do you say, should we roll this program out to all of our sales people or not?"
"Well, I can't actually say for sure that attending this training program will make your sales people produce higher profits or if it is the sales people who are very profitable who happen to attend this particular training program for unknown reasons. The only thing I can say is that the two variables appear to correlate." 
The meeting ends and you wonder why your next analytics project doesn't get funded."
In other words, the positive correlation of 
Training program <--> Profit increase  
can be explained by either of the two causal models: 
  • Program attendance --> Sales rep's profitability 
  • Sales rep's profitability --> Program attendance
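To make the ambiguity concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function names, the effect sizes, the noise levels and the sample size are invented, not taken from Kamp's scenario. It generates data under each of the two causal models and computes the same correlation for both.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def training_drives_profit(rng, n):
    """Hypothetical model 1: attendance is random and raises profit."""
    attended = [rng.random() < 0.5 for _ in range(n)]
    # Assumed effect: attending adds 20 units of profit on top of noise.
    profit = [100 + 20 * a + rng.gauss(0, 10) for a in attended]
    return attended, profit


def profit_drives_training(rng, n):
    """Hypothetical model 2: profit comes first; profitable reps tend to attend."""
    profit = [rng.gauss(100, 10) for _ in range(n)]
    # Assumed selection rule: higher profit makes attendance more likely.
    attended = [p + rng.gauss(0, 10) > 100 for p in profit]
    return attended, profit


rng = random.Random(0)  # fixed seed so the run is repeatable
r_model_1 = pearson(*training_drives_profit(rng, 2000))
r_model_2 = pearson(*profit_drives_training(rng, 2000))
print(f"training -> profit: r = {r_model_1:.2f}")
print(f"profit -> training: r = {r_model_2:.2f}")
```

Under these assumed parameters both runs yield a clearly positive correlation, even though training has no effect on profit at all in the second model. The correlation alone cannot tell the two apart.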
This is why Kamp recommends that analysts simply pretend that correlation is causality. Instead of tinkering with complex models, he says, just interview five sales managers and see whether they support the hypothesis.
"If they believe that the particular training program has led to significant increase in profits from the sales people attending the program, that's it... I know it is not "correct" or "true" but it is good enough. And most likely you will be right."
We can argue about the validity and reliability of the interviews, but if you do them, you're not pretending -- you are testing for causality.

But suppose that, realistically, interviewed managers disagree, or are uncertain about the effect of the program on sales reps. What then? 

Companies have data on profitability and training of sales reps. (If they don't, they have much more serious problems to contend with.) You do some analysis and find that the more profitable sales reps attended the course. Is that evidence for the reverse causality? Would you recommend against training? Is there any kind of evidence that would change your mind?

As the initial comment suggests, management is interested not only in the direction of causality but in its financial implications. And when there is uncertainty, in particular, management must consider the cost of a possibly wrong recommendation. In our case, it could be that the first model holds, but you recommend against training, or the first model does not hold, but you recommend training.
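One way to weigh the two possible wrong recommendations is to compare their expected losses. The sketch below is only an illustration under stated assumptions: the probability that the first model holds, the gain if it does, and the cost of the rollout are hypothetical inputs that the analyst would have to estimate, and the function names are mine.

```python
def expected_loss(recommend_training, p_works, gain_if_works, training_cost):
    """Expected loss of a recommendation, relative to the right call.

    p_works       -- assumed probability that training causes higher profit
    gain_if_works -- assumed profit gain from the rollout if it does
    training_cost -- assumed cost of the rollout
    """
    if recommend_training:
        # Wrong only if the program does not work: the cost is wasted.
        return (1 - p_works) * training_cost
    # Wrong only if the program does work: the net gain is forgone.
    return p_works * (gain_if_works - training_cost)


def recommend(p_works, gain_if_works, training_cost):
    """Pick the recommendation with the smaller expected loss."""
    loss_train = expected_loss(True, p_works, gain_if_works, training_cost)
    loss_skip = expected_loss(False, p_works, gain_if_works, training_cost)
    return "train" if loss_train < loss_skip else "do not train"


# Hypothetical figures, for illustration only.
print(recommend(p_works=0.5, gain_if_works=100_000, training_cost=30_000))
print(recommend(p_works=0.2, gain_if_works=100_000, training_cost=30_000))
```

With these invented figures the rule favors training at a 50% chance that the program works and advises against it at 20%. Algebraically, it recommends training exactly when p_works times gain_if_works exceeds training_cost, so a recommendation is possible without certainty about the direction of causality, provided the costs and benefits have been estimated.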

Hopefully, all your heavy research effort yielded cost-benefit measures of a correct decision. 
