The Scientific Value of Testing Predictive Performance

This week’s NY Times article Risk Calculator for Cholesterol Appears Flawed and CNN article Does calculator overstate heart attack risk? illustrate the power of evaluating a model’s predictive performance for the purpose of validating the underlying theory. The NYT article describes findings by two Harvard Medical School professors, Ridker and Cook, of extreme over-estimation of the 10-year risk of a heart attack or stroke when using a calculator released by the American Heart Association and the American College of Cardiology. “According to the new guidelines, if a person’s risk is above 7.5%, he or she should be put on a statin.” (CNN … Continue reading The Scientific Value of Testing Predictive Performance

Predictive relationships and A/B testing

I recently watched an interesting webinar on Seeking the Magic Optimization Metric: When Complex Relationships Between Predictors Lead You Astray by Kelly Uphoff, manager of experimental analytics at Netflix. The presenter mentioned that Netflix is a heavy user of A/B testing for experimentation, and in this talk focused on the goal of optimizing retention. In ideal A/B testing, the company would test the effect of an intervention of choice (such as displaying a promotion on its website) on retention by assigning it to a random sample of users, and then comparing the retention of the intervention group to that of a control … Continue reading Predictive relationships and A/B testing
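The comparison described above (intervention group vs. control group on a binary retention outcome) can be sketched as a two-proportion z-test. This is a generic illustration, not Netflix's actual methodology; the retention counts and function name are hypothetical.

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test for a difference in retention rates.

    Returns the z statistic and the p-value under the pooled-variance null
    hypothesis of equal retention in both groups.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF (computed via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# hypothetical numbers: 3,150 of 5,000 treated users retained vs. 3,000 of 5,000 controls
z, p = two_proportion_ztest(3150, 5000, 3000, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With random assignment, a small p-value supports a causal effect of the intervention on retention; without randomization, the same comparison would be confounded, which is exactly the complication the talk addresses.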

Forecasting stock prices? The new INFORMS competition

The 2010 INFORMS Data Mining Contest is underway. This time the goal is to predict 5-minute stock prices. That’s right – forecasting stock prices! In my view, the meta-contest is going to be the most interesting part. By meta-contest I mean looking beyond the winning result (what method, what prediction accuracy) and examining the distribution of prediction accuracies across all the contestants, how the winner is chosen, and most importantly, how the winning result will be interpreted in terms of what it says about the predictability of stock prices. Why is a stock prediction competition interesting? Because according to … Continue reading Forecasting stock prices? The new INFORMS competition

Stratified sampling: why and how?

In surveys and polls it is common to use stratified sampling. Stratified sampling is also used in data mining, when drawing a sample from a database (for the purpose of model building). This post follows an active discussion about stratification that we had in the “Scientific Data Collection” PhD class. Although stratified sampling is very useful in practice, the explanation of why to do it and how to do it usefully is not straightforward; the topic is only briefly touched upon in basic stats courses. Looking at the current Wikipedia entry further reveals the knowledge gap. What is stratifying? (that’s … Continue reading Stratified sampling: why and how?
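As a minimal sketch of one common variant, proportional allocation: the population is split into strata and each stratum contributes a share of the sample equal to its share of the population. The customer database and the "buyer" stratifying variable below are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, n, seed=42):
    """Proportional-allocation stratified sample.

    Each stratum contributes round(n * stratum_size / population_size) records,
    drawn at random without replacement within the stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(records))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# hypothetical customer database: 10% buyers, 90% non-buyers
db = [{"id": i, "buyer": i % 10 == 0} for i in range(10_000)]
s = stratified_sample(db, lambda r: r["buyer"], 500)
print(len(s), sum(r["buyer"] for r in s))  # 500 records, of which exactly 50 are buyers
```

A simple random sample of 500 would contain roughly 50 buyers only on average; stratifying guarantees the proportion, which is one reason it is attractive when building models on rare outcomes.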

Are experiments always better?

This continues my “To Explain or To Predict?” argument (in brief: statistical models aimed at causal explanation will not necessarily be good predictors). And now, I move to a very early stage in the study design: how should we collect data? A well-known notion is that experiments are preferable to observational studies. The main difference between experimental studies and observational studies is an issue of control. In experiments, the researcher can deliberately choose the “treatments”, control the assignment of subjects to the “treatments”, and then measure the outcome. In observational studies, by contrast, the researcher can only observe the subjects … Continue reading Are experiments always better?

Accuracy measures

There is a host of metrics for evaluating predictive performance, all based on aggregating the forecast errors in some form. The two most famous metrics are RMSE (Root Mean Squared Error) and MAPE (Mean Absolute Percentage Error). In an earlier posting (Feb-23-2006) I disclosed a secret deciphering method for computing these metrics. Although these two have been the most popular in software, competitions, and published papers, they have their shortcomings. One serious flaw of the MAPE is that actual values of zero make it infinite (because of the division by zero). One solution is to leave the zero counts out of … Continue reading Accuracy measures
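The two metrics, and the MAPE's division-by-zero flaw, can be sketched as follows; the `skip_zeros` workaround corresponds to the "leave the zeros out" solution mentioned above, and the function names and sample series are illustrative only.

```python
import math

def rmse(actual, forecast):
    """Root Mean Squared Error: sqrt of the mean squared forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast, skip_zeros=False):
    """Mean Absolute Percentage Error, in percent.

    An actual value of zero makes the metric blow up (division by zero);
    skip_zeros=True drops such records before averaging.
    """
    pairs = list(zip(actual, forecast))
    if skip_zeros:
        pairs = [(a, f) for a, f in pairs if a != 0]
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)

actual = [10, 0, 20, 40]
forecast = [12, 1, 18, 44]
print(rmse(actual, forecast))                      # 2.5
# mape(actual, forecast) raises ZeroDivisionError because of the zero actual
print(mape(actual, forecast, skip_zeros=True))     # 13.33...
```

Note the asymmetry this workaround introduces: dropping zero actuals quietly changes which records the metric is averaged over, which matters when zeros are frequent (e.g., intermittent demand data).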