Predictive analytics in the long term

Ten years ago, micro-level prediction as we know it today was nearly absent in companies. MBAs learned about data analysis mostly in a required statistics course, which covered mainly statistical inference and descriptive modeling. At the time, I myself was learning my way into the predictive world, and designed the first Data Mining course at University of Maryland’s Smith School of Business (which still runs successfully today!). When I realized the gap, I started giving talks about the benefits of predictive analytics and its uses. And I’ve designed and taught a bunch of predictive analytics courses/programs around the … Continue reading Predictive analytics in the long term

The Scientific Value of Testing Predictive Performance

This week’s NY Times article Risk Calculator for Cholesterol Appears Flawed and CNN article Does calculator overstate heart attack risk? illustrate the power of evaluating the predictive performance of a model for validating the underlying theory. The NYT article describes findings by two Harvard Medical School professors, Ridker and Cook, of extreme over-estimation of the 10-year risk of a heart attack or stroke when using a calculator released by the American Heart Association and the American College of Cardiology. “According to the new guidelines, if a person’s risk is above 7.5%, he or she should be put on a statin.” (CNN … Continue reading The Scientific Value of Testing Predictive Performance

Forecasting stock prices? The new INFORMS competition

The 2010 INFORMS Data Mining Contest is underway. This time the goal is to predict 5-minute stock prices. That’s right – forecasting stock prices! In my view, the meta-contest is going to be the most interesting part. By meta-contest I mean looking beyond the winning result (what method, what prediction accuracy) and examining the distribution of prediction accuracies across all the contestants, how the winner is chosen, and most importantly, how the winning result will be interpreted in terms of what it implies about the predictability of stocks. Why is a stock prediction competition interesting? Because according to … Continue reading Forecasting stock prices? The new INFORMS competition

Weighted nearest-neighbors

K-nearest neighbors (k-NN) is a simple yet often powerful classification/prediction method. The basic idea, for predicting a new observation, is to find the k most similar observations in terms of the predictor (X) values, and then let those k neighbors vote to determine the predicted class membership (or take the average of their Y values to predict its numerical outcome). Since this is such an intuitive method, I thought it would be useful to discuss two improvements that have been suggested by data miners. Both use weighting, but in different ways. One intuitive improvement is to weight the neighbors by their … Continue reading Weighted nearest-neighbors
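To make the idea concrete, here is a minimal sketch of k-NN with one such weighting scheme: neighbors weighted by inverse distance, so closer neighbors count more in the vote (for classification) or in the average (for numerical prediction). The function names, the choice of Euclidean distance, and the small epsilon guarding against division by zero are my own illustrative assumptions, not necessarily the scheme the post goes on to describe.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_new, k=3):
    """Predict a numerical outcome for x_new via inverse-distance-weighted
    k-NN (illustrative sketch; weighting scheme is an assumption)."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Euclidean distances
    nn = np.argsort(dists)[:k]            # indices of the k nearest neighbors
    w = 1.0 / (dists[nn] + 1e-9)          # closer neighbors get larger weights
    return float(np.dot(w, y_train[nn]) / w.sum())  # weighted average of Y

def weighted_knn_classify(X_train, labels, x_new, k=3):
    """Classify x_new by a weighted vote of its k nearest neighbors."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nn = np.argsort(dists)[:k]
    w = 1.0 / (dists[nn] + 1e-9)
    votes = {}
    for lbl, wt in zip(labels[nn], w):    # each neighbor votes with its weight
        votes[lbl] = votes.get(lbl, 0.0) + wt
    return max(votes, key=votes.get)      # class with the largest weighted vote
```

With equal weights this reduces to plain k-NN; the inverse-distance weights simply shrink the influence of neighbors that are similar enough to be in the top k but still relatively far away.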