Predictive analytics in the long term

Ten years ago, micro-level prediction as we know it today was nearly absent in companies. MBAs learned about data analysis mainly in a required statistics course, which covered mostly statistical inference and descriptive modeling. At the time, I myself was learning my way into the predictive world, and designed the first Data Mining course at the University of Maryland’s Smith School of Business (which is still running successfully today!). When I realized the gap, I started giving talks about the benefits of predictive analytics and its uses. And I’ve designed and taught a bunch of predictive analytics courses/programs around the …

The use of dummy variables in predictive algorithms

Anyone who has taken a course in statistics that covers linear regression has heard some version of the rule regarding pre-processing categorical predictors with more than two categories and the need to factor them into binary dummy/indicator variables: “If a variable has k levels, you can create only k-1 indicators. You have to choose one of the k categories as a “baseline” and leave out its indicator.” (from Business Statistics by Sharpe, De Veaux & Velleman) Technically, one can easily create k dummy variables for k categories in any software. The reason for not including all k dummies as predictors in a …
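To make the k-1 rule concrete, here is a minimal sketch in plain Python. The function name `dummy_code`, its parameters, and the color data are all hypothetical, made up for illustration; they do not come from the textbook or from any particular software. The sketch only shows what the two codings look like. Note that with all k indicators, every row sums to exactly 1.

```python
def dummy_code(values, baseline=None, drop_baseline=True):
    """Turn a categorical variable into 0/1 indicator columns.

    Hypothetical helper for illustration only.
    Returns (column_names, rows), one row of indicators per input value.
    """
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # assumption: default to the first level alphabetically
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not one of the levels")
    # Keep every level, or every level except the baseline (k-1 coding).
    kept = [lv for lv in levels if not (drop_baseline and lv == baseline)]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


# Made-up example data with k = 3 categories.
colors = ["red", "green", "blue", "green"]

# k-1 coding: "blue" is the baseline and is represented by all zeros.
names_k1, rows_k1 = dummy_code(colors, baseline="blue")

# Full k coding: one indicator per category.
names_k, rows_k = dummy_code(colors, drop_baseline=False)
```

In the k-1 version, a "blue" record shows up as a row of zeros, which is how the baseline category is still identifiable without its own column.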

Predictive relationships and A/B testing

I recently watched an interesting webinar on Seeking the Magic Optimization Metric: When Complex Relationships Between Predictors Lead You Astray by Kelly Uphoff, manager of experimental analytics at Netflix. The presenter mentioned that Netflix is a heavy user of A/B testing for experimentation, and in this talk focused on the goal of optimizing retention. In ideal A/B testing, the company would test the effect of an intervention of choice (such as displaying a promotion on their website) on retention, by assigning it to a random sample of users, and then comparing retention of the intervention group to that of a control …
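The ideal A/B test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Netflix's procedure: the function names, the user IDs, and the retained/not-retained flags are all hypothetical, and a real analysis would also assess whether the difference is statistically meaningful.

```python
import random


def assign_groups(user_ids, n_treated, seed=0):
    """Randomly pick n_treated users for the intervention; the rest are controls."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    treated = set(rng.sample(list(user_ids), n_treated))
    control = [u for u in user_ids if u not in treated]
    return sorted(treated), control


def retention_rate(retained_flags):
    """Share of users in a group who were retained (flags are 0 or 1)."""
    return sum(retained_flags) / len(retained_flags)


def retention_difference(treated_flags, control_flags):
    """Retention in the intervention group minus retention in the control group."""
    return retention_rate(treated_flags) - retention_rate(control_flags)


# Hypothetical users and outcomes, made up for illustration.
users = list(range(10))
treated, control = assign_groups(users, n_treated=5)

treated_retained = [1, 1, 1, 0, 1]  # 4 of 5 retained
control_retained = [1, 0, 1, 0, 0]  # 2 of 5 retained
diff = retention_difference(treated_retained, control_retained)
```

The random assignment is the important part: it is what allows the difference in retention to be attributed to the intervention rather than to who happened to receive it.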

Predictive modeling and interventions (why you need post-intervention data)

In the last few months I’ve been involved in nearly 20 data mining projects done by student teams at ISB, as part of the MBA-level course and an executive education program. All projects relied on real data. One of the data sources was transactional data from a large regional hypermarket. While the topics of the projects ranged across a large spectrum of business goals and opportunities for retail, one point in particular struck me as repeating across many projects and in many face-to-face discussions. The use of secondary data (data that were already collected for some purpose) for making …

Trees in pivot table terminology

Recently, non-data-mining colleagues have asked me to explain how Classification and Regression Trees work. While a detailed explanation with examples exists in my co-authored textbook Data Mining for Business Intelligence, I found that the following explanation worked well with people who are familiar with Excel’s Pivot Tables: [Figure: Classification tree for predicting vulnerability to famine] Suppose the goal is to generate predictions for some variable, numerical or categorical, given a set of predictors. The idea behind trees is to create groups of records with similar profiles in terms of their predictors, and then average the outcome variable of interest to …
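The "group similar records, then average the outcome" idea can be sketched as a tiny pivot table in plain Python. Everything here is a hypothetical illustration: the function `pivot_average`, the predictor names `region` and `income`, and the records are made up. One important simplification: a real tree algorithm chooses which predictors and split points to use from the data, whereas this sketch fixes the grouping by hand.

```python
from collections import defaultdict


def pivot_average(records, group_keys, outcome):
    """Average the outcome within each combination of predictor values.

    Works like a pivot table with group_keys as row/column fields
    and the mean of `outcome` as the cell value.
    """
    totals = defaultdict(lambda: [0.0, 0])  # profile -> [sum, count]
    for rec in records:
        profile = tuple(rec[k] for k in group_keys)
        totals[profile][0] += rec[outcome]
        totals[profile][1] += 1
    return {profile: s / n for profile, (s, n) in totals.items()}


# Made-up records; "vulnerable" is a 0/1 outcome.
records = [
    {"region": "north", "income": "low", "vulnerable": 1},
    {"region": "north", "income": "low", "vulnerable": 1},
    {"region": "north", "income": "high", "vulnerable": 0},
    {"region": "south", "income": "low", "vulnerable": 1},
    {"region": "south", "income": "low", "vulnerable": 0},
    {"region": "south", "income": "high", "vulnerable": 0},
]

group_means = pivot_average(records, ["region", "income"], "vulnerable")


def predict(new_record):
    """Predict by looking up the average outcome of the matching group."""
    return group_means[(new_record["region"], new_record["income"])]
```

For a 0/1 outcome, the group average is the proportion of 1s in that group, so the same lookup serves as a predicted probability.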

Trading and predictive analytics

I attended today’s class in the course Trading Strategies and Systems offered by Prof Vasant Dhar from NYU Stern School of Business. Luckily, Vasant is offering the elective course here at the Indian School of Business, so no need for transatlantic travel. The topic of this class was the use of news in trading. I won’t disclose any trade secrets (you’ll have to attend the class for that), but here’s my point: Trading is a striking example of the distinction between explanation and prediction. Generally, techniques are based on correlations and on “blackbox” predictive models such as neural nets. In particular, text mining and …

“Predict” or “Forecast”?

What is the difference between “prediction” and “forecasting”? I heard this being asked quite a few times lately. The Predictive Analytics World conference website has a Predictive Analytics Guide page with the following Q&A: How is predictive analytics different from forecasting? Predictive analytics is something else entirely, going beyond standard forecasting by producing a predictive score for each customer or other organizational element. In contrast, forecasting provides overall aggregate estimates, such as the total number of purchases next quarter. For example, forecasting might estimate the total number of ice cream cones to be purchased in a certain region, while predictive analytics tells you which individual …

Analytics: You want to be in Asia

Business Intelligence and Data Mining have become hot buzzwords in the West. Using Google Insights for Search to “see what the world is searching for” (see image below), we can see that the popularity of these two terms seems to have stabilized (if you expand the search to 2007 or earlier, you will see the earlier peak and also that Data Mining was hotter for a while). Click on the image to get to the actual result, with which you can interact directly. There are two very interesting insights from this search result: Looking at the “Regional Interest” for these terms, we see that …