My papers “To Explain or To Predict?” and “Predictive Analytics in Information Systems Research” contrast the process and uses of predictive modeling and causal-explanatory modeling. There I briefly mentioned a third type of modeling: descriptive. However, I haven’t expanded on how descriptive modeling differs from the other two types (causal explanation and prediction). Although descriptive and predictive modeling both rely on correlations, whereas explanatory modeling relies on causality, the two are in fact different. Descriptive modeling aims to give a parsimonious statistical representation of a distribution or relationship, whereas predictive modeling aims at generating values for new/future observations. … Continue reading Election polls: description vs. prediction
Ten years ago, micro-level prediction as we know it today was nearly absent in companies. MBAs learned about data analysis mostly in a required statistics course, which covered mainly statistical inference and descriptive modeling. At the time, I myself was learning my way into the predictive world, and designed the first Data Mining course at University of Maryland’s Smith School of Business (which is still running successfully today!). When I realized the gap, I started giving talks about the benefits of predictive analytics and its uses. And I’ve designed and taught a bunch of predictive analytics courses/programs around the … Continue reading Predictive analytics in the long term
The American Statistical Association published new “Curriculum Guidelines for Undergraduate Programs in Statistical Science”. This is the first update to the guidelines since 2000. The executive summary lists the key points: increased importance of data science; real applications; more diverse models and approaches; ability to communicate. This set sounds right on target with what is expected of statisticians in industry (the authors of the report include prominent statisticians in industry). It highlights the current narrow focus of statistics programs as well as their lack of real-world usability. I found three notable mentions in the descriptions of the above points: Point … Continue reading New curriculum design guidelines by American Statistical Association: Who will teach?
I recently watched an interesting webinar on Seeking the Magic Optimization Metric: When Complex Relationships Between Predictors Lead You Astray by Kelly Uphoff, manager of experimental analytics at Netflix. The presenter mentioned that Netflix is a heavy user of A/B testing for experimentation, and in this talk focused on the goal of optimizing retention. In ideal A/B testing, the company would test the effect of an intervention of choice (such as displaying a promotion on their website) on retention, by assigning it to a random sample of users, and then comparing retention of the intervention group to that of a control … Continue reading Predictive relationships and A/B testing
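The ideal A/B comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two-proportion z-test is one common way to compare retention rates and is not taken from the webinar, the function names are hypothetical, and the counts are made-up numbers, not Netflix data.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z_test(retained_control, n_control, retained_treat, n_treat):
    """Compare retention rates of a control and an intervention group.

    Returns the z statistic (treatment minus control) and a two-sided
    p-value, using the pooled-proportion standard error.
    """
    rate_control = retained_control / n_control
    rate_treat = retained_treat / n_treat
    pooled = (retained_control + retained_treat) / (n_control + n_treat)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treat))
    z = (rate_treat - rate_control) / std_err
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


# Made-up counts for illustration: users retained out of users assigned.
z_stat, p_val = two_proportion_z_test(400, 1000, 450, 1000)
print(f"z = {z_stat:.2f}, two-sided p-value = {p_val:.4f}")
```

Because users are assigned to the two groups at random, a difference in retention can be read as the effect of the intervention rather than as a mere correlation.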
Researchers in various fields have been sending me emails and reactions after reading my 2010 paper “To Explain or To Predict?”. While I am aware of research methodology in a few areas, I’m learning in more detail about the scientific challenges caused by “predictive-less” areas. In an effort to further disseminate this knowledge, I’ll be posting these reactions on this blog (with the senders’ approval, of course). In a recent email, Stan Young, Assistant Director for Bioinformatics at NISS, commented about the explain/predict situation in epidemiology: “I enjoyed reading your paper… I am interested in what I think is [epidemiologists] lack … Continue reading Explain/Predict in Epidemiology
I recently attended the 8th World Congress in Probability and Statistics, where I heard an interesting talk by Andy Tsao. His talk “Naivity can be good: a theoretical study of naive regression” (Abstract #0586) was about the use of Naive Regression, which is the application of linear regression to a categorical outcome, treating the outcome as numerical. He asserted that predictions from Naive Regression will be quite good. My last post was about the “goodness” of a linear regression applied to a binary outcome in terms of the estimated coefficients. That’s what explanatory modeling is about. What Dr. Tsao alerted me to, … Continue reading Linear regression for binary outcome: even better news
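To make “Naive Regression” concrete, here is a minimal sketch: fit an ordinary least-squares line to a 0/1 outcome as if it were numerical, then classify with a cutoff. The toy data, the function names, and the 0.5 cutoff are my own assumptions for illustration and are not taken from Dr. Tsao’s talk.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single numeric predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def classify(x, intercept, slope, cutoff=0.5):
    """Turn fitted values into 0/1 predictions using a cutoff (assumed 0.5)."""
    return [1 if intercept + slope * xi >= cutoff else 0 for xi in x]


# Toy data (made up): a binary outcome treated as a number.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]
predicted = classify(x, intercept, slope)
accuracy = sum(p == t for p, t in zip(predicted, y)) / len(y)

print("fitted values:", [round(f, 3) for f in fitted])
print("predicted classes:", predicted)
print("training accuracy:", accuracy)
```

On this toy data the fitted values for the smallest and largest x fall outside the [0, 1] range, so they cannot be read as probabilities, yet the cutoff still yields usable class predictions. Accuracy here is measured on the training data only; a real predictive evaluation would use a holdout set.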
Regression models are the most popular tool for modeling the relationship between an outcome and a set of inputs. Models can be used for descriptive, causal-explanatory, and predictive goals (but in very different ways! see Shmueli 2010 for more). The family of regression models includes two especially popular members: linear regression and logistic regression (with probit regression more popular than logistic in some research areas). Common knowledge, as taught in statistics courses, is: use linear regression for a continuous outcome and logistic regression for a binary or categorical outcome. But why not use linear regression for a binary outcome? the … Continue reading Linear regression for a binary outcome: is it Kosher?
Some time ago, when I presented the “explain or predict” work, my colleague Avi Gal asked where simulation falls. Simulation is a key method in operations research, as well as in statistics. A related question arose in my mind when thinking of Scott Nestler’s distinction between descriptive/predictive/prescriptive analytics. Scott defines prescriptive analytics as “what should happen in the future? (optimization, simulation)”. So where does simulation fall? Does it fall in a completely different goal category, or can it be part of the explain/predict/describe framework? My opinion is that simulation, like other data analytics techniques, does not define a goal in … Continue reading Explain or predict: simulation
I attended today’s class in the course Trading Strategies and Systems offered by Prof Vasant Dhar from NYU Stern School of Business. Luckily, Vasant is offering the elective course here at the Indian School of Business, so no need for transatlantic travel. The topic of this class was the use of news in trading. I won’t disclose any trade secrets (you’ll have to attend the class for that), but here’s my point: Trading is a striking example of the distinction between explanation and prediction. Generally, techniques are based on correlations and on “blackbox” predictive models such as neural nets. In particular, text mining and … Continue reading Trading and predictive analytics
Quite a few of my social science colleagues think that predictive modeling is not a kosher tool for theory building. In our 2011 MISQ paper “Predictive Analytics in Information Systems Research” we argue that predictive modeling has a critical role to play not only in theory testing but also in theory building. How does it work? Here’s an interesting example: The new book The Secret Life of Pronouns by the cognitive psychologist Pennebaker is a fascinating read in many ways. The book describes how analysis of written language can be predictive of psychological state. In particular, the author describes an … Continue reading Language and psychological state: explain or predict?