This week the predictive vs. explanatory modeling distinction came up on multiple occasions: first, in a study with an information systems colleague where the goal is to build a predictive application for ranking the auctions most likely to transact; then, in an example that I gave in class of modeling eBay data in order to distinguish competitive from non-competitive auctions; and finally, in a bunch of conversations with students that followed.
The point that I want to make here, which I did not mention directly in my previous post on this subject, is that the set of PREDICTORS your model will include can be very different if the goal is explanatory vs. predictive. Here’s the eBay example: we have data on a set of auctions from eBay (from publicly available data on eBay.com). For each auction there is information on the product features (e.g., category, new/used), seller’s features (e.g., rating), and auction features (e.g., duration, opening price, closing price).
Explanatory goal: To determine factors that lead auctions to be competitive (i.e., receive more than 1 bid).
Predictive goal: To build a seller-side application that will predict the chances that a seller's auction will be competitive.
In the explanatory task, we are likely to include the closing price, hypothesizing that (perhaps) lower priced items are more likely to be competitive. However, for the predictive model we cannot include closing price, because it is not known at the start of the auction! In other words, we are constrained to information that is available at the time of prediction.
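To make the constraint concrete, here is a minimal sketch of choosing the predictor set by goal. The feature names and the "listing" / "auction_end" availability labels are my own illustrative assumptions, not the actual columns of the eBay data; the only point is that the predictive set is filtered by what is known at the time of prediction.

```python
# Hypothetical feature catalog: feature name -> when its value becomes known.
# Names and availability labels are illustrative assumptions, not the real dataset schema.
FEATURES = {
    "category": "listing",          # product feature
    "condition": "listing",         # product feature (new/used)
    "seller_rating": "listing",     # seller feature
    "duration": "listing",          # auction feature
    "opening_price": "listing",     # auction feature
    "closing_price": "auction_end", # auction feature, known only after the auction closes
}


def predictors_for(goal):
    """Return the candidate predictors for an 'explanatory' or 'predictive' goal."""
    if goal == "explanatory":
        # Any measured feature may enter an explanatory model.
        return sorted(FEATURES)
    if goal == "predictive":
        # Keep only what is available when the prediction is made (at listing time).
        return sorted(name for name, known_at in FEATURES.items() if known_at == "listing")
    raise ValueError(f"unknown goal: {goal!r}")


explanatory = predictors_for("explanatory")
predictive = predictors_for("predictive")

# The difference between the two sets is exactly the information not yet available.
print(sorted(set(explanatory) - set(predictive)))
```

The same filter applies to any other variable that is measured only during or after the auction: it can help explain competitiveness, but it cannot feed a model that must produce its prediction at the start.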
So far I have not found a focused published discussion of predictive modeling vs. explanatory modeling. Statistics books tend to focus on explanatory models, whereas machine-learning sources focus on predictive modeling. Has anyone seen such a discussion?