Image from www.lumaxart.com
The 2010 INFORMS Data Mining Contest is underway. This time the goal is to predict 5-minute stock prices. That’s right – forecasting stock prices! In my view, the meta-contest is going to be the most interesting part. By meta-contest I mean looking beyond the winning result (what method, what prediction accuracy) and examining the distribution of prediction accuracies across all the contestants, how the winner is chosen, and, most importantly, what conclusions will be drawn from the winning result about how predictable stocks are.
Why is a stock prediction competition interesting? Because according to the Efficient Market Hypothesis (EMH), stocks and other traded assets are random walks (no autocorrelation between consecutive price jumps). In other words, they are unpredictable. Even if there is a low level of autocorrelation, the bid-offer spread and transaction costs make stock predictions worthless. I’ve been fascinated by how quickly and drastically the Wikipedia page on the Efficient Market Hypothesis has changed in the last few years (see the page history). The proponents of the EMH seem to be competing with its opponents in revising the page. As of today, the opponents are ahead in terms of editing the page; perhaps the recent crisis is giving them an advantage.
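To make the "no autocorrelation" claim concrete, here is a minimal sketch in Python. It simulates a random walk and estimates the lag-1 autocorrelation of its price changes. The Gaussian step distribution, the starting price of 100, and the helper name `lag1_autocorrelation` are my own illustrative assumptions; none of this comes from the contest data.

```python
import random


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


# Assumption: independent Gaussian steps stand in for 5-minute price jumps.
rng = random.Random(0)
steps = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Build the simulated price path by accumulating the steps.
prices = [100.0]
for s in steps:
    prices.append(prices[-1] + s)

# Recover consecutive price changes and measure their autocorrelation.
changes = [b - a for a, b in zip(prices, prices[1:])]
rho = lag1_autocorrelation(changes)
print(f"lag-1 autocorrelation of price changes: {rho:.4f}")
```

For a true random walk the estimate should hover near zero, which is exactly why knowing the last jump tells you nothing useful about the next one.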
The contest’s evaluation page explains that the goal is to forecast whether the stock price will increase or decrease in the next time period. Entries will then be evaluated in terms of the average AUC (area under the ROC curve). Defining the problem as a binary prediction problem and using the AUC to evaluate the results adds a further challenge: the average AUC has various flaws as a measure of predictive accuracy. In a recent article in the journal Machine Learning, the well-known statistician Prof. David Hand shows that, in addition to other deficiencies, “…the AUC uses different misclassification cost distributions for different classifiers.”
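For readers who have not met the AUC before, the following sketch computes it directly from its pairwise definition: the probability that a randomly chosen "up" period receives a higher score than a randomly chosen "down" period, with ties counted as one half. The labels and scores are made up for illustration, and the function name `auc` is my own; I am not reproducing the contest's scoring code, nor how it averages the AUC.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 if the price went up in the period, 0 if it went down.
    scores: predicted scores, higher meaning "more likely up".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one period of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # the "up" period is ranked above the "down" period
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted scores for six periods.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
print(f"AUC = {auc(labels, scores):.3f}")

# A predictor that gives every period the same score carries no information.
print(f"AUC of a constant predictor = {auc(labels, [0.5] * 6):.3f}")
```

Note that the AUC depends only on how the periods are ranked, not on any particular cutoff for calling "up" or "down", which is part of what Hand's critique is about.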
In any case, among the many participants in the competition there is going to be a winner. And that winner will have the highest prediction accuracy for the stock in question, at least in the sense of average AUC. No uncertainty about that. But will that mean that the winning method is the magic bullet for traders? Most likely not. Or, at least, I would not be convinced until I saw the method consistently outperform a random walk across a large number of stocks and different time periods. For one, I would want to see the distribution of results across the entire set of participants and compare it to a naive classifier to evaluate how “lucky” the winner was.
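The "lucky winner" concern is easy to illustrate with a small simulation. The sketch below assumes 300 contestants who each submit purely random scores for 500 periods; both numbers, along with the coin-flip price directions, are invented for illustration and have nothing to do with the actual contest. The helper `auc_no_ties` uses the rank-sum form of the AUC and assumes no tied scores, which is reasonable for continuous random numbers.

```python
import random


def auc_no_ties(labels, scores):
    """AUC from the rank sum of the positive class (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(42)
n_periods = 500        # hypothetical number of evaluation periods
n_contestants = 300    # hypothetical number of entries

# Assumption: price direction is a fair coin flip (1 = up, 0 = down).
labels = [rng.randint(0, 1) for _ in range(n_periods)]

# Every contestant guesses at random, so nobody has any real skill.
aucs = sorted(
    auc_no_ties(labels, [rng.random() for _ in range(n_periods)])
    for _ in range(n_contestants)
)

median_auc = aucs[len(aucs) // 2]
best_auc = aucs[-1]
print(f"median AUC of random guessers: {median_auc:.3f}")
print(f"best AUC among random guessers: {best_auc:.3f}")
```

Even with no skill anywhere in the field, the best entry lands noticeably above 0.5 simply because it is the maximum of many noisy scores. That is the baseline against which I would want the real winner to be judged.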
The competition page reads: “The results of this contest could have a big impact on the finance industry.” I find that quite scary, given the limited scope of the data, the evaluation metric, and the focus on the top results rather than the entire distribution.