The neat recent Wall Street Journal article “Netflix Aims to Refine Art of Picking Films” (Nov 20, 2007) was sent to me by Moshe Cohen, one of my dedicated former data mining course students. In the article, a Netflix spokesman demystifies some of the winning techniques in the Netflix $1 million contest. OK, not really demystifying, but revealing two interesting insights:
1) Some teams joined forces by combining their predictions to obtain improved predictions (without disclosing their actual algorithms to each other). Today, for instance, the third-best team on the Netflix Leaderboard is “When Gravity and Dinosaurs Unite”, which is the result of two teams combining their predictions (Gravity from Hungary and Dinosaur Planet from the US). This is an example of the “portfolio approach”, which says that combining predictions from a variety of methods (and sometimes a variety of datasets) can lead to higher performance, just like stock portfolios.
2) The AT&T team, which is currently in the lead, takes an approach that includes 107 different techniques (blended in different ways). You can get a glimpse of these methods in their publicly available document written by Robert Bell, Yehuda Koren, and Chris Volinsky (kudos for the “open-source”!). They use regression models, k-nearest-neighbor methods, collaborative filtering, “portfolios” of the different methods, etc. Again, this shows that “looking” at data from multiple views is usually very beneficial. Like painkillers, a variety is useful because sometimes one works but other times another works better.
Please note that this does NOT suggest that a portfolio approach with painkillers is recommended!
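To make the portfolio idea from point 1 concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the ratings are invented, the names (`truth`, `pred_a`, `pred_b`, `rmse`, `blend`) are hypothetical, and a plain equal-weight average stands in for whatever blending the actual teams use, which the article does not describe. It simply shows that when two sets of predictions err in different directions, their average can have a lower RMSE (the contest's error measure) than either one alone.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def blend(preds_1, preds_2, weight=0.5):
    """Weighted average of two prediction lists (weight applies to preds_1)."""
    return [weight * p1 + (1 - weight) * p2 for p1, p2 in zip(preds_1, preds_2)]

# Invented data: true ratings for five movies and two teams' predictions.
truth = [4, 3, 4, 2, 1]
pred_a = [4.8, 2.4, 4.5, 1.1, 1.4]  # hypothetical team A
pred_b = [3.6, 3.7, 3.4, 2.3, 1.6]  # hypothetical team B

# Equal-weight "portfolio" of the two teams' predictions.
pred_blend = blend(pred_a, pred_b)

rmse_a = rmse(pred_a, truth)
rmse_b = rmse(pred_b, truth)
rmse_blend = rmse(pred_blend, truth)

print(f"Team A RMSE:  {rmse_a:.3f}")
print(f"Team B RMSE:  {rmse_b:.3f}")
print(f"Blended RMSE: {rmse_blend:.3f}")
```

The improvement here comes from the two teams' errors partly cancelling out. If both teams made the same mistakes on the same movies, averaging would not help, which is why a variety of methods matters. Note also that the teams only need to exchange predictions, not algorithms, for this to work.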