Insights from the Netflix contest

The neat recent Wall Street Journal article Netflix Aims to Refine Art of Picking Films (Nov 20, 2007) was sent to me by Moshe Cohen, one of my dedicated ex-data-mining-course students. In the article, a spokesman from Netflix demystifies some of the winning techniques in the Netflix $1 million contest. OK, not really demystifying, but revealing two interesting insights: 1) Some teams joined forces by combining their predictions to obtain improved predictions (without disclosing their actual algorithms to each other). Today, for instance, the third-best team on the Netflix Leaderboard is “When Gravity and Dinosaurs Unite”, which is the …
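The prediction-combining trick the teams used can be illustrated with a minimal sketch: each team shares only its predicted ratings, and a simple average of the two often scores better on RMSE (the contest's metric) than either team alone, because the teams' errors partially cancel. The data and numbers below are purely illustrative, not from the contest.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error -- the Netflix Prize scoring metric."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical movie ratings and two teams' predictions (toy numbers)
actual = [4.0, 3.0, 5.0, 2.0]
team_a = [4.6, 2.6, 5.2, 1.2]
team_b = [3.8, 3.6, 4.4, 2.4]

# Blend: simple average of the two teams' predictions.
# No algorithm details change hands -- only the predicted ratings.
blend = [(a + b) / 2 for a, b in zip(team_a, team_b)]

print(rmse(actual, team_a))  # ~0.548
print(rmse(actual, team_b))  # ~0.480
print(rmse(actual, blend))   # ~0.180 -- better than either team alone
```

Because the two teams' errors point in different directions on different movies, averaging shrinks the combined error; this is the basic logic behind ensemble methods.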

Good and bad of classification/regression trees

Classification and Regression Trees are great for both explanatory and predictive modeling. Although data-driven, they provide transparency about the resulting classifier and are far from being a black box. For this reason trees are often used in applications that require transparency, such as insurance or credit approvals. Trees are also used during the exploratory phase for the purpose of variable selection: variables that show up at the top layers of the tree are good candidates as “key players”. Trees do not make any distributional assumptions and are also quite robust to outliers. They can nicely capture local pockets of behavior that …
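The variable-selection idea above can be sketched in a few lines: a tree places at its root the split that most reduces impurity, so the root variable is a natural "key player" candidate. This is a toy sketch using the Gini impurity criterion; the data, thresholds, and the function name best_split are all mine, not from the post.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return the (feature_index, threshold) split that most reduces
    weighted Gini impurity -- i.e., what a tree would put at the root."""
    best = None
    best_score = gini(y)  # impurity before splitting
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best

# Toy credit-approval data: columns are [income, age].
# Income alone separates the classes, so it should win the root split.
X = [[20, 30], [25, 55], [60, 25], [70, 60]]
y = ["deny", "deny", "approve", "approve"]

print(best_split(X, y))  # (0, 25): the root splits on feature 0, income
```

Here income surfaces at the top of the tree, flagging it as the key variable; no distributional assumption was made, and an outlier in age would not change the chosen split.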