Webcasts on data mining

Moshe Cohen, a current MBA student in my class, pointed out an interesting set of webcasts on data mining called "Best Practices in Data Mining", by the Insightful Corporation. Part I (now archived) describes a few scenarios where data mining is useful in a business context. It shows some examples of questions of interest, datasets that are used in such applications, and the analysis process. Of course, their software InsightfulMiner is also showcased. I especially liked the emphasis on data visualization, and the SAS-EM-like "working" chart. The presenters also discuss data preprocessing, with some detail on missing values and outliers.

It’s competition season: and now Netflix

An exciting new dataset is out there for us data aficionados! Netflix, the huge movie rental company, announced a $1 million prize for the winner of a competition to improve upon its Cinematch algorithm for predicting movie ratings. The competition started at the beginning of the month and has already created a lot of buzz. The company released a huge training set that includes millions of movie ratings. Competing teams can use this dataset to come up with prediction algorithms, and then submit predictions for a test set. The training dataset contains more than 100 million ratings from

Nation’s favorite professors – in statistics???

When introductions are made, and the question comes "so what do you do?", I sheepishly reply "I teach statistics at the University of Maryland's business school". The two most popular reactions are (1) a terrified look: "statistics? oh, I had to take that in undergrad!", or (2) a dazed look: "Wow!" [which really means, "I didn't understand any of it, so how did you figure it out?"] But sometimes I do come across people who get all excited and say they took a statistics course and LOVED it. And very often it is attributable to the professor. Indeed, from my own

Time Series Forecasting Competition

Forecasting transportation demand is important for multiple purposes such as staffing, planning, and inventory control. The public transportation system in Santiago de Chile is currently going through a major reconstruction effort (if you read Spanish, you can find more at www.transantiago.cl). The 2006 Business Intelligence Competition (BI CUP 2006) focuses on forecasting demand for public transportation. The organizers provide a training set consisting of a time series of passengers arriving at a terminal, and the competitors must come up with a method for forecasting the test set, which comprises a few future days. Although this problem is a great example