The American Statistical Association’s store used to sell cool T-shirts with the old-time beggar-statistician question “Got Data?” Today it is much easier to find data, thanks to the Internet. Dozens of student teams taking my data mining course have been able to find data from various sources on the Internet for their team projects. Yet I often receive queries from colleagues in search of data for their students’ projects. This is especially true for short courses, where students don’t have enough time to search for and gather data (a highly educational exercise in itself!).
One solution that I often offer is data from data mining competitions. KDD Cup is a classic, but there are lots of other data mining competitions that make huge amounts of real or realistic data available: past INFORMS Data Mining Contests (2008, 2009, 2010), ENBIS Challenges, and more. Here’s one new competition to add to the list:
The European Network for Business and Industrial Statistics (ENBIS) announced the 2011 Challenge (in collaboration with SAS JMP). The title is “Maximising Click Through Rates on Banner Adverts: Predictive Modeling in the On Line World”. It’s a bit complicated to find the full problem description and data on the ENBIS website (you’ll find yourself clicking through endless “more” buttons – hopefully those clicks are not data collected for the challenge!), so I linked them up.
It’s time for T-shirts saying “Got Data! Want Knowledge?”