Early detection of what?

The interest in using pre-diagnostic data for the early detection of disease outbreaks has evolved in interesting ways in the last 10 years. In the early 2000s, I was involved in an effort to explore the potential of non-traditional data sources, such as over-the-counter pharmacy sales and web searches on medical websites, which might give earlier signs of a disease outbreak than confirmed diagnostic data (lab tests, doctor diagnoses, etc.). The pre-diagnostic data sources that we looked at were not only expected to have an earlier footprint of the outbreak compared to traditional diagnostic data, but they were also collected …

Got Data?!

The American Statistical Association’s store used to sell cool T-shirts with the old-time beggar-statistician question “Got Data?” Today it is much easier to find data, thanks to the Internet. Dozens of student teams taking my data mining course have been able to find data from various sources on the Internet for their team projects. Yet, I often receive queries from colleagues in search of data for their students’ projects. This is especially true for short courses, where students don’t have sufficient time to search and gather data (which is highly educational in itself!). One solution that I often offer is …

New data repository by UN

As more government and other agencies move “online”, some actually make their data publicly available. Adi Gadwale, one of my dedicated ex-students, sent a note about a neat new data repository made publicly available by the UN called UNdata. You can read more about it in the UN News bulletin or go directly to the repository at http://data.un.org. The interface is definitely easy to navigate. It offers lots of time series for the different countries on many types of measurements. This is a good source of data that can be used to supplement other existing datasets (like one would use US census data …

Source for data

Adi Gadwale, a student in my 2004 MBA Data Mining class, still remembers my fetish for business data and data visualization. He just sent me a link to an IBM Research website called Many Eyes, which includes user-submitted datasets as well as Java-applet visualizations. The datasets include quite a few “junk” datasets, many with no description. But there are a few interesting ones: FDIC is a “scrubbed list of FDIC institutions removing inactive entities and stripping all columns apart from Assets, ROE, ROA, Offices (Branches), and State”. It includes 8711 observations. Another is Absorption Coefficients of Common Materials – I …