Congratulations to our Smith School’s Fall 2009 “Data Mining for Business” students. I look forward to hearing about your future endeavors — use data mining to do good! Continue reading My newest batch of graduating data mining MBAs
In surveys and polls it is common to use stratified sampling. Stratified sampling is also used in data mining, when drawing a sample from a database for the purpose of model building. This post follows an active discussion about stratification that we had in the “Scientific Data Collection” PhD class. Although stratified sampling is very useful in practice, explaining why to do it and how to do it well is not straightforward, and the topic is only briefly touched upon in basic stats courses. The current Wikipedia entry further illustrates this knowledge gap. What is stratifying? (that’s … Continue reading Stratified sampling: why and how?
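For readers who want something concrete before clicking through, here is a minimal sketch of one common form of stratified sampling: split the records into strata and draw the same fraction at random from each. This is my own illustration rather than anything taken from the full post. The function name `stratified_sample`, the `buyer` field, and the 10% buyer rate are all hypothetical, and proportional allocation is only one of several ways to size the strata.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction from every stratum (proportional allocation).

    Hypothetical helper for illustration; not from any library.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the records by their stratum label.
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    # Sample without replacement inside each stratum.
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Made-up "database": 1000 customers, of whom 100 are buyers.
records = [{"id": i, "buyer": i < 100} for i in range(1000)]

sample = stratified_sample(records, lambda r: r["buyer"], fraction=0.1)
buyers_in_sample = sum(r["buyer"] for r in sample)
```

With these made-up numbers the sample always contains the buyers in their population proportion, whereas a simple random sample of the same size would only match that proportion on average.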