Stratified sampling: why and how?

In surveys and polls it is common to use stratified sampling. Stratified sampling is also used in data mining, when drawing a sample from a database for the purpose of model building. This post follows an active discussion about stratification that we had in the “Scientific Data Collection” PhD class. Although stratified sampling is very useful in practice, explaining why to do it and how to do it well is not straightforward, and the topic is only briefly touched upon in basic stats courses. The current Wikipedia entry further illustrates this knowledge gap. What is stratifying? (that’s …

The magical sample size in polls

Now that political polls are a hot item, it is time to unveil the mysterious sentence that accompanies many public opinion polls (not only political ones). It typically reads “the poll included 1033 adults and has a sampling error of plus or minus three percentage points”. No matter what population is being sampled, the sample size is typically around 1,000 and the precision is almost always “±3%” (this is called the margin of error). If you type “poll” in Google you will find plenty of examples. One example is the Jan 2, 2007 NYT Business section article “Investors Greet New …
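
To see how a sample of about 1,000 lines up with “±3%”, here is a minimal sketch of the standard margin-of-error approximation for a proportion. It rests on assumptions that the excerpt above does not state: simple random sampling, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5. The function name `margin_of_error` is just an illustrative choice.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for an estimated proportion.

    Assumes simple random sampling and a large population.
    p = 0.5 is the worst case (widest margin); z = 1.96 corresponds
    to a 95% confidence level. Both defaults are assumptions.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare a few sample sizes, including the 1033 adults quoted above.
for n in (100, 1000, 1033, 10000):
    moe = margin_of_error(n)
    print(f"n = {n:>5}: +/- {100 * moe:.1f} percentage points")
```

Under these assumptions, 1033 respondents give a margin of roughly three percentage points. The formula depends on the sample size and not on the population size, and the margin shrinks only with the square root of n.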