Policy-changing results or artifacts of big data?

The New York Times article “Big Study Links Good Teachers to Lasting Gain” covers a research study coming out of Harvard and Columbia on “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood”. The authors used sophisticated econometric models applied to data from a million students to conclude:

“We find that students assigned to higher VA [Value-Added] teachers are more successful in many dimensions. They are more likely to attend college, earn higher salaries, live in better neighborhoods, and save more for retirement. They are also less likely to have children as teenagers.”

When I see social scientists using statistical methods in the Big Data realm, I tend to get a little suspicious, since classic statistical inference behaves differently with large samples than with small samples (which are more typical in the social sciences). Let’s take a careful look at some of the charts from this paper to examine the leap from the data to the conclusions.
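To see why sample size matters so much, consider a textbook two-sample z-test comparing mean outcomes of two groups with n observations each and a known common standard deviation σ. This is a deliberate simplification for illustration, not the econometric model the authors actually use. With δ denoting the true difference in means:

$$
z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{2/n}}, \qquad
\mathbb{E}[z] = \frac{\delta}{\sigma}\sqrt{\frac{n}{2}}
$$

For any fixed δ ≠ 0, however small, the expected test statistic grows in proportion to √n, so a large enough sample drives the p-value arbitrarily close to zero. Statistical significance then reflects the sample size as much as the size of the effect.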

How much does a “value added” teacher contribute to a person’s salary at age 28?

Figure 1: dramatic slope? The largest difference is less than $1,000

The slope in the chart (Figure 1) might look quite dramatic. And I can tell you that, statistically speaking, the slope is not zero (it is a “statistically significant” effect). Now look closely at the y-axis amounts. The values vary by only a very small annual amount: less than $1,000 per year! The authors get around this embarrassing magnitude by looking at the “lifetime value” of a student: “On average, having such a [high value-added] teacher for one year raises a child’s cumulative lifetime income by $50,000 (equivalent to $9,000 in present value at age 12 with a 5% interest rate).”
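The gap between the “lifetime” figure and the present-value figure comes from simple arithmetic: summing a small annual gain over a working life, then discounting it. The sketch below illustrates that arithmetic under hypothetical assumptions (a constant $1,000 yearly gain received at every age from 22 to 65, discounted to age 12 at 5%). These inputs are illustration values chosen here, not the paper’s earnings profile, so the outputs will not reproduce the authors’ $50,000 and $9,000.

```python
# Sketch: how a small annual earnings gain adds up over a working life,
# and what that stream is worth in present value at an earlier age.
# All inputs below are HYPOTHETICAL illustration values, not the paper's data.


def lifetime_total(annual_gain: float, start_age: int, end_age: int) -> float:
    """Undiscounted sum of a constant gain received once a year, start_age..end_age inclusive."""
    return annual_gain * (end_age - start_age + 1)


def present_value(annual_gain: float, start_age: int, end_age: int,
                  valuation_age: int, rate: float) -> float:
    """Discount each year's gain back to valuation_age at a constant annual rate."""
    return sum(
        annual_gain / (1 + rate) ** (age - valuation_age)
        for age in range(start_age, end_age + 1)
    )


if __name__ == "__main__":
    gain = 1_000.0  # hypothetical: roughly the largest gap visible in Figure 1
    total = lifetime_total(gain, 22, 65)
    pv = present_value(gain, 22, 65, valuation_age=12, rate=0.05)
    print(f"Undiscounted lifetime gain: ${total:,.0f}")
    print(f"Present value at age 12 (5% rate): ${pv:,.0f}")
```

Under these assumptions the undiscounted total is $44,000, while the present value at age 12 comes to only about $11,400. Quoting the cumulative figure makes a gain of well under $1,000 a year sound far larger than it is.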

Here’s another dramatic-looking chart:

What happens to the average student test score as a “high value-added teacher enters the school”?

The improvement appears to be huge! But wait, what are those digits on the y-axis? The test score goes up by only 0.03 points!

Reading through the slides or paper, you’ll find various mentions of small p-values, which indicate statistical significance (“p<0.001” and similar notations). These say nothing about the practical significance or the magnitude of the effects.
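A quick way to see that a tiny p-value says nothing about magnitude is to hold the effect fixed and vary only the sample size. The sketch below assumes a simple two-sample z-test with a known standard deviation, a hypothetical difference in means of 0.03, and scores standardized to a standard deviation of 1. These are illustrative assumptions, not the authors’ design or data.

```python
import math


def two_sample_z(diff: float, sd: float, n_per_group: int) -> float:
    """z-statistic for a difference in means between two equal-size groups with a common, known SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal null: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    diff = 0.03  # hypothetical difference in mean test scores
    sd = 1.0     # hypothetical: scores standardized to SD 1
    for n in (1_000, 10_000, 100_000, 500_000):
        z = two_sample_z(diff, sd, n)
        effect_size = diff / sd  # standardized effect: identical for every n
        print(f"n per group = {n:>7,}: effect = {effect_size:.2f} SD, "
              f"z = {z:5.2f}, p = {two_sided_p(z):.2g}")
```

The standardized effect stays at 0.03 in every row, yet the p-value falls from around 0.5 with 1,000 students per group to far below 0.001 with 500,000 per group (one million students in total). The p-value tracks the sample size; only the effect size answers “how much?”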

If this were a minor study published in an obscure journal, I would say “hey, there are lots of those now.” But when a paper is covered by the New York Times and published in the serious National Bureau of Economic Research Working Paper series (admittedly, not a peer-reviewed journal), then I am worried. I am very worried.

Unless I am missing something critical, I would agree with only one line in the executive summary: “We find that when a high VA teacher joins a school, test scores rise immediately in the grade taught by that teacher; when a high VA teacher leaves, test scores fall.” But with one million records, that’s not a very interesting question. The interesting question, the one that should drive policy, is: by how much?

Big Data is also entering the realm of social science research. It is critical that researchers be aware of the dangers of applying small-sample statistical models and inference in this new era. Here is one place to start.