Linear regression for a binary outcome: is it Kosher?

Regression models are the most popular tool for modeling the relationship between an outcome and a set of inputs. Models can be used for descriptive, causal-explanatory, and predictive goals (but in very different ways! see Shmueli 2010 for more). The family of regression models includes two especially popular members: linear regression and logistic regression (with probit regression more popular than logistic in some research areas). Common knowledge, as taught in statistics courses, is: use linear regression for a continuous outcome and logistic regression for a binary or categorical outcome. But why not use linear regression for a binary outcome? the … Continue reading Linear regression for a binary outcome: is it Kosher?

Policy-changing results or artifacts of big data?

The New York Times article Big Study Links Good Teachers to Lasting Gain covers a research study coming out of Harvard and Columbia on “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood“. The authors used sophisticated econometric models applied to data from a million students to conclude: “We find that students assigned to higher VA [Value-Added] teachers are more successful in many dimensions. They are more likely to attend college, earn higher salaries, live in better neighborhoods, and save more for retirement. They are also less likely to have children as teenagers.” When I see social scientists using statistical … Continue reading Policy-changing results or artifacts of big data?