I’ve discussed the uselessness of p-values in very large samples, where even minuscule effects become statistically significant. This is known as the divergence between practical significance and statistical significance.
An interesting article in the most recent issue of The American Statistician describes another dangerous pitfall in using p-values. In their article The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant, Andrew Gelman (a serious blogger himself!) and Hal Stern warn that the comparison of p-values to one another for the purpose of discerning a difference between the corresponding effects (or parameters) is erroneous.
Consider, for example, fitting the following regression model to data:
Sales = beta0 + beta1 TVAdvertising + beta2 WebAdvertising
(say, Sales is in thousands of dollars, and advertising is in dollars.)
Let’s assume that we get the following coefficient table:
Coef (std err) p-value
TVAds 3 (1) 0.003
WebAds 1 (1) 0.317
We would reach the conclusion (at, say, a 5% significance level) that TVAds contribute significantly to sales revenue (after accounting for WebAds), and that WebAds do not contribute significantly to sales (after accounting for TVAds). Could we therefore conclude from these two opposite significance conclusions that the difference between the effects of TVAds and WebAds is significant? The answer is NO!
To compare the effects of TVAds directly to WebAds, we would use the statistic (treating the two coefficient estimates as uncorrelated; otherwise the denominator also includes a covariance term):
T = (3-1) / sqrt(1^2 + 1^2) = 1.41
The p-value for this statistic is 0.157, which indicates that the difference between the coefficients of TVAds and WebAds is not statistically significant (at the same 5% level).
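For readers who want to check the arithmetic, here is a minimal Python sketch of the comparison. It assumes a normal approximation for the test statistics and zero covariance between the two coefficient estimates (the coefficient table does not report a covariance, so that part is my assumption). The standard error of the difference is then the square root of the sum of the squared standard errors. The function names two_sided_p and difference_test are made up for this illustration.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def difference_test(b1, se1, b2, se2, cov=0.0):
    """Test H0: b1 == b2 and return (z statistic, two-sided p-value).

    cov is the covariance between the two estimates; it is assumed
    to be 0 here because the coefficient table does not report it.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * cov)
    z = (b1 - b2) / se_diff
    return z, two_sided_p(z)


# Coefficients and standard errors from the table in the post
tv_coef, tv_se = 3.0, 1.0
web_coef, web_se = 1.0, 1.0

# Each coefficient tested against zero
p_tv = two_sided_p(tv_coef / tv_se)
p_web = two_sided_p(web_coef / web_se)

# The two coefficients tested against each other
z_diff, p_diff = difference_test(tv_coef, tv_se, web_coef, web_se)

print(f"TVAds vs. 0:      p = {p_tv:.3f}")
print(f"WebAds vs. 0:     p = {p_web:.3f}")
print(f"TVAds vs. WebAds: z = {z_diff:.2f}, p = {p_diff:.3f}")
```

With real regression output you would pass the estimated covariance between the two coefficients as cov, since estimates from the same model are usually correlated.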
The authors give two more empirical examples that illustrate this phenomenon. There is no real solution other than to keep this anomaly in mind!