My colleague Ralph Russo often comes up with memorable examples for teaching complicated concepts. He recently sent me an Economist article called “Signs of the Times” that shows the absurd results that can be obtained if multiple testing is not taken into account.
Multiple testing arises when the same data are used simultaneously to test many hypotheses. The problem is a huge inflation of the type I error (i.e., rejecting the null hypothesis in error). Even if each individual test is carried out at a low significance level (e.g., the infamous 5% level), the aggregate type I error grows very fast. In fact, if we test k hypotheses that are independent of each other, each at significance level alpha, then the overall type I error is 1-(1-alpha)^k. That's right – the probability of making no false rejection at all decays exponentially in k. For example, if we test 7 independent hypotheses at a 10% significance level, the overall type I error is 52%. In other words, even if none of these hypotheses is true, there is a better-than-even chance of seeing at least one p-value below 10%.
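To see how fast the family-wise error rate climbs, here is a minimal sketch of the 1-(1-alpha)^k formula above (the function name is mine, chosen just for illustration):

```python
def familywise_error(alpha, k):
    """Probability of at least one false rejection when testing k
    independent hypotheses, each at significance level alpha."""
    return 1 - (1 - alpha) ** k

# With alpha = 10%, the aggregate type I error climbs quickly with k:
for k in (1, 3, 7, 20):
    print(f"k = {k:2d}: overall type I error = {familywise_error(0.10, k):.0%}")
```

At k = 7 this reproduces the 52% figure from the text, and by k = 20 the chance of at least one spurious "discovery" is close to 90%.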
In the Economist article, Dr. Austin tests a set of multiple absurd “medical” hypotheses (such as “people born under the astrological sign of Leo are 15% more likely to be admitted to hospital with gastric bleeding than those born under the other 11 signs”). He shows that some of these hypotheses are “supported by the data”, if we ignore multiple testing.
There are a variety of solutions to multiple testing, some older (such as the classic Bonferroni correction) and some more recent (such as the False Discovery Rate). But most importantly, the issue must be recognized in the first place.
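For the curious, here is a minimal sketch of the two corrections mentioned above: the Bonferroni correction (which controls the family-wise error rate by testing each hypothesis at alpha/k) and the Benjamini-Hochberg step-up procedure (which controls the False Discovery Rate). The example p-values are made up for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i only if p_i <= alpha / k; controls the family-wise error rate."""
    k = len(pvals)
    return [p <= alpha / k for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the False Discovery Rate.
    Reject the hypotheses with the r smallest p-values, where r is the largest
    rank such that p_(r) <= r * alpha / k."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / k:
            cutoff = rank
    reject = [False] * k
    for i in order[:cutoff]:
        reject[i] = True
    return reject

pvals = [0.001, 0.008, 0.025, 0.041, 0.60]  # hypothetical p-values
print(bonferroni(pvals))          # stricter: rejects only the two smallest
print(benjamini_hochberg(pvals))  # less conservative: rejects three
```

Note the trade-off: Bonferroni guards against even a single false rejection and is therefore more conservative, while Benjamini-Hochberg tolerates a controlled fraction of false discoveries in exchange for more power.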