To most researchers and practitioners using statistical inference, the popular hypothesis-testing universe consists of two hypotheses: H0, the null hypothesis of “zero effect,” and H1, the alternative hypothesis of “a non-zero effect.” The alternative hypothesis (H1) is typically what the researcher is trying to find: a different outcome for the treatment and control groups in an experiment, a regression coefficient that is non-zero, etc. Recently, several colleagues have independently asked me whether there is a statistical way to show that an effect is zero, or that there is no difference between groups. Can we simply use the above setup? The answer … Continue reading Statistical test for “no difference”
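One standard answer to this question (not spelled out in the excerpt above) is equivalence testing via two one-sided tests (TOST), which flips the roles of the hypotheses: the null becomes "there IS a meaningful difference." Here is a minimal sketch, assuming a recent SciPy (the `alternative` argument of `ttest_ind`); the data and the ±0.5 equivalence bounds are fabricated purely for illustration.

```python
import numpy as np
from scipy import stats

def tost_two_sample(x, y, low, high):
    """Two one-sided tests (TOST) for equivalence of two means.

    H0 here is "the true mean difference lies OUTSIDE [low, high]";
    rejecting it (a small returned p-value) supports the claim of
    no meaningful difference, up to the chosen bounds.
    """
    # test 1: is the mean difference greater than the lower bound?
    _, p_lower = stats.ttest_ind(x - low, y, alternative="greater")
    # test 2: is the mean difference smaller than the upper bound?
    _, p_upper = stats.ttest_ind(x - high, y, alternative="less")
    # equivalence is declared only if BOTH one-sided tests reject,
    # so the overall p-value is the larger of the two
    return max(p_lower, p_upper)

# fabricated data for illustration: two groups whose means differ by 0.1
x = np.linspace(98.0, 102.0, 201)
y = x + 0.1
# declare "no difference" if the true difference is within +/-0.5 units
p = tost_two_sample(x, y, low=-0.5, high=0.5)
```

Note that the researcher must choose the equivalence bounds up front, based on what difference is practically negligible in the application; the test cannot supply them.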
Now that the emotional storm following the American Statistical Association’s statement on p-values is slowing down (is it? was there even a storm outside of the statistics community?), let’s think about a practical issue. One that greatly influences data analysis in most fields: statistical software. Statistical software influences which methods are used and how they are reported. Software companies thus affect entire disciplines and how they progress and communicate. Star notation for p-value thresholds in statistical software No matter whether your field uses SAS, SPSS (now IBM), Stata, or another statistical software package, you’re likely to have seen the star … Continue reading Statistical software should remove *** notation for statistical significance
In its recent editorial, the journal Basic and Applied Social Psychology announced that it will no longer accept papers that use classical statistical inference. No more p-values, t-tests, or even… confidence intervals! “prior to publication, authors will have to remove all vestiges of the NHSTP (p-values, t-values, F-values, statements about “significant” differences or lack thereof, and so on)… confidence intervals also are banned from BASP” Many statisticians would agree that it is high time to move on from p-values and statistical inference to practical significance, estimation, more elaborate non-parametric modeling, and resampling for avoiding assumption-heavy models. This is especially so now, … Continue reading Psychology journal bans statistical inference; knocks down server
The New York Times article Big Study Links Good Teachers to Lasting Gain covers a research study coming out of Harvard and Columbia on “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood”. The authors used sophisticated econometric models applied to data from a million students to conclude: “We find that students assigned to higher VA [Value-Added] teachers are more successful in many dimensions. They are more likely to attend college, earn higher salaries, live in better neighborhoods, and save more for retirement. They are also less likely to have children as teenagers.” When I see social scientists using statistical … Continue reading Policy-changing results or artifacts of big data?
“Big Data” is a big buzzword. I bet that sentiment analysis of news coverage, blog posts, and other social media sources would show a strong positive sentiment associated with Big Data. What exactly big data is depends on who you ask. Some people talk about lots of measurements (what I call “fat data”), others of huge numbers of records (“long data”), and some talk of both. How much is big? Again, it depends on who you ask. As a statistician who has (luckily) strayed into data mining, I initially had the traditional knee-jerk reaction of “just get a good sample and get it … Continue reading Big Data: The Big Bad Wolf?
Multiple testing (or multiple comparisons) arises when multiple hypotheses are tested on the same dataset via statistical inference. If each test has false alert level α, then the combined false alert rate of testing k hypotheses (also called the “overall type I error rate”) can be as large as 1-(1-α)^k, which approaches 1 as the number of hypotheses k grows. This is a serious problem, and ignoring it can lead to false discoveries. See an earlier post with links to examples. There are various proposed corrections for multiple testing, the most basic principle being reducing the individual α’s. However, the various corrections suffer in this way … Continue reading Multiple testing with large samples
My students know how I cringe when I am forced to teach them p-values. I have always felt that their meaning is hard to grasp, and hence they are often misused by non-statisticians. This is clearly happening in research using large datasets, where p-values are practically useless for inferring the practical importance of effects (check out our latest paper on the subject, which looks at large-dataset research in Information Systems). So, when one of the PhD students taking my “Scientific Data Collection” course stumbled upon this recent Science Magazine article “Mission Improbable: A Concise and Precise Definition of P-Value” … Continue reading The value of p-values: Science magazine asks
I’ve recently had interesting discussions with colleagues in Information Systems regarding testing directional hypotheses. Following their request, I’m posting about this apparently elusive issue. In information systems research, the most common type of hypothesis is directional, i.e., the parameter of interest is hypothesized to go in a certain direction. An example would be testing the hypothesis that teenagers are more likely than older folks to use Facebook. Another example is the hypothesis that higher opening bids on eBay lead to higher final prices. In the Facebook example, the researcher would test the hypothesis by gathering data on Facebook usage by … Continue reading Testing directional hypotheses: p-values can bite
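To make the directional setup concrete, here is a minimal sketch of the eBay-style hypothesis as a one-sided t-test, assuming a recent SciPy (the `alternative` argument of `ttest_ind`); the "final price" data are fabricated purely for illustration:

```python
import numpy as np
from scipy import stats

# fabricated final prices: auctions with low vs. high opening bids
low_open = np.linspace(40.0, 60.0, 101)
high_open = low_open + 3.0

# two-sided test: H1 is "the means differ," in either direction
t_two, p_two = stats.ttest_ind(high_open, low_open)

# directional test: H1 is "high opening bids yield HIGHER final prices"
t_one, p_one = stats.ttest_ind(high_open, low_open, alternative="greater")

# when the observed effect falls in the hypothesized direction (t > 0),
# the one-sided p-value is exactly half the two-sided one; when it falls
# in the opposite direction, the one-sided p-value exceeds 0.5
```

The halving is where the "bite" can come from: a directional hypothesis makes significance easier to reach, so the direction must be fixed before looking at the data.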