I find it illuminating to read statistics “bibles” in various fields. They not only open my eyes to different domains, but also present the statistical approach and methods somewhat differently, while considering unique domain-specific issues that cause “hmmm” moments. The 4th edition of Fundamentals of Clinical Trials, whose authors combine extensive practical experience at NIH and in academia, is full of hmmm moments. In one, the authors mention an important issue related to sampling that I have not encountered in other fields. In clinical trials, the gold standard is to allocate participants to either an intervention or a non-intervention (baseline) … Continue reading Statistical considerations and psychological effects in clinical trials
Image from KDnuggets.com. While debates over privacy issues related to electronic health records are still ongoing, predictive analytics are beginning to be used with administrative health data (available to health insurance companies, aka “health provider networks”). One such venue is the large data mining contest. Let me describe a few and then get to my point about their contribution to public health, medicine and to data mining research. The latest and grandest is the ongoing $3 million prize contest by Heritage Provider Network, which opened in 2010 and lasts 2 years. The contest’s stated goal is to create “an algorithm that … Continue reading Mining health-related data: How to benefit scientific research
Multiple testing (or multiple comparisons) arises when multiple hypotheses are tested using the same dataset via statistical inference. If each test has false alert level α, then the combined false alert rate of testing k hypotheses (also called the “overall type I error rate”) can be as large as 1-(1-α)^k (exponential in the number of hypotheses k). This is a serious problem and ignoring it can lead to false discoveries. See an earlier post with links to examples. There are various proposed corrections for multiple testing, the most basic principle being reducing the individual α’s. However, the various corrections suffer in this way … Continue reading Multiple testing with large samples
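To make the formula above concrete, here is a minimal Python sketch that evaluates 1-(1-α)^k for a few values of k. It assumes the k tests are independent, which is the setting where this expression gives the exact chance of at least one false alert. The function names are my own, and the α/k adjustment shown (commonly known as the Bonferroni correction) is just one familiar instance of "reducing the individual α's", not necessarily the correction the post goes on to discuss.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false alert among k independent tests,
    each run at false alert level alpha: 1 - (1 - alpha)^k."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test level under a Bonferroni-style adjustment (alpha / k).
    Illustrative only; other corrections exist."""
    return alpha / k


if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 5, 20, 100):
        uncorrected = familywise_error_rate(alpha, k)
        corrected = familywise_error_rate(bonferroni_alpha(alpha, k), k)
        print(f"k={k:>3}  uncorrected={uncorrected:.3f}  "
              f"with alpha/k per test={corrected:.3f}")
```

With α = 0.05, the uncorrected rate climbs quickly as k grows, while testing each hypothesis at α/k keeps the combined rate at or below α.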
What is the difference between “prediction” and “forecasting”? I have heard this asked quite a few times lately. The Predictive Analytics World conference website has a Predictive Analytics Guide page with the following Q&A: “How is predictive analytics different from forecasting? Predictive analytics is something else entirely, going beyond standard forecasting by producing a predictive score for each customer or other organizational element. In contrast, forecasting provides overall aggregate estimates, such as the total number of purchases next quarter. For example, forecasting might estimate the total number of ice cream cones to be purchased in a certain region, while predictive analytics tells you which individual … Continue reading “Predict” or “Forecast”?