This year, two important new regulations will affect research with human subjects: the EU’s General Data Protection Regulation (GDPR), which kicks in May 2018, and the USA’s updated Common Rule, called the Final Rule, which is in effect from Jan 2018. Both changes relate to protecting individuals’ private information and will affect researchers using behavioral data in terms of data collection, access, use, applications for ethics committee (IRB) approvals/exemptions, collaborations within the same country/region and beyond, and collaborations with industry. Both GDPR and the Final Rule try to modernize what today constitutes “private data” and data subjects’ rights and balance … Continue reading Data Ethics Regulation: Two key updates in 2018
My papers To Explain or To Predict and Predictive Analytics in Information Systems Research contrast the process and uses of predictive modeling and causal-explanatory modeling. There I briefly mentioned a third type of modeling: descriptive. However, I haven’t expanded on how descriptive modeling differs from the other two types (causal explanation and prediction). Although descriptive and predictive modeling both rely on correlations, whereas explanatory modeling relies on causality, the former two are in fact different. Descriptive modeling aims to give a parsimonious statistical representation of a distribution or relationship, whereas predictive modeling aims at generating values for new/future observations. … Continue reading Election polls: description vs. prediction
Randomized experiments (or randomized controlled trials, RCTs) are a powerful tool for testing causal relationships. Their main principle is random assignment, where subjects or items are assigned randomly to one of the experimental conditions. A classic example is a clinical trial with one or more treatment groups and a no-treatment (control) group, where individuals are assigned at random to one of these groups. Story 1: (Internet) experiments in industry. Internet experiments have now become a major activity in giant companies such as Amazon, Google, and Microsoft, in smaller web-based companies, and among academic researchers in management and the social sciences. … Continue reading Key challenges in online experiments: where are the statisticians?
The recent issue of the Journal of Computational and Graphical Statistics published a short article by Columbia University Professor Andrew Gelman (I believe he is the most active statistician-blogger) called “Why tables are really much better than graphs” based on his April 1, 2009 blog post (note the difference in publishing speed using blogs and refereed journals!). The last parts made me laugh hysterically – so let me share them: About creating and reporting “good” tables: It’s also helpful in a table to have a minimum of four significant digits. A good choice is often to use the default provided … Continue reading Nice April Fool’s Day prank
I am currently visiting the Indian School of Business (ISB) and enjoying their excellent library. As in my student days, I roam the bookshelves and discover books on topics about which I know little, some, or a lot. Reading and leafing through a variety of books, especially across different disciplines, gives some serious food for thought. As a statistician I have the urge to see how statistics is taught and used in other disciplines. I discovered an interesting book coming from the psychology literature by Herman Aguinis called Regression Analysis for Categorical Moderators. “Moderators” in statistician language are “interactions”. However, when … Continue reading Discovering moderated relationship in the era of large samples