In the last few months I’ve been involved in nearly 20 data mining projects done by student teams at ISB, as part of the MBA-level course and an executive education program. All projects relied on real data. One of the data sources was transactional data from a large regional hypermarket. While the topics of the projects ranged across a large spectrum of business goals and opportunities for retail, one point in particular struck me as repeating across many projects and in many face-to-face discussions. The use of secondary data (data that were already collected for some purpose) for making … Continue reading Predictive modeling and interventions (why you need post-intervention data)
Surveys are a key data collection tool in several academic research areas. As opposed to experiments or field studies that yield observational data, surveys can give access to attitudes, reaching “inside the head” of people rather than observing their behavior. Technological advances in survey tool development now offer “poor academics” sufficiently powerful online survey tools, such as surveymonkey.com and Google Forms. Yet, obtaining access to a large pool of potential respondents from a particular population remains a challenge. Another challenge is getting fast responses: how do you reach people quickly and get many of them to respond quickly? We may … Continue reading New Google Consumer Surveys: revolutionizing academic data collection?
I find it illuminating to read statistics “bibles” in various fields, which not only open my eyes to different domains, but also present the statistical approach and methods somewhat differently and consider unique domain-specific issues that cause “hmmmm” moments. The 4th edition of Fundamentals of Clinical Trials, whose authors combine extensive practical experience at NIH and in academia, is full of hmmm moments. In one, the authors mention an important issue related to sampling that I have not encountered in other fields. In clinical trials, the gold standard is to allocate participants to either an intervention or a non-intervention (baseline) … Continue reading Statistical considerations and psychological effects in clinical trials
Online data are a huge resource for research as well as for practice. Although it is often tempting to “scrape everything” using technologies like web-crawling, it is extremely important to keep the goal of the analysis in mind. Are you trying to build a predictive model? A descriptive model? How will the model be used? Deployed to new records? etc. Dean Tau from Co-soft recently posted an interesting and useful comment in the LinkedIn group Data Mining, Statistics, and Data Visualization. With his permission, I am reproducing his post: What you need to do before online data collection? Data collection … Continue reading Online data collection
This continues my “To Explain or To Predict?” argument (in brief: statistical models aimed at causal explanation will not necessarily be good predictors). And now, I move to a very early stage in the study design: how should we collect data? A well-known notion is that experiments are preferable to observational studies. The main difference between experimental studies and observational studies is an issue of control. In experiments, the researcher can deliberately choose “treatments”, control the assignment of subjects to the “treatments”, and then measure the outcome. In observational studies, by contrast, the researcher can only observe the subjects … Continue reading Are experiments always better?