I find it illuminating to read statistics “bibles” in various fields. They not only open my eyes to different domains, but also present statistical approaches and methods somewhat differently, and they raise unique domain-specific issues that cause “hmmmm” moments.
The 4th edition of Fundamentals of Clinical Trials, whose authors combine extensive practical experience at NIH and in academia, is full of “hmmmm” moments. In one, the authors mention an important issue related to sampling that I have not encountered in other fields. In clinical trials, the gold standard is to allocate participants to either an intervention or a non-intervention (baseline) group randomly, with equal probabilities. In other words, half the participants receive the intervention and the other half do not (the non-intervention can be a placebo, the traditional treatment, etc.). The authors advocate a 50:50 ratio, because “equal allocation is the most powerful design”. While there are reasons to change the ratio in favor of the intervention or baseline group, equal allocation appears to have an important additional psychological advantage over unequal allocation in clinical trials:
Unequal allocation may indicate to the participants and to their personal physicians that one intervention is preferred over the other (pp. 98-99)
Knowledge of the sample design by the participants and/or the physicians also affects how randomization is carried out. It becomes a game between the designers and the participants and staff, where the two sides have opposing interests: to blur vs. to uncover the group assignments before they are made. This gaming requires devising special randomization methods (which, in turn, require data analysis that takes the randomization mechanism into account).
For example, to assure an equal number of participants in each of the two groups, given that participants enter sequentially, “block randomization” can be used. For instance, to assign 4 people to one of two groups, A or B, consider all the possible balanced arrangements with two A's and two B's (AABB, ABAB, ABBA, etc.), then choose one sequence at random and assign participants accordingly. The catch is that if the staff know that the block size is 4 and know the first three allocations in a block, they automatically know the fourth allocation and can introduce bias by using this knowledge to select every fourth participant.
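To make the block idea concrete, here is a minimal Python sketch of block randomization for two groups, along with the "guess the last slot" problem. It is only an illustration under my own assumptions, not the procedure from the book: the function names (balanced_blocks, block_randomize, predict_last_in_block) are hypothetical, and a real trial would use a vetted randomization system rather than a few lines of script.

```python
import itertools
import random


def balanced_blocks(block_size=4, groups=("A", "B")):
    """List every arrangement of one block with equal counts per group."""
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    per_group = block_size // len(groups)
    labels = [g for g in groups for _ in range(per_group)]
    # permutations() repeats orderings of identical labels; set() removes them.
    return sorted(set(itertools.permutations(labels)))


def block_randomize(n_participants, block_size=4, rng=None):
    """Assign sequentially arriving participants using randomly chosen blocks."""
    rng = rng or random.Random()
    blocks = balanced_blocks(block_size)
    assignments = []
    while len(assignments) < n_participants:
        # Pick one balanced arrangement at random and append it whole.
        assignments.extend(rng.choice(blocks))
    return assignments[:n_participants]


def predict_last_in_block(known, block_size=4, groups=("A", "B")):
    """Given all but the last allocation of a block, return the last one."""
    per_group = block_size // len(groups)
    # The only group that has not yet reached its quota must fill the last slot.
    remaining = [g for g in groups if list(known).count(g) < per_group]
    return remaining[0]


if __name__ == "__main__":
    schedule = block_randomize(12)
    print("schedule:", "".join(schedule))
    first_three = schedule[:3]
    print("staff see:", "".join(first_three),
          "and can infer:", predict_last_in_block(first_three))
```

With a block size of 4 and two groups there are only six balanced arrangements, so anyone who knows the block size and has seen three allocations in a block can infer the fourth with certainty. That is exactly the leak described above.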
Where else does such a psychological effect play a role in determining sampling ratios? In applications where participants and other stakeholders have no knowledge of the sampling scheme, this is obviously a non-issue. For example, when Amazon or Yahoo! present different information to different users, the users have no idea about the sample design, and may not even know that they are in an experiment. But how is the randomization achieved? Unless the randomization process is fully automated and not susceptible to reverse engineering, someone in the technical department might decide to favor friends by allocating them to the “better” group…