The Conservatism of Permutation Tests | 12

ABSTRACT

As discussed in Chapter 2, the randomization model is often more appropriate
than the population model. This may be the case, for example, for a randomized
clinical trial, which is usually based on a “convenience sample” rather than
a random sample. In that case, a permutation test is the “platinum standard”
(Tukey, 1993). In practice, however, there are situations where a permutation
test is not performed although it is feasible and appropriate. Berger (2009)
discussed and criticized this practice in a Socratic dialogue in which
Socrates asked: “If you can observe the exact p-value, then why would you go
on to attempt to approximate it?”

Permutation tests also have disadvantages. On the one hand, permutation
tests are computer-intensive, as there is a huge number of possible
permutations in the case of large samples. Although this point is more
pronounced in the case of more than two groups (see Chapter 9), it is also
relevant for the two-sample problem. For instance, for n1 = n2 = 20, there
are more than 137 billion permutations (1 billion is defined here as 10^9).
Obviously, bootstrap methods are computer-intensive too. However, this
disadvantage is diminishing over time. Very efficient algorithms have been
developed (see, e.g., Good, 2000, Chapter 13). Moreover, advances in
computing power have been huge. Modern PCs would probably have been
unimaginable to R. A. Fisher when he invented permutation tests in the 1930s.
In addition, it is possible to perform approximate permutation tests based
on a random sample of permutations.
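
To make the two computational points above concrete, the following Python
sketch first counts the permutations for n1 = n2 = 20 and then carries out an
approximate permutation test on a random sample of permutations. It is an
illustration under stated assumptions, not a method prescribed by the text:
the function names, the absolute difference in means as test statistic, the
default of 10,000 sampled permutations, and the convention of counting the
observed arrangement in the p-value are all choices made here.

```python
import math
import random


def count_two_sample_permutations(n1, n2):
    """Number of distinct ways to split n1 + n2 pooled values into two groups."""
    return math.comb(n1 + n2, n1)


def approximate_permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided approximate (Monte Carlo) permutation test.

    Assumption: the test statistic is the absolute difference in group means.
    Instead of enumerating every permutation, n_perm random relabelings of the
    pooled data are drawn.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1 = len(x)

    def abs_mean_diff(values):
        first, second = values[:n1], values[n1:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    # The statistic for the observed grouping is computed before any shuffle.
    observed = abs_mean_diff(pooled)

    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of values to the two groups
        # Small tolerance guards against floating-point ties being missed.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1

    # Assumption: the observed arrangement is counted as one permutation,
    # so the estimated p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(count_two_sample_permutations(20, 20))  # 137846528820

    # Hypothetical data, for illustration only.
    x = [12.1, 9.8, 11.4, 10.9, 13.2]
    y = [8.7, 9.1, 10.2, 7.9, 9.5]
    print(approximate_permutation_test(x, y))
```

The resulting p-value is an estimate of the exact permutation p-value; its
precision improves as the number of sampled permutations grows, whereas the
exact test would require all 137,846,528,820 arrangements in the
n1 = n2 = 20 case.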