ABSTRACT

Random sampling from a large, well-defined population or universe is a formal requirement for the usual interpretation of all commonly used parametric and nonparametric statistical inferential tests, such as the chi-squared test for association, the t test, and ANOVA. It is also often the justification for a claim of generalizability or external validity. However, it is usually difficult or prohibitively expensive even to define and list the population of interest, which is a prerequisite of random sampling. As an example of the difficulty of definition, consider the population of “households.” Do a landlord and a student lodger who does his or her own shopping and cooking constitute two households or one? What if the landlord provides the student with an evening meal? How many households are there in a student flat where the occupants all have their own rooms and share some common areas? And how can the households of interest be listed? Only if we have a list can we take a random sample, and even then it may be difficult. All the full-time students at a university will be on a list and will even have a unique registration number, but the numbers will not usually form a continuous series, so random sampling will require the allocation of new numbers. This kind of exercise usually costs too much time, effort, and money, so it is not surprising that samples used in experiments are rarely random samples from any population.