ABSTRACT

Take a sample, then use the sample results to draw conclusions about the world, or at least about the particular population. As you know, that’s statistical inference, the foundation of most research. Sampling, the first step, is our main business in this chapter. Then, in Chapter 5, we’ll go on to the second step by discussing CIs. In this chapter, we’ll start with more about the normal distribution, and then mostly we’ll discuss samples and sampling. Watch out for the dance of the means, and the mean heap. Here’s the plan for this chapter:

■ Continuous distributions, z scores, and areas and probabilities
■ The normal distribution
■ The population and a random sample
■ Sampling: Dance of the means and the mean heap
■ A measure of sampling variability: The standard error
■ Some statistical magic: the central limit theorem

To understand statistics it helps to appreciate dancing, and the dance of the means is our first dance. It illustrates sampling variability, which is a central idea that’s vital to grasp intuitively. People usually underestimate sampling variability, so they often don’t grasp how much uncertainty there is in data. Here we’ll discuss our measure of sampling variability, the standard error; then, in Chapter 5, we’ll use the standard error to calculate what we really want, the CI.
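
To get a first feel for sampling variability, here is a minimal simulation sketch in Python. It is only an illustration, not the chapter's own software: the population values (mean 50, SD 20), the sample size (15), and the function name `dance_of_the_means` are all hypothetical choices made for this example. It draws many random samples from a normal population and records each sample mean, so you can see how much the means bounce around from sample to sample.

```python
import random
import statistics


def dance_of_the_means(mu=50.0, sigma=20.0, n=15, num_samples=1000, seed=1):
    """Return the means of num_samples random samples, each of size n.

    mu, sigma, and n are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        # One random sample of n values from a normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = dance_of_the_means()

# The SD of the sample means is an empirical measure of sampling variability.
observed_spread = statistics.stdev(means)

# For comparison: the standard error of the mean, sigma / sqrt(n).
theoretical_se = 20.0 / 15 ** 0.5

print("First five sample means:", [round(m, 1) for m in means[:5]])
print("SD of the sample means:", round(observed_spread, 2))
print("sigma / sqrt(n):", round(theoretical_se, 2))
```

With these assumed values, the SD of the simulated means should land close to sigma / sqrt(n), which is much smaller than the population SD but usually larger than intuition suggests for a single small sample.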

Figure 3.5 illustrated the smooth curve of the normal distribution and how we can think of such a curve as a pile of an extremely large number of potential data points. The variable X in that figure is a continuous variable, meaning it can take any value in some range; it’s not restricted to taking separate, discrete values. By contrast, the number of eggs in a bag is a discrete variable, which can take only the distinct values 0, 1, 2, …, assuming we don’t count fractions of a broken egg. The continuous variable X might measure time to carry out a task, or the weight gained by a baby in a month. In practice we can only measure such variables to a certain accuracy (perhaps the weight gain is recorded as 173.2 g or 173.3 g), but, in principle, variables like time, length, and weight can take any (positive) value, so they are continuous.