Using Sample Slopes to Talk about Populations: Inference and Regression

ABSTRACT

Chapter 4 included a distribution of hundreds of chi-square values. Chapter 5 had a distribution of hundreds of sample means. Chapter 6 involved a sampling distribution of differences between sample means. Here, I do roughly the same thing, but instead of chi-square values, sample means, or differences between sample means, my building blocks are sample slopes. I went back to the hypothetical dataset of 100 grades that I used in previous chapters. To each student’s grade, I added another piece of information: the percentage of classes that student attended during the semester. Exhibit 8.1 shows ten of the 100 cases.

I did this for all 100 cases in the population. Then, using the entire population, I calculated a regression equation:

GRADE = 1.274 + 0.018(ATTENDANCE)

Therefore, 0.018 is the population slope: in the population, for every additional percentage point of classes a student attended, his or her predicted grade went up by 0.018 points. However, just as I did in Chapter 5, I then drew small samples again and again, this time with a sample size of 10. I did this 150 times: I had SPSS draw a random sample of 10 cases from this population of 100, and then I had it calculate the regression equation using only those 10 cases. If you’re seeing the similarities to what happened in Chapter 5, you already know what happens next. Because of sampling error (especially with such small samples), the resulting regression equations differed widely. Three of the 150 equations appear below, after Exhibit 8.1.

■ Exhibit 8.1: Percentage of Classes Attended and Grades: 10 Cases

Percentage of Classes Attended    Grade
 10                               1.0
 40                               1.6
 60                               2.0
 50                               2.2
 71                               2.5
 90                               2.7
 76                               3.0
 85                               3.2
 97                               3.6
100                               3.9
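To see where a slope comes from, here is a minimal Python sketch that fits a least-squares line to just the ten cases in Exhibit 8.1. Python stands in for SPSS here, and the names `fit_line`, `attendance`, and `grades` are illustrative rather than taken from the text. Because it uses only ten of the 100 cases, the result is a sample equation and need not match the population equation.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# The ten cases shown in Exhibit 8.1
attendance = [10, 40, 60, 50, 71, 90, 76, 85, 97, 100]
grades = [1.0, 1.6, 2.0, 2.2, 2.5, 2.7, 3.0, 3.2, 3.6, 3.9]

intercept, slope = fit_line(attendance, grades)
print(f"GRADE = {intercept:.3f} + {slope:.3f}(ATTENDANCE)")
```

The slope is the sum of cross-products divided by the sum of squared deviations in attendance, so it is read the same way as the slope in the text: the predicted change in grade for each additional percentage point of attendance.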

GRADE = 1.317 + 0.018(ATTENDANCE)

GRADE = 0.544 + 0.027(ATTENDANCE)

GRADE = 1.931 + 0.006(ATTENDANCE)

In the first equation, the sample slope is exactly the same as the population slope. The second sample slope is higher than the population slope. The third slope is lower than the population slope. The lowest of the 150 sample slopes was 0.000. The highest of the 150 sample slopes was 0.041. Here is a graph of all 150 sample slopes:

■ Exhibit 8.2: Sampling Distribution of 150 Sample Slopes

The distribution of these slopes is roughly normal in shape, and it would look even more normal if I took more and more samples and calculated more and more slopes. Most of the slopes are quite close to the population slope of 0.018. Notice that the bar at the center of this distribution is the one containing the population slope. Some sample slopes, however, have quite a bit of error, either underestimating or overestimating the population slope.
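The resampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 100-case population below is simulated (the actual dataset is not reproduced in this section), the noise level of 0.4 and the seed are arbitrary choices, and Python replaces SPSS. The helper `fit_line` and all variable names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(8)  # fixed seed so the run is repeatable

# ASSUMPTION: a simulated stand-in for the 100-student population,
# built loosely around the chapter's equation plus random noise.
population = []
for _ in range(100):
    attendance = rng.randint(10, 100)
    grade = 1.274 + 0.018 * attendance + rng.gauss(0, 0.4)
    grade = min(4.0, max(0.0, grade))  # keep grades on a 0-4 scale
    population.append((attendance, grade))

# Slope computed from all 100 cases: the population slope of this stand-in
pop_xs, pop_ys = zip(*population)
pop_intercept, pop_slope = fit_line(pop_xs, pop_ys)

# Draw 150 random samples of 10 cases each and record every sample slope
sample_slopes = []
for _ in range(150):
    sample = rng.sample(population, 10)  # sampling without replacement
    xs, ys = zip(*sample)
    sample_slopes.append(fit_line(xs, ys)[1])

mean_slope = sum(sample_slopes) / len(sample_slopes)
print(f"population slope:     {pop_slope:.3f}")
print(f"lowest sample slope:  {min(sample_slopes):.3f}")
print(f"highest sample slope: {max(sample_slopes):.3f}")
print(f"mean of 150 slopes:   {mean_slope:.3f}")
```

The numbers will not match the exhibit, since the population here is invented, but the pattern should be the same: the sample slopes scatter on both sides of the population slope and cluster around it.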
