ABSTRACT

In this chapter we start with a brief overview of measurement reliability and validity. Then we demonstrate four statistics that provide evidence for some of the several aspects of the reliability and validity of your data. These statistics are often described in the Method chapter of a research report because they precede the testing of your research hypothesis. Including information about the reliability and validity of your data will help support the answers to your research questions.

Problem 7.1 demonstrates the use of Cohen’s kappa, which is a good method to assess evidence for the interrater reliability of a nominal variable. Problem 7.2 uses correlation and the paired samples t test to illustrate one method of assessing interrater reliability for normally distributed (scale) variables. Problem 7.3 uses factor analysis to show how one can reduce several variables to a smaller number of composite variables that represent theoretically related aspects of a complex concept and provide support for later analyses of the research questions. In Problem 7.4, Cronbach’s alpha is used to provide evidence for the internal consistency reliability of composites, multi-item subscales, or sets of variables resulting from a factor analysis or from a theoretical combination of variables (e.g., several items on a questionnaire designed to measure the same concept). This chapter illustrates how to compute, interpret, and write about these statistics in terms of the evidence they provide to support reliability and validity.

Problems 7.1 and/or 7.2 will be helpful when you are using observations to “code” specific categories of behavior or whenever there is at least some subjectivity to a classification system. In such cases, the consistency or reliability of the data needs to be checked by having two observers or raters independently record their scores and then checking how well they agree with one another (their interrater or interobserver reliability).

It is common for a researcher to develop a smaller number of new summated variables from an initially larger number of items, such as the 14 Likert-type ratings we designed to measure attitudes about mathematics motivation, competence, and pleasure. The statistics described in Problems 7.3 and 7.4 are relatively complex but useful both for developing new or modified instruments and for assessing established instruments such as published questionnaires. Even with an established questionnaire, one would check one’s own data, at least for the internal consistency reliability (alpha) of the subconcepts, such as the math attitudes of competence, motivation, and pleasure described in Chapter 1. If one were developing a new or modified instrument, one would probably first try to find support for the validity of the theoretical groupings of the items by seeing how well the factor analysis fits the proposed structure or organization of the subconstructs based on the literature or theory. Then, one would check the reliability of the factors using Cronbach’s alpha, as we do in Problems 7.3 and 7.4.

It is important to realize that the statistics performed here are based on, and in some cases the same as, those in Chapters 8-11. The Pearson correlation, which was introduced briefly in Chapter 6 (especially in Table 6.2 and research question 2), is used in Problems 7.2-7.4, so it is important to understand a little more about correlation before we introduce the problems in this chapter. (Correlation is discussed more fully in Chapter 9.)
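For readers who want to see the arithmetic behind two of these statistics before working the problems, the following is a minimal sketch in Python (our own illustration, not the procedure used in the chapter), assuming numpy and scikit-learn are installed; the rater codes and item scores are hypothetical.

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Interrater reliability for a nominal variable (as in Problem 7.1):
# two observers independently classify the same 10 behaviors.
rater_1 = ["on-task", "off-task", "on-task", "on-task", "off-task",
           "on-task", "on-task", "off-task", "on-task", "on-task"]
rater_2 = ["on-task", "off-task", "on-task", "off-task", "off-task",
           "on-task", "on-task", "off-task", "on-task", "off-task"]
kappa = cohen_kappa_score(rater_1, rater_2)  # agreement corrected for chance
print(f"Cohen's kappa = {kappa:.2f}")

# Internal consistency of a summated scale (as in Problem 7.4):
def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Four Likert-type items (rated 1-5) answered by six respondents.
scores = np.array([[4, 5, 4, 4],
                   [3, 3, 2, 3],
                   [5, 5, 4, 5],
                   [2, 2, 3, 2],
                   [4, 4, 4, 5],
                   [1, 2, 1, 2]])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")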
Correlation Coefficients

These statistics can vary from –1.00 (a perfect negative correlation or association) through .00 (no correlation) to +1.00 (a perfect positive correlation). Note that +1 and –1 are equally high or strong, but they lead to different interpretations. A high positive correlation between two items on a questionnaire would mean that persons with high ratings on the first item tended to have high ratings on the second item, those with low ratings on the first item tended to have low ratings on the second, and those in between had ratings that were neither especially high nor especially low. On the other hand, a high negative correlation would mean that students who rated high on one item tended to rate low on the other item. With a zero correlation there are no consistent associations: a person rated high on one item might be rated low, medium, or high on the other item.
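As a small worked illustration of these three patterns (again our own hypothetical ratings, computed in Python with numpy; the item names are invented for this example):

import numpy as np

# Five respondents' ratings (1-5) on a first item and on three other items.
item_1 = np.array([1, 2, 3, 4, 5])
item_agree = np.array([1, 2, 3, 5, 5])      # rises with item_1
item_oppose = np.array([5, 4, 3, 2, 1])     # falls as item_1 rises
item_unrelated = np.array([4, 1, 3, 5, 2])  # no consistent pattern

# np.corrcoef returns a 2 x 2 correlation matrix; entry [0, 1] is r.
print(np.corrcoef(item_1, item_agree)[0, 1])      # about +.97, high positive
print(np.corrcoef(item_1, item_oppose)[0, 1])     # -1.00, perfect negative
print(np.corrcoef(item_1, item_unrelated)[0, 1])  # .00, no correlation

Reading the three coefficients against the paragraph above, the first pair shows the high positive pattern, the second the high (here perfect) negative pattern, and the third the zero-correlation case.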