ABSTRACT

Chapter 16 deals with the development of rating scales (multi-item measures amenable to reliability analysis) and questionnaires, which usually use one or two closed-ended or open-ended items to assess constructs. Thurstone’s method of equal-appearing intervals is outlined to provide a historical perspective on the scale development process. Likert’s method of summated ratings shortened the procedures needed to develop a rating scale and is widely used today. The chapter points out that internal consistency reliability, a critical indicator of measure quality, cannot be assessed on Thurstone scales, but it can be assessed with Likert scaling and with Osgood’s semantic differential approach, which is also reviewed in the chapter, as is Guttman’s scalogram measurement model. Methods of enhancing scale reliability and the use of exploratory and confirmatory factor analysis are discussed. Details of the scale development process, including the number and extremity of response options, item reversals, and the use of unipolar or bipolar scales, occupy the final section of the chapter.

Studying people’s beliefs, attitudes, values, and personalities is a central research preoccupation of the social sciences. Typically, we use questionnaires or scales to measure these internal states or dispositions. Such measures rely on self-report: respondents are asked to answer a set of questions or scale items about their personal thoughts, feelings, or behaviors. Using these measures requires that we treat variations in responses among respondents as meaningful rather than as mere measurement error. In theory, all participants are expected to interpret the questions or items used in a scale or questionnaire identically. That is, a given questionnaire or scale item is assumed to mean the same thing to all participants, and differences in their responses to these hypothetically identical items are assumed to reflect real differences in their underlying dispositions.