ABSTRACT

Psychological measurement is the oldest area of scientific research in psychology and probably the area with the most sophisticated controversies. The chapters in this volume reflect a good deal of disagreement on the fundamental question: How should we measure subjective values? This chapter reviews some of these controversies and presents new points of view on some old, but unsettled, problems.

There are two popular methods for obtaining “direct measures” of psychological value—the methods of category rating and magnitude estimation. If category ratings and magnitude estimations are linearly related to subjective value, they should be linearly related to each other. Instead, magnitude estimations are often a positively accelerated function of category ratings. This apparent contradiction has long troubled psychologists, and several theories have been proposed to explain the discrepancy. Section A notes that the relationship between ratings and magnitude estimations varies because both depend on stimulus spacing and on the range of responses implied by the instructions. Therefore, theories that assume an invariant relationship between the results of the two procedures face grave difficulties.
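
To make the contradiction concrete (using illustrative notation that is not part of the chapter): if both response measures were linear in subjective value s, writing the category rating as C = a + bs and the magnitude estimation as M = c + ds would force M to be linear in C,
\[
C = a + b\,s, \qquad M = c + d\,s \;\Longrightarrow\; M = c + \frac{d}{b}\,(C - a),
\]
so any positively accelerated relation between M and C implies that at least one of the two response measures is nonlinear in subjective value.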

Because the instructions for magnitude estimation (M) seem to focus on “ratios” whereas the instructions for ratings (C) seem to focus on “intervals,” it seems reasonable to speculate that the relationship between C and M could be better understood in terms of the comparison processes of the judge. Section B reviews experiments designed to test the hypothesis that judges use the same comparison operation despite instructions to judge “differences” or “ratios.” (In this chapter, quotation marks are used to denote the instructions given to the subject or the responses obtained with such instructions. It is possible to empirically test the hypothesis that “ratio” judgments, for example, fit a ratio model, so it is important to maintain the distinction between the task given the subject and the model used to represent the data.)
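
As an illustrative sketch (the symbols below are not the chapter’s own notation), the two candidate representations for a judge comparing stimuli with subjective values s_i and s_j are
\[
\text{``ratio'': } R_{ij} = J_R\!\left(\frac{s_i}{s_j}\right),
\qquad
\text{``difference'': } D_{ij} = J_D\!\left(s_i - s_j\right),
\]
where J_R and J_D are monotonic response functions. The one-operation hypothesis implies that “ratio” and “difference” judgments of the same stimulus pairs should be monotonically related to each other, since both would be monotonic functions of a single underlying comparison.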

Many experiments, reviewed in Section B, are consistent with the hypothesis that subjects use the same comparison process to judge both “differences” and “ratios.” If subjects use only one operation, is there any way to decide empirically how to represent that operation? Section C discusses more general theories of stimulus comparison that make predictions for tasks in which judges are asked to compare two stimulus relations, for example, to judge the “ratio of two differences” or the “difference between two ratios.” In this wider realm, it is possible to test among theories that would otherwise be impossible to discriminate. Evidence from three studies suggests that the “basic” operation for comparing two stimuli is subtraction.
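
In the same illustrative notation, the compound tasks can be written as
\[
\text{``ratio of differences'': } \frac{s_i - s_j}{s_k - s_l},
\qquad
\text{``difference of ratios'': } \frac{s_i}{s_j} - \frac{s_k}{s_l}.
\]
If the basic comparison operation is subtraction, judgments in these four-stimulus tasks should be consistent with the scale values estimated from simple “difference” judgments, whereas a genuine ratio operation leads to different ordinal predictions; it is this divergence that allows the wider design to discriminate among theories that the simple “ratio” and “difference” tasks cannot separate.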

Contextual effects in scaling are discussed in Sections A, D, and E. In Section A, contextual effects due to stimulus spacing in category ratings and magnitude estimations are shown to be comparable in form for the two procedures. However, Section D shows that in stimulus-comparison experiments, it may be possible to derive scales that are largely independent of stimulus distribution. In certain situations it is possible to localize the effects of stimulus distribution in the final stage of processing (i.e., in the response function). Section D also presents evidence that in cross-modality comparisons a stimulus is first evaluated relative to the other stimuli within its own modality, and that these contextually determined values are then compared across modalities. Hence, scale values derived from cross-modality comparison depend on the stimulus contexts.
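
The localization of contextual effects can be sketched as follows (again, illustrative notation only): if the overt judgment is a composition of a context-free comparison and a context-dependent output transformation,
\[
R_{ij} = J_{\text{context}}\bigl(s_i - s_j\bigr),
\]
then the stimulus distribution affects only the judgment function J_context while the scale values s remain invariant, which is what is meant by localizing contextual effects in the final (response) stage of processing.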

Section E discusses philosophical implications of contextual effects for methodology. Some have argued that there is a “right” way to do psychophysical experiments and have advocated experimental designs that would preclude evaluation of the theories upon which the methodology is based. An alternative point of view is presented in which contextual effects are regarded as basic to studies of scaling, and they are therefore accepted and even welcomed.

Section F takes up controversies in measurement and model testing. The parallelism test of functional measurement is shown to be incapable of simultaneously establishing the validity of both the response scale and the model. Two areas of research, impression formation and the size-weight illusion, are reviewed to challenge previous conclusions of functional measurement and to show how methodological loopholes in its simplistic application led to inappropriate conclusions. Improved techniques for model testing are discussed.
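
The limitation of the parallelism test can be sketched as follows (illustrative notation): suppose overt responses arise from a combination model m composed with a response function J,
\[
R_{ij} = J\bigl(m(s_i, t_j)\bigr).
\]
Parallel curves are consistent with an additive model and a linear J, but they are equally consistent with, for example, a multiplicative model m(s_i, t_j) = u_i v_j and a logarithmic J, since \(\log(u_i v_j) = \log u_i + \log v_j\) is additive. Parallelism therefore constrains only the composite J(m), not the model and the response scale separately.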

Section G evaluates related theories of psychophysics that attempt to encompass a wide array of data. It is shown that theories requiring different scales of sensation for different tasks are not yet needed to account for the data, and that simpler theories assuming a single scale of sensation remain consistent with a variety of data.