ABSTRACT

Behavioral scaling is proposed as a general term to cover various procedures for making test results directly interpretable in terms of what examinees know or can do. It is more inclusive than what has come to be known as criterion-referencing, which applies when tests are deliberately designed to provide behavioral information with reference to specific objectives of school learning. Test theory has an important role in behavioral scaling, but behavioral scaling requires use of a person characteristic function (PCF) rather than the item characteristic function. Problems that have arisen in efforts to scale tests behaviorally are discussed. Because behavioral scaling is particularly relevant to cognitive ability tests, illustrations are given of behavioral scaling as applied to three subtests of the Woodcock–Johnson Psycho-Educational Battery—Revised, measuring Gc, Gf, and Gv, respectively. Data came from the norming versions of the tests given to 1,800 subjects ranging in age from 4 to 85. The behavioral scaling of these tests permits meaningful interpretation of differences over age groups.