ABSTRACT

For a long time, educational testing was dominated by the ideal of standardized testing developed in psychology in the early 1900s, with tests that typically consisted of a single form used for the same population of subjects for an extended period of time. The same model worked well in the early days of educational testing in the U.S., when large-scale, group-based testing was introduced primarily for college admission. The only difference was the much shorter test-development cycle of 1 year, rather than the average of some 20 years for the Stanford-Binet Intelligence Scale. However, the model proved unmanageable when learning and testing became more integrated, especially during the movement toward more individualized instruction in the 1960s. Monitoring the achievements of individual students requires frequent testing, but because of memory effects it was impossible to use the same test form more than once for the same students. More fundamentally, because learning implies change in ability, the assumption of a single stable population of students became meaningless. Moreover, at roughly the same time, psychometricians became more acutely aware that test forms that are best for some population may be far from optimal for the majority of its individual students.