ABSTRACT

The intelligence testing movement of the first four decades of the 20th century and its attendant controversies led to the development of classical test theory (CTT). Many of the familiar constructs, such as true score, reliability, and validity, arose from Spearman's work in providing a mathematical underpinning for his theory of intelligence. Since the dominant statistical theory of the time was Pearsonian statistics, CTT rests heavily on correlational concepts. Gulliksen's (1950) book contained a comprehensive presentation of the theory within a Pearsonian framework. Subsequently, Lord and Novick (1968) reformulated the basic constructs of the theory using a modern mathematical statistical approach. The basic element in this theory was the test score. Items and their characteristics played a minor role in the structure of the theory. Over the years, both the psychometric theoretician and the practitioner became dissatisfied with the discontinuity between the roles of items and test scores in the theory. It seemed intuitively reasonable that a test theory should start with the characteristics of the items composing a test rather than with the resultant score. The origins of such an item-based test theory can be seen in the work of Binet and Simon (1916). They used a tabular presentation of the functional relation between the proportion of correct responses to an item and chronological age to place items within their intelligence test. Terman (1916) and Terman and Merrill (1937) used this same kind of information to plot curves relating the two variables. In modern parlance, they were using item characteristic curves. For many years, the item characteristic curve approach was considered simply as an alternative item analysis technique. The work of Lawley (1943) marks the beginning of a test theory based upon the items of a test.
In a remarkable paper, Lawley showed how to obtain maximum likelihood estimates of the parameters of the item characteristic curve, defined the true score in terms of the items of a test, and showed that the classical reliability coefficient can also be expressed as a function of these item parameters. Thus, what had been understood intuitively in the past had come to pass. A major extension of Lawley's work was due to Lord (1952), who showed that a wide range of additional classical test theory constructs could be expressed as functions of the parameters of the item characteristic curves of the test items. The work of these two men established the basic concepts of the psychometric theory based on items now known as item response theory (IRT).