ABSTRACT
Item response theory (IRT) is a collection of statistical models and methods used throughout the social sciences to construct tests and generate scores for individuals. IRT is widely used in high-stakes tests (e.g., GMAT, GRE, LSAT), state summative assessments, credentialing exams (e.g., NCLEX-RN), large-scale educational surveys (e.g., NAEP, PISA, TIMSS), and other aptitude tests (e.g., ASVAB). Though IRT is most closely associated with educational measurement, the underlying statistical framework is general: the individual trait is unobserved, or latent, and is assumed to underlie the observed item responses. In educational testing, the latent trait is typically interpreted as ability, but the idea of a latent trait applies well beyond education. For example, IRT has been used to study depression in clinical psychology, ideology in political science, and quality of life in healthcare.
An IRT model defines the probability of an item response as a mathematical function of the latent trait and of item parameters, such as item difficulty. Typically, IRT analyses proceed in two stages. In the first stage, the observed item responses from a sample of individuals are used to estimate the item parameters. In the second stage, these item parameter estimates, along with an individual’s item responses, are used to estimate the individual’s latent trait. This latent trait estimate is the individual’s IRT score.
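As an illustration of the second stage, the sketch below scores one individual under a two-parameter logistic (2PL) model. The choice of the 2PL model, the grid-search maximum likelihood estimator, the function names, and the item parameter values are all assumptions made for this example; the text above does not commit to a particular model or estimation method.

```python
import math

def p_correct(theta, a, b):
    """2PL model (assumed here): probability of a correct response given
    latent trait theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at a given theta."""
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll

def score_person(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Stage two: treat the item parameters as known and return the theta
    on an evenly spaced grid that maximizes the log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical stage-one output: (discrimination, difficulty) per item.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

# Score an individual who answered the first two items correctly.
theta_hat = score_person([1, 1, 0, 0], items)
print(f"estimated latent trait: {theta_hat:.2f}")
```

A grid search keeps the example dependency-free. Note that a response pattern with all items correct or all incorrect has no finite maximum likelihood estimate, so this sketch would simply return the edge of the grid in that case.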
The separation of item and person parameters is a distinguishing feature of the IRT framework, and it provides a great degree of flexibility in test construction and other applications. For instance, item parameter estimates from different tests can be transformed to be on the same scale. Then, subsets of items can be selected to create different test forms, and the resulting test scores will still be directly comparable. This is, in general terms, the process underlying computerized adaptive testing, where each test is tailored to the examinee’s ability.
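The two ideas in this paragraph can be sketched in a few lines, again assuming a 2PL model. The linear rescaling with hypothetical constants A and B, and the maximum-information selection rule, are common choices rather than anything prescribed above; the function names and the item pool are invented for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL response probability (model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rescale_item(a, b, A, B):
    """Move item parameters onto a new scale where theta* = A * theta + B.
    Response probabilities are unchanged by this transformation."""
    return a / A, A * b + B

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_hat, pool, administered):
    """Adaptive step: pick the unused item that is most informative
    at the current trait estimate."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical calibrated pool: (discrimination, difficulty) per item.
pool = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]

first = next_item(0.0, pool, set())      # start from an average examinee
second = next_item(1.8, pool, {first})   # estimate rose after a correct answer
print(first, second)
```

In a full adaptive test the trait estimate would be updated after every response and the loop would continue until a stopping rule is met; only the selection step is shown here.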