ABSTRACT

The scoring scale used for a mental measure (e.g., an educational or psychological test) dictates the type and extent of the interpretations that are possible. When a test is grounded in classical test theory (CTT), for example, the scale should reflect the notion of an extant true score, and reliability gives important clues to the authenticity of that true score. By far the most popular strategy for scale development in cognitive measures with a sufficiently large audience is item response theory (IRT). IRT is both a theory of measurement and a concomitant set of powerful statistical methods for implementing it. The statistical model for IRT is a probability function (also called an item characteristic function), which describes the probability that an examinee endorses a given response to an item with particular characteristics. The function is evaluated for many examinees, each with a different, albeit unknown, ability, whose abilities in aggregate form a normal distribution.
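
As a rough illustration of what an item characteristic function looks like in practice, the sketch below evaluates one for a simulated group of examinees. The abstract does not name a specific model, so the two-parameter logistic (2PL) form, the function name icc_2pl, and the parameter values (discrimination a = 1.2, difficulty b = 0.0) are all assumptions made for this example only.

```python
import math
import random


def icc_2pl(theta, a, b):
    """Probability that an examinee with ability `theta` endorses the keyed
    response to an item with discrimination `a` and difficulty `b`.

    Assumption: a two-parameter logistic (2PL) form, chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Simulate examinees whose (normally unknown) abilities follow a standard
# normal distribution. The seed and sample size are arbitrary.
rng = random.Random(0)
abilities = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Evaluate the same hypothetical item for every examinee.
probabilities = [icc_2pl(theta, a=1.2, b=0.0) for theta in abilities]
```

Under this assumed form, an examinee whose ability equals the item difficulty has a probability of exactly 0.5, and the probability rises monotonically with ability.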