ABSTRACT

It has been 26 years since publication of the first Rasch-based test in the United States, the KeyMath Diagnostic Arithmetic Test (Connolly, Nachtman, & Pritchett, 1971). Many tests published since then have used the item calibration and test linking features that follow from the Rasch model. However, only a few tests have capitalized on the powerful interpretation features that become accessible when person abilities and item difficulties have been calibrated on a common Rasch scale. The basis for these interpretation features is summarized by Embretson (1996):

In CTT [Classical Test Theory], score meaning is determined by a norm-referenced standard. … An objection that is often raised to norm-referenced meaning is that scores have no meaning for what the person actually can do.

… in IRT [Item Response Theory] models, the meaning of a score can be referenced directly to the items … The probability that a person passes a particular item is derived from the match of item difficulty to trait level.

If these items are further structured by content, substantive trait level meaning can be derived.

In some tests … IRT trait levels are also linked to norms. … Thus, IRT trait levels also can have norm-referenced meaning. (pp. 345–346)
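
As an illustration of how the match of item difficulty to trait level yields a probability of passing an item, the following minimal sketch assumes the standard dichotomous Rasch model with ability and difficulty both expressed in logits. The function name `rasch_probability` and the sample values are hypothetical and are not taken from any of the tests discussed here.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of passing an item under the dichotomous Rasch model.

    Both arguments are assumed to be on the same logit scale; only their
    difference (ability - difficulty) affects the result.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # Illustrative gaps between person ability and item difficulty (logits).
    for gap in (-2.0, -1.0, 0.0, 1.0, 2.0):
        p = rasch_probability(gap, 0.0)
        print(f"ability - difficulty = {gap:+.1f} logits: P(pass) = {p:.2f}")
```

When ability equals difficulty, the probability is exactly 0.5. It rises toward 1 as ability exceeds difficulty and falls toward 0 as difficulty exceeds ability, which is what allows a score on the common scale to be interpreted directly against items of known difficulty.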