ABSTRACT

Researchers who build norm-referenced tests (NRTs) approach the task differently from those who build criterion-referenced tests (CRTs). NRTs are intentionally built to be of medium difficulty. Specifically, items that are answered correctly by about 50% of the examinees in tryouts during test development are favored when items are selected for the final versions of the tests. It is essential that an NRT be of medium difficulty because this facilitates the comparison of an individual with a group. In contrast, when building CRTs, item difficulty is typically of little concern. Instead, expert judgment is used to determine the desired level of performance and how to test for it. The need to have items of medium difficulty in NRTs is the basis of a major criticism of this type of test. Specifically, the criticism is that the building of NRTs is driven primarily by statistical considerations (i.e., item difficulty) and not by content considerations.