ABSTRACT

The Mantel–Haenszel differential item functioning (MH D-DIF) statistics play an important role in the assessment of the appropriateness of test forms intended for administration to examinees from populations that contain identifiable disjoint subpopulations, such as ethnic groups, men and women, or sociodemographically or geographically defined subpopulations. A typical differential item functioning (DIF) analysis deals with a test form administered to a set of examinees from two disjoint population groups, referred to as the focal and reference groups. The examinees are rarely obtained as a sample drawn from a population using a sampling design, but for the purposes of DIF analysis it is assumed that they provide an adequate representation of the population. Suppose that for this population there is a unidimensional ability scale underlying performance on the test form. Then, in the cross-classification of the two population subgroups by the scores on this scale, we have pairs of matched subgroups, that is, members of the two population subgroups with identical abilities. An item in the test form is said to have no DIF if the proportions of correct responses to the item are identical for each pair of matched subgroups. Of principal interest is detection of the items with substantial DIF for at least one pair of population subgroups.
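
As an illustration of the no-DIF definition above, the following minimal sketch computes the MH D-DIF statistic for a single item in its commonly used form, -2.35 times the natural logarithm of the Mantel–Haenszel common odds ratio pooled over matched score levels. The function name `mh_d_dif`, the tuple layout, and all counts are hypothetical and chosen only for illustration. The sketch also assumes that matching is done on an observed score level, since the underlying ability scale itself is not observed.

```python
import math


def mh_d_dif(strata):
    """Return MH D-DIF for one item (hypothetical helper, for illustration).

    `strata` holds one tuple per matched score level:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    num = 0.0  # sum over levels of ref_correct * focal_incorrect / n
    den = 0.0  # sum over levels of ref_incorrect * focal_correct / n
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        num += a * d / n
        den += b * c / n
    if num <= 0 or den <= 0:
        raise ValueError("common odds ratio is undefined for these counts")
    alpha_mh = num / den  # Mantel-Haenszel common odds ratio
    return -2.35 * math.log(alpha_mh)


# Illustrative counts only: equal proportions correct in both groups
# at every matched level (0.3 and 0.3, then 0.6 and 0.6), so no DIF.
no_dif = [(30, 70, 15, 35), (60, 40, 30, 20)]

# Illustrative counts only: the focal group has a lower proportion
# correct than the reference group at every matched level.
with_dif = [(60, 40, 20, 30), (80, 20, 30, 20)]

print(mh_d_dif(no_dif))
print(mh_d_dif(with_dif))
```

When the proportions correct agree within every matched level, the common odds ratio equals 1 and the statistic is zero. Under this sign convention, negative values indicate an item that is harder for the focal group than for matched members of the reference group.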