ABSTRACT

Over the past decade there has been increasing concern about the fairness of test items to various identifiable subgroups in the test-taking population. One manifestation of this concern is the increase in research on differential item functioning (DIF). Although there had been a substantial number of procedures developed to uncover DIF (see Shepard, Camilli, & Williams, 1985, for a survey of some of these), most were very much engineering approaches that were seat-of-the-pants approximations of as-yet-undeveloped statistically rigorous procedures (i.e., Angoff & Ford’s 1973 delta method; Camilli’s chi-square method [given in Shepard, Camilli, & Averill, 1981]). Because of this lack of rigor some were just plain wrong (Scheuneman, 1979). Recently, we have seen the emergence of two kinds of statistically rigorous procedures for identifying DIF. A sampling from each of these is: