ABSTRACT

As an illustration of the topic of the present chapter, consider a questionnaire on depression in which respondents are asked to indicate whether they experienced certain emotions or performed certain behaviors during the last week. For example, it asks whether the person felt lonely, experienced a lack of energy, suffered somber thoughts, had trouble falling asleep, or failed to work efficiently. Suppose further that it also asks whether the respondent cried or felt like crying. A depression score is obtained as the sum of all items endorsed. If one were to find that the average sum score is higher for women than for men, the interpretation would be that, in the studied sample, women are more depressed than men. However, generally speaking, crying has a lower threshold for women than for men (e.g., Schaeffer, 1988). This implies that, for a man and a woman with a comparable level of depression, the probability of crying is higher for the woman than for the man. Consequently, if crying items are included in a depression questionnaire, the average sum score for women may turn out to be higher than for men. However, this finding may be due to the fact that crying items function differently for the two genders, and not to the fact that women are more depressed than men. Hence, when comparing test scores between groups, it seems necessary to investigate whether or not the test items function in the same way across the groups. This screening of items is called the study of differential item functioning, or DIF for short.
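The crying argument can be made concrete with a small numerical sketch. It assumes a Rasch (one-parameter logistic) model for item endorsement, which the text above does not specify, and every item name and threshold value below is hypothetical, chosen only for illustration. Two respondents with the same latent depression level `theta` answer the same items. Only the crying item's threshold differs by group, yet the expected sum score differs; dropping that item makes the expected scores equal again.

```python
import math

def p_endorse(theta: float, b: float) -> float:
    """Rasch (1PL) probability of endorsing an item with threshold b (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical thresholds (higher = harder to endorse); illustrative values only.
COMMON_ITEMS = {
    "lonely": 0.0,
    "lack_of_energy": -0.5,
    "somber_thoughts": 0.5,
    "trouble_sleeping": 0.0,
    "work_inefficiently": 0.5,
}

# Hypothetical DIF item: crying is assumed easier to endorse for women than for men.
CRYING_THRESHOLD = {"women": 0.0, "men": 1.5}

def expected_sum_score(theta: float, group: str, include_crying: bool = True) -> float:
    """Expected sum score = sum of endorsement probabilities over the items."""
    total = sum(p_endorse(theta, b) for b in COMMON_ITEMS.values())
    if include_crying:
        total += p_endorse(theta, CRYING_THRESHOLD[group])
    return total

if __name__ == "__main__":
    theta = 0.0  # identical depression level for both respondents
    for group in ("women", "men"):
        with_crying = expected_sum_score(theta, group)
        without_crying = expected_sum_score(theta, group, include_crying=False)
        print(f"{group}: with crying {with_crying:.3f}, without crying {without_crying:.3f}")
```

Under these assumptions, the gap in expected scores comes entirely from the crying item and not from any difference in `theta`. This is the situation a DIF analysis is meant to detect.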