ABSTRACT

In the last 10 years, the family of item response theory (IRT) models and the range of their applications have expanded tremendously (see, for instance, de Boeck & Wilson, 2004; Skrondal & Rabe-Hesketh, 2004; van der Linden & Hambleton, 1997). However useful these models are, their applications are valid only if the fit between the model and the data is reasonable. The qualification "reasonable" is admittedly vague, for two reasons. First, in most instances no model will fit the data perfectly: a statistical model describes a very simple stochastic mechanism that will almost surely be a gross simplification of reality. Second, the power of statistical tests of model fit (that is, the probability of rejection when the model is violated) grows very fast as a function of the sample size. With large samples, a model will be rejected even if the violation is very small and has no practical consequences for the intended application. Evaluation of model fit therefore always involves an element of subjective judgment. Nevertheless, the basis on which these judgments are made must be adequately justified.
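
The second point, that power grows quickly with sample size, can be illustrated with a small numerical sketch. The example below is not an IRT fit statistic; as a stand-in it uses a two-sided one-proportion z-test under a normal approximation. The function name `power_of_fit_test`, the model-implied proportion of 0.50, the true proportion of 0.51, and the sample sizes are all hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_of_fit_test(n, p_model=0.50, p_true=0.51, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: proportion == p_model.

    Illustrative assumption: the model predicts a proportion-correct of
    p_model, while the data are generated with the slightly different p_true.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_model = sqrt(p_model * (1 - p_model) / n)  # standard error under H0
    se_true = sqrt(p_true * (1 - p_true) / n)     # standard error under the truth
    # Approximate distribution of the z statistic when p_true holds.
    stat = NormalDist(mu=(p_true - p_model) / se_model, sigma=se_true / se_model)
    # Probability that the statistic falls in either rejection region.
    return stat.cdf(-z_crit) + (1 - stat.cdf(z_crit))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(f"n = {n:>7}: power = {power_of_fit_test(n):.3f}")
```

Under these assumptions, the same small discrepancy between model and data goes from almost never rejected at n = 100 to almost always rejected at n = 100,000, although its practical importance has not changed.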