ABSTRACT

An observed score obtained with a measurement instrument is only one of many scores that could have been obtained, because a different context, other measurement conditions, or changed circumstances may lead to different observed scores. For example, an observed score is usually obtained with one particular test form, yet another, equivalent form might have suited our measurement purpose just as well and still have produced a different score. Consequently, anyone who wants to model observed scores has to take many sources of variation (including error variation) into account. The same holds when one considers the reliability of scores obtained from a measurement instrument. Classical test theory decomposes the observed score into a true score and a single error term, and in theory this error is undifferentiated. Specific reliability estimation procedures, however, each tie the error to a particular source: parallel-forms reliability treats the lack of equivalence between forms as the source of error, test-retest reliability the occasion of testing, and internal consistency reliability the variability across test items.