Assessing equivalence of different language versions of a test | 4

ABSTRACT

The key concepts in assessing whether different language versions of a test function equally are differential functioning, measurement equivalence and measurement invariance. For two test versions to be considered equal, all of their elements must function equally in the studied population: there should be no differential functioning, and the results should show full measurement invariance and equivalence. Differential functioning can take multiple forms, which are discussed in this chapter, and researchers have proposed multiple levels of measurement equivalence, ranging from construct inequivalence to full scalar equivalence. The data needed for assessing equivalence can be obtained through one of three general types of design, involving either monolingual or bilingual test-takers. Before the main data collection begins, it is advisable to have experts on the cultures involved assess the equality of the test versions once more, and to conduct a pilot study as a final check that no major mistakes slipped through the adaptation process. Multi-group confirmatory factor analysis can be used to establish the level of measurement invariance between responses to different test versions; this is done by progressively constraining model parameters to be equal across the data from the two test versions. The statistical procedures that can be applied, and the inferences that can be made about test equality, depend on the data available and the method of data collection. When it is necessary to compare scores on different tests, the tests can be equated using one of the methods of test equating. Tests can be linked to different degrees; depending on the strength of the link, the linkage method is called equating, calibration, statistical moderation or prediction.