ABSTRACT

It is well known that most measurement devices are imperfect. Physical scientists have long recognized this fact and have learned to repeat their measurements many times to obtain results in which they can be confident. Repeated measurements yield an average, which is expected to provide a more precise estimate of the quantity being measured than any single measurement. Unfortunately, in education, commonly obtained measurements cannot be repeated as straightforwardly as in the physical sciences. Because the results of educational measurements can have a profound influence on an individual’s life, the derivation and accuracy of the scores have been the subject of extensive research in the psychometric literature. There are currently two major psychometric theories for the study of measurement procedures: random sampling theory and latent trait theory, which is also known as Item Response Theory, or IRT (Suen 1990). Within random sampling theory there are two approaches, the Classical Test Theory approach and the Generalizability Theory approach (Cronbach et al. 1963; Cronbach et al. 1972; Gleser et al. 1965), whereas within IRT there are more than two dozen approaches (Bond and Fox 2001, Chapter 9; Embretson and Reise 2000).