ABSTRACT

Standard methodologies in typical testing applications involve calculating a score for each assessed individual with respect to some domain of interest. When the assessment booklets administered to each test taker contain large numbers of items and the test forms are constructed to be psychometrically parallel, standard analysis approaches work reasonably well. Like most educational surveys (e.g., Programme for International Student Assessment [PISA], Trends in International Mathematics and Science Study [TIMSS], and Progress in International Reading Literacy Study [PIRLS]), the National Assessment of Educational Progress (NAEP) has a key design characteristic that requires a nonstandard methodology. NAEP is a U.S. large-scale educational survey assessment conducted in Grades 4, 8, and 12 in subjects such as reading, mathematics, and science. In NAEP, each student answers only a small, systematically selected portion of the cognitive item pool. The assignment is arranged so that all students combined provide an approximately equal number of responses to each item in the pool. Under the most common design, it also ensures that any two items appear together in at least one form. As a result, students cannot be compared directly with one another, and reliable individual proficiency estimates cannot be obtained. However, a latent regression model (Mislevy 1984, 1985) yields consistent estimates of both average proficiency and its dispersion (e.g., variance) in various reporting groups of interest (Mazzeo et al. 2006).

CONTENTS

Introduction
Method
Manipulations, Data, and Evaluation
Results
Discussion and Conclusion
Acknowledgments
References
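The two booklet-design properties described in the abstract can be illustrated with a balanced incomplete block (BIB) design, one standard way to obtain them. The sketch below is a hypothetical illustration, not NAEP's operational design. It assumes that items are grouped into 7 blocks and that each booklet holds 3 blocks. Booklets are built by cyclically shifting the block set {0, 1, 3} modulo 7, and they are handed out to students in rotation. The function names `cyclic_bib`, `spiral_assign`, `block_response_counts`, and `pair_coverage` are illustrative only. The code checks that every block (and therefore every item in it) receives the same number of responses, and that every pair of blocks shares a booklet, so any two items appear together in at least one form.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def cyclic_bib(n_blocks: int, base: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Build booklets by cyclically shifting a base set of block indices.

    Hypothetical example: with n_blocks = 7 and base = (0, 1, 3), a perfect
    difference set mod 7, every pair of blocks lands in exactly one booklet.
    """
    return [
        tuple(sorted((b + shift) % n_blocks for b in base))
        for shift in range(n_blocks)
    ]


def spiral_assign(n_students: int, booklets: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Hand booklets out in rotation: student i receives booklet i mod len(booklets)."""
    return [booklets[i % len(booklets)] for i in range(n_students)]


def block_response_counts(assignments: list[tuple[int, ...]]) -> Counter:
    """Count how many students answered each block across all assigned booklets."""
    counts: Counter = Counter()
    for booklet in assignments:
        counts.update(booklet)
    return counts


def pair_coverage(booklets: list[tuple[int, ...]]) -> Counter:
    """Count how many booklets contain each (lower, higher) pair of blocks."""
    pairs: Counter = Counter()
    for booklet in booklets:
        pairs.update(combinations(booklet, 2))
    return pairs


if __name__ == "__main__":
    booklets = cyclic_bib(7, (0, 1, 3))
    counts = block_response_counts(spiral_assign(700, booklets))
    pairs = pair_coverage(booklets)
    print(booklets)
    print(dict(sorted(counts.items())))
    print(len(pairs), set(pairs.values()))
```

With 700 hypothetical students, each of the 7 booklets is administered 100 times. Each block appears in 3 booklets and therefore receives 300 responses. All 21 possible block pairs occur in exactly one booklet. When the number of students is not a multiple of the number of booklets, the rotation leaves block counts only approximately equal, which matches the abstract's wording.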