ABSTRACT

This chapter summarizes the discussion of the foundational areas from the Standards for Educational and Psychological Testing by using a case study to illustrate procedures for evaluating the quality of rater-mediated assessments. It presents an illustrative analysis, based on Rasch Measurement Theory, of data from a middle grades writing assessment and discusses the results in terms of validity, reliability, and fairness. The illustrative data were collected during an administration of the Georgia High School Writing Assessment. According to the Georgia Department of Education, this assessment was designed to provide diagnostic information about students' strengths and weaknesses in persuasive and expository writing. As an additional method for gauging the direction and magnitude of differences in rater severity for female and male students, the chapter examines "raw" differences between rater calibrations on the logit scale for the two gender subgroups using a visual display called a DIF map.