Chapter 11

Monitoring Raters in Performance Assessments

George Engelhard, Jr.

As the number and variety of performance assessments increase in educational settings, it is essential to monitor and evaluate the quality of the ratings obtained. Any assessment system that goes beyond selected-response (multiple-choice) items and incorporates constructed-response items requiring scoring by raters must have procedures for monitoring rating quality. The general structure of rater-mediated (RM) assessments involves raters using a rating scale to judge the quality of an examinee's response to a task designed to represent the construct being measured. The key feature of RM assessments is that the examinee's responses (e.g., essays and portfolios) become the stimuli that raters must interpret and evaluate in order to produce ratings. Although it may seem a trivial point, it is very important that the measurement models used to evaluate RM assessments are indeed models of rater behavior, performance, and response. RM assessments do not provide direct information regarding examinee achievement, because the examinee's responses must be mediated and interpreted by raters to obtain judgments about examinee achievement. One of the major concerns regarding RM assessments is that raters bring a variety of potential response biases that may unfairly affect their judgments of the quality of examinee responses. Such rater response biases can introduce construct-irrelevant components into RM assessment systems, affecting the validation of particular uses and interpretations of test-score results within the context of state assessment and accountability systems (Linn, chap. 2, this volume).