ABSTRACT

The validity of the human scores on which the automated score is based must also be established. Wolfe shows that much more is going on in human scoring than meets the eye. He also shows how to investigate the many issues of evidentiary reasoning involved, how to articulate the alternative explanations that arise, and how to identify sources of backing that can be marshaled. ‘Scoring’ is a term inherited from the familiar processes of comparing multiple-choice responses with keys and eliciting evaluative ratings from human raters. The warrants posit that the scores characterize targeted qualities of a performance and that raters assign those scores on the basis of those qualities and with sufficient accuracy. The degree of correspondence between the elements of assessment design and interpretation and the corresponding elements of score-use situations can render scores more or less informative for various subsequent uses, quite beyond the construct labels assigned to the scores of performances and to the assessment as a whole.