ABSTRACT

A substantial body of research in both speaking and writing assessment has observed diversity in the features of performance that raters attend to. This study examined how rater training affected the scoring performance and scoring criteria of nine raters across multiple scoring sessions. The participants were divided into three groups: more-proficient raters, who demonstrated relatively good scoring performance during the study; less-proficient raters, whose performance was relatively poor; and intermediate raters, whose performance fell between the two. More- and less-proficient raters differed little in how frequently they mentioned the various scoring criteria, and a diligent effort by one less-proficient rater to apply the scoring rubric rigorously did not lead to markedly better scoring performance in terms of severity or agreement with reference scores. The frequency of unfocused comments decreased after training for seven of the nine raters and was generally lowest at the final scoring session. Overall, while the rater training used in this study positively influenced scoring performance, its impact on raters’ internalized scoring criteria was less clear.