ABSTRACT

In this chapter we cover key considerations for planning and executing a large-scale constructed-response scoring event that uses human raters. Drawing on extensive experience conducting human scoring, we discuss the essential requirements of a ratings collection system, important decision points in designing the scoring process, and advice on training and monitoring raters. The information is intended to help practitioners deploy human constructed-response scoring that yields reliable and valid ratings, which in turn support the use of automated scoring.