ABSTRACT

When the idea of scoring an essay with a computer was first introduced (Page, 1968), it was a radical concept, but such an application of technology now seems almost mundane when contrasted with computer systems that can translate text across dozens of languages (https://translate.google.com/), best the top human competitors on the game show Jeopardy (Ferrucci et al., 2010), and perform personal assistant services from verbal commands (https://www.apple.com/iphone/features/siri.html). If computers can perform these kinds of tasks, why shouldn't they be able to score a basic academic essay typical of those used in standardized testing programs? The potential benefits of using computers to score essays include improving the quality of scores by eliminating idiosyncratic behaviors of human raters (e.g., halo effects, fatigue, central tendency), reducing the time required for score reporting, allowing for immediate performance feedback, and reducing the cost and coordination effort of recruiting and managing human raters. An automated essay scoring (AES) system that provides performance feedback could also encourage broader use of essays in learning and assessment, which would facilitate greater alignment between the kinds of tasks that appear in assessments and the kinds of performances most valued in education. Of course, there are also potential drawbacks to automated scoring, including the cost of development and validation and uncertainty regarding the extent to which these systems produce scores in the same way as human raters.