ABSTRACT

With the push to develop innovative assessments that can accommodate the higher-order thinking and performances associated with the Common Core Standards, there is a need to systematically evaluate the benefits and features of automated essay evaluation (AEE). While the developers of AEE engines have published an impressive body of literature suggesting that the measurement technology can produce reliable and valid essay scores (when compared with trained human raters; Attali & Burstein, 2006; Shermis, Burstein, Higgins, & Zechner, 2010), comparisons across the multiple platforms have been informal, have involved less-than-ideal sample essays, and have often been associated with an incomplete criterion set.