ABSTRACT

The purpose of this chapter is to develop a plan to support the use of automated essay scoring (AES) systems. Such a plan involves clarifying the purpose of AES, how it could achieve this purpose, and what evidence exists that it does so successfully. The plan elaborates and expands upon previous proposals concerned with the validity of automated scoring in general (Bennett, 2006; Bennett & Bejar, 1998; Clauser, Kane, & Swanson, 2002; Yang, Buckendahl, Juszkiewicz, & Bhola, 2002) or specifically with automated essay scoring (Keith, 2003). That such a plan is needed is evidenced by the skepticism and criticism that have accompanied AES over the years. For example, in a New York Times column, Scott (1999) remarks cynically that “… it has come to this. The essay, the great literary art form that Montaigne conceived and Virginia Woolf carried on … has sunk to a state where someone thinks it is a bright idea to ask a computer if an essay is any good.” Criticism of AES has been especially harsh within the community of writing professionals (Ericsson & Haswell, 2006). A major organization of writing professionals, the Conference on College Composition and Communication (CCCC), summarizes its position on AES (CCCC, 2004) with the words: “We oppose the use of machine-scored writing in the assessment of writing.” That this statement has not been revised in the last eight years, in spite of the widespread adoption of AES, suggests that AES developers have not been successful in explaining the intrinsic qualities of AES beyond the obvious logistical benefits of speed and cost.