ABSTRACT

Researchers have attempted to automate the grading of student essays since the 1960s (Page, 1994). The approach has been to define a large number of objectively measurable features of the essays, such as essay length, average word length, and so forth, and to use multiple linear regression to predict the scores that human graders would give those essays. Even in this early work, results were surprisingly good: the scores assigned by computer correlated at around .50 with the grades assigned manually by English teachers, which was about as well as the English teachers correlated with each other. More recent systems consider more complex features of essays. For example, work at ETS (Educational Testing Service) has attempted to simulate criteria similar to those a human judge would use, emphasizing sophisticated techniques from computational linguistics to extract syntactic, rhetorical, and content features (Burstein et al., 1998). The Intelligent Essay Assessor (IEA) attempts to represent the semantic content of essays using features that group associated words together via singular value decomposition (SVD) (Landauer, 2000).
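The regression approach described above can be sketched as follows. This is a minimal illustration, not any particular published system: the essays, feature values, and grades are invented, and the two surface features (word count, average word length) merely stand in for the "large number of objectively measurable features" the early systems used.

```python
import numpy as np

# Hypothetical training data: two surface features of six essays
# (word count, average word length) and a human rater's grades.
X = np.array([
    [120, 4.1],
    [300, 4.8],
    [250, 4.5],
    [ 80, 3.9],
    [400, 5.0],
    [180, 4.3],
], dtype=float)
grades = np.array([2.0, 4.5, 4.0, 1.5, 5.0, 3.0])

# Multiple linear regression: append an intercept column and solve
# the least-squares problem for the weight vector.
A = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(A, grades, rcond=None)

def predict(word_count, avg_word_len):
    """Predicted grade for a new essay's surface features."""
    return weights[0] * word_count + weights[1] * avg_word_len + weights[2]

# How well the fitted model tracks the human grades (Pearson r),
# analogous to the computer-vs-teacher correlations reported above.
predicted = A @ weights
r = np.corrcoef(predicted, grades)[0, 1]
```

On real data the interesting number is this correlation computed on held-out essays, compared against inter-rater agreement between human graders.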
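The SVD-based grouping that IEA relies on can also be sketched. Again, this is a toy illustration under assumed data, not IEA itself: the term-by-essay counts are invented, and real systems apply weighting schemes and far larger matrices. The point is only that after a truncated SVD, terms that co-occur across essays end up with similar reduced vectors.

```python
import numpy as np

# Tiny hypothetical term-by-essay count matrix: rows are terms,
# columns are essays. Two topical groups are built in.
terms = ["ocean", "tide", "wave", "ballot", "vote"]
counts = np.array([
    [3, 2, 0, 0],   # ocean
    [2, 3, 0, 0],   # tide
    [1, 2, 0, 1],   # wave
    [0, 0, 4, 3],   # ballot
    [0, 0, 3, 2],   # vote
], dtype=float)

# SVD factors counts = U @ diag(s) @ Vt; keeping the top k singular
# dimensions gives each term a k-dimensional vector in which
# associated terms cluster together.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two term vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Terms from the same topical group should be far more similar
# than terms from different groups.
sim_ocean_tide = cosine(term_vecs[0], term_vecs[1])
sim_ocean_ballot = cosine(term_vecs[0], term_vecs[3])
```

An essay can then be represented in the same reduced space (e.g. as a weighted sum of its terms' vectors), so that essays about the same content score as similar even when they share few exact words.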