ABSTRACT

An automated grammatical error detection system called ALEK (Assessment of Lexical Knowledge) is being developed as part of a suite of tools that provide diagnostic feedback to students. ALEK’s goal is to identify grammatical errors in students’ essays so that the students can correct them. Its approach is corpus-based and statistical. ALEK learns the distributional properties of English from a very large corpus of edited text, and then searches student essays for sequences of words that occur much less often than expected given the frequencies observed in its training corpus. ALEK is designed to be sensitive to two classes of errors. The first class consists of violations of general rules of English syntax. Examples include determiner-noun agreement violations (“this conclusions”) and verb formation errors (“people would said”). In this chapter, we address how ALEK recognizes violations of this type. The second class consists of word-specific usage errors, which depend on, for example, whether a noun is a mass noun (“pollutions”) or which preposition a word selects (“knowledge at math” as opposed to “knowledge of math”). ALEK’s detection of this class of errors is discussed in Chodorow and Leacock (2000) and in Leacock and Chodorow (2001).
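The idea of flagging word sequences that occur much less often than expected can be illustrated with a minimal sketch. This is not ALEK's implementation: the abstract does not say which statistic or sequence length the system uses, so the sketch assumes adjacent-word bigrams scored by a smoothed pointwise mutual information. The names `train` and `flag_unexpected`, the `threshold` and `smoothing` values, and the four-sentence toy corpus are all hypothetical.

```python
import math
from collections import Counter

def train(sentences):
    """Count words and adjacent-word pairs in tokenized, edited text."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def flag_unexpected(tokens, unigrams, bigrams, threshold=0.0, smoothing=0.01):
    """Return bigrams seen much less often than their word frequencies predict.

    Assumption: the score is log2(observed / expected), where "expected"
    treats the two words as independent. The threshold and smoothing
    constants are illustrative, not taken from the source.
    """
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    flagged = []
    for w1, w2 in zip(tokens, tokens[1:]):
        if unigrams[w1] == 0 or unigrams[w2] == 0:
            continue  # unseen word: no basis for an expectation
        p_observed = (bigrams[(w1, w2)] + smoothing) / (n_bi + smoothing)
        p_expected = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        if math.log2(p_observed / p_expected) < threshold:
            flagged.append((w1, w2))
    return flagged

# Toy stand-in for a very large corpus of edited text.
corpus = [
    "this conclusion is clear".split(),
    "these conclusions are clear".split(),
    "this result is clear".split(),
    "these results are clear".split(),
]
unigrams, bigrams = train(corpus)

essay = "this conclusions are clear".split()
print(flag_unexpected(essay, unigrams, bigrams))  # [('this', 'conclusions')]
```

In this toy setting both "this" and "conclusions" are known words, yet the pair never occurs in the edited text, so its score falls below the threshold and the pair is flagged. A realistic system would need far more training data and a principled choice of statistic and threshold.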