ABSTRACT

Language is the prototypical rule-governed domain of human intelligence, with linguistic representations involving tree-structured forms with rich featural specifications and elaborate relations among non-neighboring nodes. Traditional artificial intelligence approaches to natural language processing (NLP) have therefore emphasized the complex symbolic manipulations involved in understanding human language. In addition to creating such complex representations, these systems face the challenge of resolving linguistic ambiguities effectively in order to settle on a single interpretation of the input. Ambiguity is pervasive at all levels of linguistic representation, and yet people usually have no difficulty in determining the intended interpretation of a sentence. Much recent work in psycholinguistics suggests that numeric information, such as lexical frequencies and co-occurrence probabilities, plays a central role in linguistic ambiguity resolution (e.g., MacDonald, Pearlmutter, and Seidenberg 1994; Spivey-Knowlton, Trueswell, and Tanenhaus 1993; Juliano and Tanenhaus 1994). At the same time, computational linguists have begun to develop automatic methods for resolving ambiguity based on statistical models of word co-occurrences derived from large text corpora (e.g., Hindle and Rooth 1993; Schutze 1993; Weischedel et al. 1993). However, purely statistical models cannot capture sophisticated grammatical knowledge, nor do they appear sufficient to model human behavior. Effective modeling of both linguistic knowledge and performance requires a computational framework that successfully integrates higher-level symbolic processing abilities with the numeric information that crucially focuses the understanding process onto a coherent interpretation.
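To make the co-occurrence-based approach concrete, the following is a minimal illustrative sketch, in the spirit of Hindle and Rooth's (1993) lexical association method for prepositional-phrase attachment. All counts, words, and function names here are invented for illustration and are not taken from any real corpus or from the work described above; the scoring is a simplified log ratio of conditional probabilities.

```python
from math import log2

# Toy co-occurrence counts (hypothetical, for illustration only):
# how often a preposition co-occurs with a given verb or noun in a corpus.
verb_prep_count = {("eat", "with"): 40}
verb_count = {"eat": 200}
noun_prep_count = {("salad", "with"): 5}
noun_count = {"salad": 100}

def attachment_score(verb, noun, prep):
    """Log ratio of P(prep | verb) to P(prep | noun).
    Positive score -> prefer attaching the PP to the verb;
    negative -> prefer the noun. Unseen pairs get a small
    pseudo-count (0.5) to avoid zero probabilities."""
    p_verb = verb_prep_count.get((verb, prep), 0.5) / verb_count[verb]
    p_noun = noun_prep_count.get((noun, prep), 0.5) / noun_count[noun]
    return log2(p_verb / p_noun)

# Disambiguate "eat salad with ..." : does the PP modify the verb or the noun?
score = attachment_score("eat", "salad", "with")
attachment = "verb" if score > 0 else "noun"
```

With these toy counts, P("with" | "eat") = 0.2 and P("with" | "salad") = 0.05, so the score is positive and verb attachment is preferred, showing how simple corpus statistics can focus processing on one interpretation.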