ABSTRACT

Until recently, research in computational linguistics has mostly focused on syntactic parsing. As a result of this effort, the syntactic capability of natural language processing (NLP) systems has reached a level of relative maturity and stability, enabling researchers to turn to other linguistic areas, such as semantics and the lexicon, which have so far been neglected. Some systems dedicated to syntactic parsing operate with small lexicons, usually manually coded and restricted to a few hundred entries. Others are restricted to narrow semantic domains, where vocabulary is limited and lexical items are mostly unambiguous. The few systems that are based on large lexicons restrict the content of their lexicon to syntactic information with minimal semantic information. It has recently become clear, however, that if machines are to “understand” natural language, they must have recourse to extensive lexical databases, in which a wealth of information about the meaning of words and their semantic relations is stored. Creating such databases manually is not feasible: the task is too time-consuming and labor-intensive. Instead, current research in lexical semantics concentrates on the extraction of semantic information from sources available on-line, such as dictionaries and thesauri in machine-readable form (see Walker, Zampolli, & Calzolari, 1987). This information is then used to create lexical databases for NLP systems.