ABSTRACT

Dictionaries contain valuable information that can be used in sublanguage analysis. Some words are associated uniquely with a particular subject domain; their occurrence in a text identifies it as belonging to that sublanguage. Other words have several senses, each of which may be specialized for a different domain. In this case, the sense appropriate to the given context must be selected. To determine the subject domains for a set of texts, we have developed a procedure that handles both of these cases. It takes advantage of the semantic codes contained on the computer tape (but not in the printed version) of the Longman Dictionary of Contemporary English. For a given text, each word is checked against the dictionary to determine the semantic codes associated with it. By accumulating the frequencies of these codes and then ranking the categories by frequency, the subject matter of the text can be identified. Our initial work has been done using text from the New York Times News Service wire. We are currently investigating strategies for extending this procedure to produce a more elaborate profile of the text. This work is being carried out within the framework of a more general interest in natural-language and knowledge-resource systems.
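
The counting step described above can be illustrated with a minimal Python sketch. It is not the procedure as implemented: the lexicon, its code labels (ECON, GEOG, SPORT, HOUSE), and the function name subject_profile are hypothetical placeholders and do not reproduce the actual codes or format of the Longman tape. The sketch assumes that every sense of a word contributes its codes equally, so an ambiguous word is resolved only in aggregate, by the codes that the rest of the text reinforces.

```python
from collections import Counter
import re

# Hypothetical toy lexicon: word -> one list of subject codes per sense.
# The code labels are illustrative placeholders, not actual dictionary codes.
TOY_LEXICON = {
    "bank": [["ECON"], ["GEOG"]],       # two senses, two different domains
    "interest": [["ECON"], []],         # one coded sense, one general sense
    "loan": [["ECON"]],
    "river": [["GEOG"]],
    "pitcher": [["SPORT"], ["HOUSE"]],
}


def subject_profile(text, lexicon):
    """Return (code, frequency) pairs for a text, most frequent first."""
    counts = Counter()
    # Crude tokenization: lowercase alphabetic runs only (an assumption).
    for word in re.findall(r"[a-z]+", text.lower()):
        # Words missing from the lexicon contribute nothing.
        for sense_codes in lexicon.get(word, []):
            counts.update(sense_codes)
    return counts.most_common()


profile = subject_profile(
    "The bank raised interest on the loan near the river bank.",
    TOY_LEXICON,
)
```

In this sketch the highest-ranked code in profile is taken as the subject of the text; a fuller treatment would need morphological normalization and some weighting of senses, neither of which is attempted here.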