ABSTRACT

One way of structuring a corpus is to sort the documents into categories based on their topical content. Current text systems accomplish this task with varying degrees of sophistication. For example, RUBRIC [McCune et al., 1985] allows the user to define an elaborate conceptual hierarchy, bottoming out on keywords, that classifies documents according to what topics they contain. (Hayes, this volume, describes related systems.) Assuming, then, that the main topic of a document can be determined, how can the document be further distinguished from others describing the same topic?