ABSTRACT

Document clustering is useful for organizing a large number of documents into a small number of meaningful clusters, for extracting salient features of related documents, and for searching for similar documents. The most critical problems in document clustering are the high dimensionality of natural language text and the choice of features used to represent a domain. In this paper, we present a document clustering methodology that addresses these problems. The proposed methodology is based on Association Rule Mining. We also present empirical tests that evaluate the performance of the proposed methodology on datasets consisting of paper abstracts from biology, economics, computer science, and civil engineering.