ABSTRACT

To help prevent this issue, we propose a new approach to document clustering that combines a fuzzy decision tree with document similarity. The key idea is to identify both the similarity and the dissimilarity between documents to facilitate classification. The approach consists of three stages. In the first stage, we collect a set of documents. In the second stage, each document is cleaned by decomposing it into words and eliminating unnecessary words, and the result is represented formally as a vector; the cosine similarity measure is then used to compute the similarity between document vectors. In the final stage, we cluster the documents using a proposed fuzzy clustering algorithm.
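The second stage and the idea of soft (fuzzy) assignment can be illustrated with a minimal sketch. It assumes plain term-frequency vectors (the weighting scheme is not specified above), uses a hypothetical stop-word list, and computes pairwise cosine similarity. The function `fuzzy_memberships` is a stand-in: it applies the standard fuzzy c-means membership update with cosine distance d = 1 - s against a given set of prototype documents. It is not the proposed fuzzy clustering algorithm, and all function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the approach does not specify one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text: str) -> list[str]:
    """Decompose a document into lowercase words and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def to_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Represent a document as a term-frequency vector over a shared vocabulary."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """cos(u, v) = (u . v) / (||u|| * ||v||); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(documents: list[str]) -> list[list[float]]:
    """Clean, vectorize, and compare every pair of documents."""
    token_lists = [preprocess(d) for d in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = [to_vector(tokens, vocabulary) for tokens in token_lists]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


def fuzzy_memberships(sims_to_prototypes: list[float], m: float = 2.0) -> list[float]:
    """Standard fuzzy c-means membership update with cosine distance d = 1 - s.

    Stand-in for illustration only; not the proposed clustering algorithm.
    Memberships are non-negative and sum to 1 across prototypes.
    """
    distances = [max(0.0, 1.0 - s) for s in sims_to_prototypes]
    zero = [i for i, d in enumerate(distances) if d == 0.0]
    if zero:
        # Document coincides with one or more prototypes: share membership among them.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances) for d_i in distances]


if __name__ == "__main__":
    docs = [
        "Fuzzy clustering of text documents",
        "Clustering text documents with fuzzy sets",
        "Decision trees for image classification",
    ]
    sims = similarity_matrix(docs)
    for row in sims:
        print(["%.2f" % s for s in row])
    # Soft assignment of document 1 to two prototype documents (0 and 2).
    print(fuzzy_memberships([sims[1][0], sims[1][2]]))
```

Because fuzzy memberships are graded rather than all-or-nothing, a document that is similar to one prototype and dissimilar to another receives a high membership for the first and a low, but non-zero, membership for the second.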