ABSTRACT

As library collections grow, it becomes increasingly difficult for students to find books. Current classification schemes, such as the Dewey Decimal system, become less efficient for book retrieval as the number of books increases. In this paper, we attempt to provide a content-based classifier for organizing books, with the aim of significantly improving retrieval over the current method. The Support Vector Machine (SVM) was the best-performing model, achieving an accuracy of 79.8%, while latent Dirichlet allocation achieved an accuracy of 28.1%. We also note that the SVM model predicts the category of each news headline in constant time, taking 0.0029 s on average per headline.