ABSTRACT

Topic models rest on the idea that documents naturally contain a mixture of topics, where a topic is a probability distribution over the terms in a corpus of documents [1]. They are useful when massive amounts of textual data are stored in large collections and short descriptions of those documents can be created and used in a variety of ways: to identify similarities across the corpus, to visualize where documents fit relative to one another, and to support searching and indexing [2]. The main goal is to find short descriptions of the topics in a corpus while preserving the statistical relationships within it [3]. These descriptions can then support common tasks such as classification, similarity detection, and summarization.
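The sketch below assumes the standard mixed-membership formulation and illustrates the idea that a document mixes topics, each of which is a distribution over terms. A document's term distribution is the weighted sum p(term | doc) = Σ_k p(topic k | doc) · p(term | topic k). The vocabulary, topic names, and weights are hypothetical toy values. The code shows only how the mixture combines the distributions. It does not show how a model such as LDA learns topics from a corpus.

```python
# Hypothetical toy example: 2 topics over a 4-term vocabulary.
vocab = ["gene", "dna", "ball", "goal"]

# Each topic is a probability distribution over the terms (each list sums to 1).
topics = {
    "biology": [0.5, 0.5, 0.0, 0.0],
    "sport":   [0.0, 0.0, 0.5, 0.5],
}


def term_distribution(topic_mix, topics):
    """Combine topic-term distributions using a document's topic proportions.

    p(term | doc) = sum over k of p(topic k | doc) * p(term | topic k)
    """
    n_terms = len(next(iter(topics.values())))
    dist = [0.0] * n_terms
    for name, weight in topic_mix.items():
        for i, p in enumerate(topics[name]):
            dist[i] += weight * p
    return dist


# A hypothetical document that is mostly about biology, partly about sport.
doc_mix = {"biology": 0.75, "sport": 0.25}
dist = term_distribution(doc_mix, topics)
print(dict(zip(vocab, dist)))
# → {'gene': 0.375, 'dna': 0.375, 'ball': 0.125, 'goal': 0.125}
```

Because the topic proportions and each topic's term probabilities both sum to 1, the resulting term distribution also sums to 1. This is why a handful of topic weights can serve as a short description of a much longer document.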