Chapter 12
30 Pages

Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements

With Jordan Boyd-Graber, David Mimno, David Newman

Topic models are a versatile tool for understanding corpora, but they are not perfect. In this chapter, we describe the problems users often encounter when using topic models for the first time. We begin with the preprocessing choices users must make when creating a corpus for topic modeling, followed by the options for running topic models. Once a topic model has been learned from data, we describe how users can tell whether it is a good model. We then summarize the common problems users encounter and how recent advances in both models and tools can address them.