ABSTRACT

To improve the scalability of topic models for Big Data analysis, much effort has been put into large-scale topic modeling. Parallel LDA (PLDA) distributes Gibbs sampling for LDA across multiple machines (Wang et al. 2009). Mr.LDA, another flexible large-scale topic modeling package, is implemented in MapReduce and estimates model parameters by variational inference (Zhai et al. 2012). A novel architecture for parallel topic models has also been shown to yield better performance (Smola and Narayanamurthy 2010).
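To illustrate what "distributing Gibbs sampling for LDA on multiple machines" can mean, here is a minimal single-process sketch of an approximate distributed scheme in the spirit of AD-LDA. It is not PLDA's actual implementation. It assumes the following: documents are partitioned into shards, and each simulated worker runs a collapsed Gibbs sweep over its own shard using a stale copy of the global word-topic counts. After the sweep, every worker's count changes are summed back into the global counts. All function names (`init_state`, `local_sweep`, `distributed_iteration`) are hypothetical.

```python
import random

def init_state(docs, K, V, seed=0):
    """Randomly assign topics and build document-topic and topic-word counts."""
    rng = random.Random(seed)
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    ndk = [[0] * K for _ in docs]          # document-topic counts
    nkw = [[0] * V for _ in range(K)]      # topic-word counts (global)
    nk = [0] * K                           # tokens per topic (global)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, ndk, nkw, nk

def local_sweep(shard, docs, z, ndk, nkw, nk, K, V, alpha, beta, rng):
    """One collapsed Gibbs sweep over a worker's shard, updating its LOCAL count copies."""
    for d in shard:
        for i, w in enumerate(docs[d]):
            k = z[d][i]
            ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1   # remove current token
            weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                       for t in range(K)]
            k = rng.choices(range(K), weights=weights)[0]
            z[d][i] = k
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1   # add token back with new topic

def distributed_iteration(docs, shards, z, ndk, nkw, nk, K, V, alpha, beta, rng):
    """Simulate one parallel iteration: local sweeps on stale copies, then merge deltas."""
    new_nkw = [row[:] for row in nkw]
    new_nk = nk[:]
    for shard in shards:  # on a real cluster these sweeps would run concurrently
        local_nkw = [row[:] for row in nkw]  # each worker starts from the same snapshot
        local_nk = nk[:]
        local_sweep(shard, docs, z, ndk, local_nkw, local_nk, K, V, alpha, beta, rng)
        for k in range(K):  # merge step: add this worker's change to the global counts
            for w in range(V):
                new_nkw[k][w] += local_nkw[k][w] - nkw[k][w]
            new_nk[k] += local_nk[k] - nk[k]
    return new_nkw, new_nk

# Toy corpus: each document is a list of word ids in a vocabulary of size V.
docs = [[0, 1, 2, 1], [2, 3, 3], [0, 0, 4], [4, 3, 1]]
K, V = 2, 5
z, ndk, nkw, nk = init_state(docs, K, V)
shards = [[0, 1], [2, 3]]  # two simulated machines, two documents each
rng = random.Random(1)
for _ in range(20):
    nkw, nk = distributed_iteration(docs, shards, z, ndk, nkw, nk, K, V, 0.1, 0.01, rng)
```

Each worker sees the other workers' topic assignments only as of the start of the iteration. This staleness is what makes the scheme approximate. Because every document belongs to exactly one shard, summing the per-worker deltas still leaves the merged counts consistent with the current assignments `z`.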