ABSTRACT
This section introduces BERTopic, a deep learning approach to topic modelling built on the BERT architecture. To enable you to apply BERTopic, we begin by explaining the algorithms it integrates: UMAP for dimensionality reduction, HDBSCAN for topic extraction, and c-TF-IDF for identifying the most relevant keyword phrases per topic. You will also learn how to refine your model using deep neural networks, including large language models (LLMs) such as ChatGPT. We then guide you through extracting topics from your text corpus, establishing a topic hierarchy, and reconstructing topic evolution over time. Additionally, we cover topic visualization and the calculation of coherence scores. We conclude with an overview of BERTopic’s advantages and limitations.
