ABSTRACT

Information overload caused by unstructured text is one of the central problems in natural language processing (NLP). Many studies have addressed text classification with word-embedding techniques such as word2vec and GloVe, but these models cannot represent out-of-vocabulary words. This study examines the word-embedding techniques word2vec, GloVe, fastText, and BERT for sentiment classification using a long short-term memory (LSTM) model. The accuracy of these models is evaluated on the IMDB review dataset, which contains 50,000 text reviews. The word-embedding models are also compared with the traditional text representations bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF). The results indicate that word embedding with the BERT approach achieves the highest accuracy.
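
For orientation, the sketch below (not the authors' code) illustrates the kind of pipeline the abstract describes: an LSTM classifier over a word-embedding layer, trained on the IMDB reviews bundled with Keras. The vocabulary size, sequence length, embedding dimension, and all other hyperparameters are illustrative assumptions rather than values reported in the study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000  # assumed vocabulary cap
MAX_LEN = 256        # assumed review length after padding/truncation
EMBED_DIM = 100      # assumed embedding dimensionality

# IMDB reviews, pre-encoded as integer word indices (25,000 train / 25,000 test).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=VOCAB_SIZE)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=MAX_LEN)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=MAX_LEN)

model = models.Sequential([
    # In the pretrained settings, this layer would be initialized with
    # word2vec, GloVe, or fastText vectors rather than learned from scratch.
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),  # binary positive/negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=2, batch_size=64)
```

In the BERT setting, the static embedding layer would be replaced by contextual token representations produced by a pretrained BERT encoder before they are passed to the classifier.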