ABSTRACT

In this paper we present a comparative analysis of word embedding models used to train long short-term memory (LSTM) networks for automated essay scoring. Our project aims to build a system that evaluates essays efficiently based on aspects such as vocabulary, tense, voice, grammatical and spelling errors, and sentence length. The dataset was taken from the Automated Student Assessment Prize (ASAP) competition hosted on Kaggle. We first performed feature selection, tokenised each essay into sentences and then into words, and built feature vectors (word embeddings) from the resulting tokens using different word embedding models. Several word embedding models with competitive performance are now available for text processing, so our study compares three of them: Word2Vec, GloVe and FastText. We examined how well each embedding performs with different types of LSTM networks and identified the combinations that lead to the best results. Hold-out validation was used because of the large size of our dataset, and LSTM networks were used to train and test the model. The quadratic weighted kappa score was used as the performance measure of agreement between the actual and predicted essay scores. Our best model achieved a kappa score of 0.97207, obtained after several iterations of training and testing different neural networks with different word embedding models.
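The evaluation metric mentioned above, quadratic weighted kappa, measures agreement between two raters on ordinal scores, penalising disagreements by the squared distance between ratings. As a minimal pure-Python sketch of that metric (the function name and interface are our own, not from the paper):

```python
from collections import Counter

def quadratic_weighted_kappa(actual, predicted, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer ratings.

    1.0 means perfect agreement, 0.0 means chance-level agreement,
    and negative values mean systematic disagreement.
    """
    n = max_rating - min_rating + 1
    # Observed rating co-occurrence (confusion) matrix
    observed = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        observed[a - min_rating][p - min_rating] += 1
    # Marginal histograms of each rater's scores
    hist_a = Counter(a - min_rating for a in actual)
    hist_p = Counter(p - min_rating for p in predicted)
    total = len(actual)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty: grows with squared rating distance
            weight = ((i - j) ** 2) / ((n - 1) ** 2)
            # Expected count under independence of the two raters
            expected = hist_a[i] * hist_p[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator
```

For example, identical score lists yield a kappa of 1.0, while completely reversed scores yield negative kappa.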