ABSTRACT

In recent years, technological developments have supported advanced treatments for dangerous diseases such as cancer and have helped save lives. A cancer tumor has thousands of genetic mutations. Personalized medicine involves the systematic study of genetic mutations and other related information. Understanding cancer tumor growth is a challenging task even when advanced genetic analysis is adopted. At present, genetic mutations are interpreted manually. Advanced techniques such as machine learning provide a way to identify genetic mutation growth automatically. To automate the process, some studies have used classification algorithms such as random forest, naive Bayes, XGBoost, and LSTM. These classification algorithms suffer from low accuracy due to sparse data, and the computational cost of training the model for prediction is high. The proposed method integrates the LSTM model with the word2vec word embedding technique. The LSTM-based neural network model, which builds on traditional recurrent neural networks, predicts the gene sequence and increases the classification accuracy. The long short-term memory (LSTM) model has recently gained popularity among neural network models. Word embedding techniques convert text into a machine-understandable form such as vectors. These vectors can be readily consumed by machine learning models. The proposed method combines genetic variations with clinical text. Word embedding techniques are mainly used to capture the semantic meaning of clinical annotations, which enhances performance. The experimental results show that the LSTM-based word embedding technique achieves 84% accuracy. The result analysis shows that the proposed combined approach outperforms the existing methods.