ABSTRACT

Protein methylation, a type of post-translational modification (PTM), plays a significant role in numerous cellular processes. Predicting protein methylation sites is important for understanding the molecular mechanisms of protein methylation. However, experimental identification of methylation sites is costly and time-consuming, so fast and convenient computational approaches are needed. In recent years, many computational techniques have been designed to predict methylation sites from protein primary sequences, but their prediction accuracy remains inadequate. We therefore propose a decision-tree-based method for predicting protein methylation sites, focusing on arginine (R) and lysine (K) methylation. The method integrates various sequence-based features and applies a feature selection technique to improve the feature representation. It achieves an accuracy of 82.4%, a sensitivity of 78%, a specificity of 86.8%, and a Matthews correlation coefficient (MCC) of 0.651 for arginine methylation sites, and an accuracy of 72.5%, a sensitivity of 72%, a specificity of 73.1%, and an MCC of 0.466 for lysine methylation sites. Results on standard datasets indicate that the proposed method outperforms various state-of-the-art techniques for methylation site prediction.
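
For readers unfamiliar with the four reported measures, the following minimal Python sketch shows how accuracy, sensitivity, specificity, and the Matthews correlation coefficient are conventionally computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the datasets or the evaluation code used in this work.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics from confusion-matrix counts.

    tp: methylated sites predicted as methylated (true positives)
    fn: methylated sites predicted as non-methylated (false negatives)
    tn: non-methylated sites predicted as non-methylated (true negatives)
    fp: non-methylated sites predicted as methylated (false positives)
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate

    # Matthews correlation coefficient, in the range [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the paper's data).
example = binary_metrics(tp=80, fn=20, tn=90, fp=10)
```

Note that on a dataset with equal numbers of positive and negative samples, accuracy is the mean of sensitivity and specificity.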