ABSTRACT

In this paper, we applied several techniques, centered on ensemble learning, to handle class imbalance in a given dataset and analyzed the results obtained. We first identified the main approaches to handling class imbalance: data pre-processing, cost-sensitive learning, and ensemble methods. After separating the data into training and test sets, we implemented data pre-processing techniques such as oversampling and undersampling, and then trained neural networks on the imbalanced dataset using introduced class weights. We applied the SMOTE algorithm and further refined it by combining it with Bagging and Boosting. Errors that emerged while streamlining the process were also taken into consideration. The results obtained by training the models on the training data were collected and verified on the test data, and then compared based on ROC value and on metrics derived from the confusion matrix. In this comparative analysis, we found that pre-processing by oversampling outperformed undersampling combined with a random forest in terms of ROC value. Among all the methods evaluated, neural networks combined with cost-sensitive learning achieved the best ROC value. In general, boosting-based ensemble learning yielded better results than bagging-based methods, and SMOTE+Bagging combined with neural networks achieved the best precision.
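The sketch below illustrates the three families of techniques the abstract names (resampling, cost-sensitive learning, and balanced ensembles), using scikit-learn and imbalanced-learn. The dataset, model choices, and parameters are illustrative assumptions, not the paper's actual configuration.

    # Minimal sketch of the class-imbalance pipeline, assuming scikit-learn
    # and imbalanced-learn; all settings here are illustrative, not the
    # paper's exact setup.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from imblearn.over_sampling import SMOTE
    from imblearn.ensemble import BalancedBaggingClassifier

    # Imbalanced toy data: roughly 5% minority class.
    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # 1) Data pre-processing: oversample the minority class with SMOTE,
    #    applied to the training split only.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
    rf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    print("SMOTE + RF ROC AUC:",
          roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

    # 2) Cost-sensitive learning: weight classes instead of resampling.
    cs = RandomForestClassifier(class_weight="balanced",
                                random_state=0).fit(X_train, y_train)
    print("Cost-sensitive ROC AUC:",
          roc_auc_score(y_test, cs.predict_proba(X_test)[:, 1]))

    # 3) Ensemble learning: bagging over balanced bootstrap samples,
    #    an analogue of the SMOTE+Bagging combination described above.
    bag = BalancedBaggingClassifier(random_state=0).fit(X_train, y_train)
    print("Balanced bagging ROC AUC:",
          roc_auc_score(y_test, bag.predict_proba(X_test)[:, 1]))

Each variant is scored with ROC AUC on the held-out test split, mirroring the comparison procedure described in the abstract.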