ABSTRACT

Breast cancer is one of the most lethal cancers and predominantly affects women. Distinguishing malignant (cancerous) from benign (noncancerous) tumors is therefore of great importance in oncology.

Researchers have applied a range of data mining methods and algorithms to aid in the diagnosis of malignant tumors, and several new algorithms have recently been proposed for this task.

The aim of this chapter is to design a prediction system that identifies whether a patient’s breast tumor is malignant or benign, using ensemble classification with a voting mechanism.

The prediction is made by analyzing the patient’s historical data. The data used in this project are from the Wisconsin Breast Cancer Database, available in the UC Irvine Machine Learning Repository.

The classifier algorithms tested were logistic regression, k-nearest neighbor, decision tree, and support vector machine.

The three best classifiers were selected based on performance metrics (accuracy, precision, recall, and F1 score) and were nominated for ensemble classification using a voting mechanism.
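As an illustration of the four metrics named above, the following minimal sketch computes them from true and predicted labels for a binary problem. The function names (binary_metrics, f1_from) and the toy labels are hypothetical and are not taken from the chapter; the label 1 is assumed to denote the malignant (positive) class.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task.

    Assumption: `positive` marks the malignant class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Toy labels for illustration only (not data from the chapter).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, a precision of 1.00 combined with a recall of 0.97 gives an F1 of about 0.985, which is consistent with the figures reported at the end of this abstract.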

The voting options used in this project were hard (majority) voting and soft voting. Soft voting with a weighted average outperformed all the other approaches tested, achieving an accuracy of 98.8%, precision of 100%, recall of 97.0%, and F1 score of 98.5%.
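The difference between the two voting options can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: the function names (hard_vote, soft_vote), the probability values, and the weights are all hypothetical. Hard voting takes the most common predicted label across classifiers, while soft voting averages the classifiers' class probabilities (optionally weighted) and picks the class with the highest average.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over the class labels predicted by each classifier.

    With three classifiers and two classes, a tie cannot occur.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, weights=None):
    """Weighted average of per-classifier class-probability vectors.

    Returns (index of the winning class, averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: avg[c])
    return winner, avg


# Hypothetical probabilities [P(benign), P(malignant)] from three classifiers
# for a single patient; class 0 = benign, class 1 = malignant.
probas = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
labels = [max(range(2), key=lambda c: p[c]) for p in probas]

print(hard_vote(labels))                       # majority of predicted labels
print(soft_vote(probas))                       # equal weights
print(soft_vote(probas, weights=[3, 3, 1]))    # illustrative weights
```

In this toy case, two of the three classifiers lean weakly toward benign while the third is strongly confident of malignancy, so hard voting and equally weighted soft voting disagree. Changing the weights shifts how much each classifier's confidence counts toward the averaged probability.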