ABSTRACT

With the rapid growth in the size of data and in the diversity of the sources that produce it, many problems have emerged, especially with biomedical data. Big data is characterized by its large size, while biomedical data are also prone to missing, ambiguous, and inconsistent values. Mislabelled instances therefore need to be removed before learning algorithms are trained, so that the remaining data are reliable and classification accuracy increases.

In this chapter, we propose a framework that removes misclassified instances to improve the classification performance on biomedical big data. The framework has four main stages: preparation, feature selection, instance reduction, and classification. In the instance reduction stage, we identify and remove the instances that cause misclassification, using fuzzy-rough nearest neighbour classification to detect mislabelled instances. Experimental results showed that the proposed technique is effective at enhancing classification accuracy on biomedical big data. Of the classifiers tested, the classification tree benefited most from our model, which raised its accuracy to 89.24%.
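The abstract does not spell out how fuzzy-rough nearest neighbour (FRNN) classification is used to drop instances. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses the FRNN rule in the form commonly attributed to Jensen and Cornelis (Kleene-Dienes implicator, minimum t-norm, crisp class labels), a range-normalised mean similarity between instances, and a leave-one-out filter that discards any instance whose FRNN prediction from the remaining instances disagrees with its recorded label. The function names (`feature_ranges`, `similarity`, `frnn_predict`, `remove_mislabelled`), the similarity measure, and `k = 3` are hypothetical choices made for illustration.

```python
"""Sketch: fuzzy-rough nearest neighbour (FRNN) filter for mislabelled instances.

Hypothetical assumptions (not taken from the chapter): range-normalised mean
similarity, Kleene-Dienes implicator, minimum t-norm, leave-one-out filtering.
"""


def feature_ranges(X):
    """Per-feature range (max - min); constant features get 1.0 to avoid division by zero."""
    ranges = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        spread = max(column) - min(column)
        ranges.append(spread if spread > 0 else 1.0)
    return ranges


def similarity(a, b, ranges):
    """Fuzzy similarity R(a, b) in [0, 1]: mean over features of 1 - |a_j - b_j| / range_j."""
    per_feature = [max(0.0, 1.0 - abs(u - v) / r) for u, v, r in zip(a, b, ranges)]
    return sum(per_feature) / len(per_feature)


def frnn_predict(query, X, y, ranges, k=3):
    """Return the class whose (lower + upper) / 2 membership for `query` is highest,
    computed over the k reference instances most similar to `query`."""
    neighbours = sorted(
        ((similarity(query, x, ranges), label) for x, label in zip(X, y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    best_label, best_score = None, -1.0
    for c in sorted(set(y)):
        # Lower approximation: min over neighbours of I(R, C(x)) = max(1 - R, C(x)).
        lower = min(max(1.0 - r, 1.0 if lab == c else 0.0) for r, lab in neighbours)
        # Upper approximation: max over neighbours of T(R, C(x)) = min(R, C(x)).
        upper = max(min(r, 1.0 if lab == c else 0.0) for r, lab in neighbours)
        score = (lower + upper) / 2.0
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def remove_mislabelled(X, y, k=3):
    """Leave-one-out filter: keep an instance only if FRNN, using all the other
    instances as its reference set, predicts the instance's recorded label."""
    ranges = feature_ranges(X)
    kept_X, kept_y = [], []
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if frnn_predict(X[i], rest_X, rest_y, ranges, k) == y[i]:
            kept_X.append(X[i])
            kept_y.append(y[i])
    return kept_X, kept_y


if __name__ == "__main__":
    # Toy data: two tight clusters; [1.5, 1.5] lies near cluster 0 but is labelled 1.
    X = [[1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.5, 1.5],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
    y = [0, 0, 0, 1, 1, 1, 1, 1]
    X_clean, y_clean = remove_mislabelled(X, y, k=3)
    print(len(X_clean), y_clean)  # 7 [0, 0, 0, 1, 1, 1, 1]
```

In the toy run, the point `[1.5, 1.5]` is removed because its three most similar neighbours all carry label 0, while every other instance is kept. With crisp labels and this implicator/t-norm pair, the two-class score reduces to comparing the most similar neighbour of each class among the `k` retained, so the filter behaves much like a 1-NN check; other implicators, t-norms, or similarity measures would change how aggressive the filtering is.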