ABSTRACT

Cancer is becoming one of the leading causes of death worldwide. Machine learning (ML) plays a critical role in building automated models that support better diagnosis. Traditional cancer diagnosis relies solely on biopsy data, which overlooks essential aspects of the disease such as the rate of proliferation and tissue behavior. Genetic data may therefore play an essential role in identifying cancer subtypes. Microarray data captures a patient's genetic information in a very large number of dimensions (genes) but with a limited number of samples (patients). If microarray data is fed directly to an ML classifier without reducing its dimensionality, the small sample size relative to the number of features becomes a problem. The microarray data therefore has to be reduced in dimension first, using either a dimensionality reduction technique or a feature selection technique. The main objective of this research work is to analyze the impact of the microarray dataset on cancer classification. A further objective is to study the machine learning algorithms that can be applied to cancer microarray data, along with the validation methods available for estimating their accuracy. The main focus of the current research is an integrated approach that combines a feature selection algorithm, an optimization algorithm, and a machine learning classification algorithm for efficient cancer classification. In this work, we will use recursive feature elimination (RFE) for dimensionality reduction, cuckoo search (CS) for optimization, and a support vector machine (SVM) for classification on multiple benchmark datasets. The performance of the proposed model will be evaluated using metrics such as accuracy, sensitivity, specificity, and F1-score. Finally, the results will be compared with those of state-of-the-art machine learning algorithms.
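As an illustration of the evaluation metrics named above, the following is a minimal sketch of how accuracy, sensitivity, specificity, and F1-score can be computed from true and predicted labels in the binary case. The names `BinaryMetrics` and `binary_metrics`, the choice of `1` as the positive (cancer) class, and the convention of returning 0.0 for an undefined ratio are assumptions made for this example and are not taken from the proposed model.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute four standard metrics for a binary classifier.

    `positive` is the label treated as the positive class
    (assumed here to be 1; adjust to the dataset's encoding).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # true positive
            else:
                fp += 1  # false positive
        else:
            if truth == positive:
                fn += 1  # false negative
            else:
                tn += 1  # true negative

    def ratio(num, den):
        # Assumption: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=ratio(tp, tp + fn),   # recall on the positive class
        specificity=ratio(tn, tn + fp),   # recall on the negative class
        f1=ratio(2 * tp, 2 * tp + fp + fn),
    )
```

Sensitivity and specificity are reported separately because, in a diagnostic setting, missing a positive case and raising a false alarm usually carry different costs, and accuracy alone can hide either failure on imbalanced data.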