ABSTRACT

In genomics, high-dimensional data are primarily used to identify the essential genes whose expression levels play a vital role in disease diagnosis. The number of features in a high-dimensional dataset is far greater than the number of samples. These features are usually given as input to a learning algorithm for disease classification. However, most features in high-dimensional data are redundant, irrelevant, or noisy, which decreases learning accuracy. Feature selection plays a significant role in addressing these problems. It is an important preprocessing step for disease prediction and classification. It aims to find informative features by selecting a small subset of relevant features from the original set and removing the redundant and irrelevant ones, which can reduce computation time and improve classification accuracy. The increasing dimensionality of such data poses a significant challenge to many existing feature selection methods in terms of the prediction performance and accuracy of the model. This research work analyses the use of various feature selection methods that can select prominent attributes from high-dimensional datasets for the classification of diseases.