Feature Selection for Genomic Data Analysis

doi:10.1201/9781584888796-28

ABSTRACT

Rapid advances in gene expression microarray technology have given scientists, for the first time, the opportunity to observe complex relationships among the genes in a genome by simultaneously measuring the expression levels of tens of thousands of genes in massive experiments. Analyzing large-scale genomic data to extract biologically meaningful insights presents unprecedented opportunities and challenges for data mining in areas such as gene clustering [3], sample class discovery, and classification [4]. In this chapter, we first introduce the challenges of microarray data analysis and some traditional feature selection solutions, and then present a redundancy-based feature selection solution and demonstrate its effectiveness and efficiency on benchmark microarray datasets.