ABSTRACT

Rapid advances in gene expression microarray technology have given scientists, for the first time, the opportunity to observe complex relationships among the genes in a genome by simultaneously measuring the expression levels of tens of thousands of genes in large-scale experiments. Analyzing such large-scale genomic data to extract biologically meaningful insights presents unprecedented opportunities and challenges for data mining in areas such as gene clustering [3], sample class discovery, and classification [4]. In this chapter, we first introduce the challenges of microarray data analysis and some traditional feature selection approaches. We then present a redundancy-based feature selection method and demonstrate its effectiveness and efficiency on several benchmark microarray datasets.