ABSTRACT

DNA microarray technology has enabled biologists to study all the genes of an organism at once and so obtain a global view of gene interaction and regulation. This chapter introduces some of the most common data mining methods applied to the analysis of microarray data and discusses likely future directions. Data mining has been defined as the process of discovering knowledge or patterns hidden in datasets. The chapter presents some of the most commonly used methods for exploring gene expression data, including hierarchical clustering, K-means, and self-organizing maps (SOM). It also describes support vector machines (SVM), which have become popular for classifying expression data, and outlines their basic concepts. Different clustering algorithms may produce different clusters from the same dataset. Global search and optimization methods such as genetic algorithms and simulated annealing can, in principle, find the global optimum of the squared-error criterion, and have already demonstrated certain advantages.