ABSTRACT

Cluster analysis provides a set of methods that have proven very useful for exploring gene expression patterns in microarray data. The goal of such analysis is to group genes or samples into classes such that, in terms of their expression levels, observations within a class are more similar to each other than to observations in other classes. There are several reasons to apply cluster analysis to microarray data. First, there is evidence that many functionally related genes have similar expression patterns [1,2]. By grouping genes according to their coordinated expression across multiple conditions, we may be able to infer the functions of genes whose roles were previously unknown. Second, a class of genes with similar expression patterns may reveal much about regulatory mechanisms: common regulatory elements (e.g., motifs) identified within such a class can greatly facilitate our understanding of genetic networks [3,4]. Third, by classifying samples on the basis of their gene expression levels, cluster analysis provides a more reliable and precise way to distinguish tumor subtypes (e.g., of breast cancer) that cannot be distinguished by standard microscopic or molecular approaches [5-8]. New tumor subtypes can also be identified in this way. Ultimately, such classifications can lead to advances in the prognosis, diagnosis, and treatment of disease. Fourth, because a microarray can measure the expression of tens of thousands of genes across anywhere from several to hundreds of samples, clustering genes, samples, or both simultaneously [1,9] can provide an effective way to reduce the complexity of the data so that it is easier to organize, visualize, and interpret.
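To make the within-class versus between-class goal concrete, the following is a minimal sketch, not the method of this chapter or of the cited works. It assumes average-linkage agglomerative clustering with 1 minus the Pearson correlation as the dissimilarity between expression profiles, and it runs on a small made-up matrix rather than real data. The function names `pearson_distance` and `average_linkage` are hypothetical. The same routine clusters genes (rows) and samples (columns of the same matrix).

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly co-expressed profiles,
    2 for perfectly anti-correlated ones."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 1.0  # assumed convention: a flat profile is treated as uncorrelated
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    average pairwise distance is smallest, until k clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    # Precompute all pairwise distances once, keyed by (i, j) with i < j.
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return mean(d(i, j) for i in a for j in b)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy matrix (made-up numbers): rows = genes, columns = samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],  # g0: rising
    [2.0, 4.1, 6.0, 8.2],  # g1: rising
    [0.5, 1.0, 1.6, 2.0],  # g2: rising
    [4.0, 3.0, 2.0, 1.0],  # g3: falling
    [8.0, 6.1, 3.9, 2.0],  # g4: falling
    [3.0, 2.2, 1.1, 0.4],  # g5: falling
]

gene_clusters = average_linkage(expr, k=2)
samples = [list(col) for col in zip(*expr)]  # transpose to cluster samples
sample_clusters = average_linkage(samples, k=2)
print(gene_clusters)  # [[0, 1, 2], [3, 4, 5]]
```

Correlation-based distance groups genes by the shape of their profiles rather than their absolute levels, which is why g1 (roughly twice g0) lands with g0. This naive loop recomputes linkages at every merge and would be far too slow for tens of thousands of genes; an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` is the practical choice there.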