Clustering and Classification Methods for Gene Expression Data Analysis
Efficient use of the large data sets generated by gene expression microarray experiments requires computerized data analysis approaches (1, 2). In this chapter we briefly describe and illustrate two broad families of commonly used data analysis methods: class discovery and class prediction methods. Class discovery, also referred to as clustering or unsupervised learning, has the goal of partitioning a set of objects (either the genes or the samples) into groups that are relatively similar, in the sense that objects in the same group are more alike than objects in different groups (3, 4). A typical application is to generate hypotheses about novel disease subtypes (5, 6). Class prediction, also referred to as classification or supervised learning, has the goal of determining whether an object (usually a sample, but sometimes a gene) belongs to a certain class (7, 8). A typical application is classification of patients into existing disease subtypes or prognostic classes (9, 10) using gene expression information.
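To make the distinction concrete, the following minimal sketch contrasts the two families on a toy data set. It is an illustration under stated assumptions, not a method taken from this chapter: the sample names, expression values, and subtype labels are invented, a simple two-group k-means stands in for class discovery, and a nearest-centroid rule stands in for class prediction.

```python
from math import dist

# Hypothetical toy data: each sample is a vector of expression values for 3 genes.
samples = {
    "s1": [1.0, 1.2, 0.9],
    "s2": [0.8, 1.1, 1.0],
    "s3": [5.0, 4.8, 5.2],
    "s4": [5.1, 5.0, 4.9],
}

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def two_means(data, iterations=10):
    """Class discovery: split samples into 2 groups without using any labels."""
    names = list(data)
    # Start from the first and last sample as centroids (an arbitrary choice).
    centers = [data[names[0]], data[names[-1]]]
    assignment = {}
    for _ in range(iterations):
        # Assign every sample to its nearest centroid (Euclidean distance).
        assignment = {
            n: min((0, 1), key=lambda k: dist(data[n], centers[k]))
            for n in names
        }
        # Recompute each centroid from its current members.
        for k in (0, 1):
            members = [data[n] for n in names if assignment[n] == k]
            if members:
                centers[k] = centroid(members)
    return assignment

def nearest_centroid_predict(training, labels, new_sample):
    """Class prediction: assign new_sample to the known class with the closest centroid."""
    classes = sorted(set(labels.values()))
    centers = {
        c: centroid([training[n] for n in training if labels[n] == c])
        for c in classes
    }
    return min(classes, key=lambda c: dist(new_sample, centers[c]))

# Unsupervised: groups are found from the expression values alone.
clusters = two_means(samples)

# Supervised: known (hypothetical) subtype labels are used to classify a new sample.
labels = {"s1": "subtype_A", "s2": "subtype_A", "s3": "subtype_B", "s4": "subtype_B"}
predicted = nearest_centroid_predict(samples, labels, [4.7, 5.1, 5.0])
```

In the first call no labels are supplied and the grouping emerges from the data; in the second, existing class labels define the groups and only the membership of a new sample is decided.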