ABSTRACT

Efficient use of the large data sets generated by gene expression microarray experiments requires computational data analysis approaches (1, 2). In this chapter we briefly describe and illustrate two broad families of commonly used data analysis methods: class discovery and class prediction. Class discovery, also referred to as clustering or unsupervised learning, aims to partition a set of objects (either the genes or the samples) into groups whose members are relatively similar, in the sense that objects in the same group are more alike than objects in different groups (3, 4). A typical application is to generate hypotheses about novel disease subtypes (5, 6). Class prediction, also referred to as classification or supervised learning, aims to determine which of a set of predefined classes an object (usually a sample, but sometimes a gene) belongs to (7, 8). A typical application is the classification of patients into existing disease subtypes or prognostic classes (9, 10) using gene expression information.