ABSTRACT

Summary
11.1 Introduction
11.2 Applications of Microarrays
11.3 Analysis of Gene Expression Microarrays
    11.3.1 Data Normalization
    11.3.2 Detecting Differentially Expressed Genes Across Different Samples
    11.3.3 Identifying Clusters of Co-Expressed Genes
        11.3.3.1 Overview of Clustering Approaches
        11.3.3.2 Assessing Statistical Significance of Observed Patterns
    11.3.4 Gene Expression Based Tumor Classification
11.4 Analysis of CGH Microarray Data
11.5 Integrating Current Knowledge and Various Types of Experimental Data
    11.5.1 From Co-Expression to Co-Regulation
    11.5.2 Integrating Microarray CGH and Expression Data
    11.5.3 Modeling Genetic Networks
References

The advent of DNA microarray technology has added a new dimension to molecular carcinogenesis research. DNA microarrays have been used to identify changes in gene expression and genomic alterations attributable to various stages of tumor development. Patterns defined by the expression levels of multiple genes across different types of cancerous and normal tissue samples have been used to examine relationships between genes and as a tool for the molecular classification of tumor types. Analyzing the relatively large datasets generated in a typical microarray experiment generally requires at least some level of computer-aided automation. At the same time, the large number of hypotheses implicitly tested during data analysis, especially when patterns of expression are identified through supervised and unsupervised learning approaches, requires careful assessment of the statistical significance of the results. These basic requirements have brought to the forefront the need for statistical models and corresponding computational tools specifically tailored to the analysis of microarray data. Such models need to distinguish faint yet statistically significant and biologically important signals from patterns generated by random fluctuations in the data. In this endeavor, it is important to keep in mind the abundance of existing statistical and machine-learning methodologies, which can serve as the starting point for developing more specialized techniques. Here we describe different uses of DNA microarray technology in molecular carcinogenesis research and the related methodological approaches for analyzing and interpreting the DNA microarray data obtained in such experiments.