ABSTRACT

Statistical methods have been developed to extract biological information from large sets of genome-wide data (5). Commonly used techniques for analyzing the similarity of gene expression patterns include clustering (6), self-organizing maps (7), principal component analysis (8), and support vector machines (9). Many statistical tools aim to identify genes whose expression is significantly altered under different conditions. This identification is generally based on statistical analysis of each gene's expression pattern individually (for example, by ANOVA or a t-test). Applying thresholds for fold-change and/or p-value yields a smaller set of genes whose expression varies between conditions and that is therefore of interest for further analysis. One common approach is to apply gene enrichment analysis across knowledge-based functional categories. We show an example of such an enrichment analysis, investigating a set of genes differentially expressed under high- and low-cholesterol diets to understand the biological processes involved in atherosclerosis.

1. INTRODUCTION
2. ENRICHMENT ANALYSIS OF HIGH-CONTENT DATA
3. ANALYZING CONDITION-SPECIFIC NETWORKS
3.1. Subnetworks Generated from a List of Differential Genes
3.2. Identifying Differential Response Subnetworks from Gene Expression
4. NETWORK MEASURES
4.1. Degree of Nodes
4.2. Average Clustering Coefficient
4.3. Average Shortest Paths
4.4. Centrality of Nodes
4.5. Statistical Test for Topological Quantities
4.6. Over- and Underconnected Nodes in Subnetworks
REFERENCES
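To illustrate the per-gene statistical analysis mentioned in the abstract, the following minimal Python sketch computes Welch's (unequal-variance) t statistic and its Welch-Satterthwaite degrees of freedom for a single gene measured in two conditions. The function name welch_t and the replicate values are hypothetical and are not taken from the chapter. Converting t to a p-value requires the t distribution with the returned degrees of freedom; that step is omitted so the sketch needs only the standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for one gene measured in two conditions (hypothetical helper)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each condition needs at least two replicates")
    # Squared standard errors of each mean, using sample variances (n - 1 denominator).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both conditions have zero variance")
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical log2 expression values for one gene under two diets (not real data).
high_cholesterol = [8.0, 8.2, 8.4]
low_cholesterol = [7.0, 7.1, 7.2]
t_stat, dof = welch_t(high_cholesterol, low_cholesterol)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) performs the same Welch test and also returns a p-value. For more than two conditions, a one-way ANOVA (for example scipy.stats.f_oneway) plays the analogous role.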
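The thresholding step described in the abstract, which keeps genes that pass a fold-change and/or p-value cutoff, can be sketched as a simple filter. The GeneResult record, the select_differential function, the placeholder gene names, and the default cutoffs (an absolute log2 fold-change of at least 1, i.e. two-fold, and p ≤ 0.05) are all illustrative assumptions rather than values from the chapter; the require_both flag switches between the "and" and "or" readings of "fold-change and/or p-value".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    gene: str
    log2_fold_change: float  # log2(condition 1 / condition 2); sign gives direction
    p_value: float


def select_differential(
    results: list[GeneResult],
    min_abs_log2_fc: float = 1.0,  # hypothetical default: at least a 2-fold change
    max_p: float = 0.05,           # hypothetical default significance cutoff
    require_both: bool = True,     # True -> fold-change AND p-value; False -> OR
) -> list[str]:
    """Return names of genes that pass the fold-change and/or p-value thresholds."""
    selected = []
    for r in results:
        passes_fc = abs(r.log2_fold_change) >= min_abs_log2_fc
        passes_p = r.p_value <= max_p
        keep = (passes_fc and passes_p) if require_both else (passes_fc or passes_p)
        if keep:
            selected.append(r.gene)
    return selected


# Hypothetical per-gene summaries (placeholder names, not real data).
results = [
    GeneResult("gene1", 1.8, 0.001),
    GeneResult("gene2", 0.3, 0.002),
    GeneResult("gene3", -1.2, 0.20),
    GeneResult("gene4", -2.5, 0.01),
]
print(select_differential(results))                      # ['gene1', 'gene4']
print(select_differential(results, require_both=False))  # ['gene1', 'gene2', 'gene3', 'gene4']
```

Requiring both criteria gives the smaller, more conservative list; the "or" variant admits genes with large but noisy changes, or with small but consistent ones.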
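The abstract does not state which statistic the enrichment analysis uses. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which asks whether a functional category, such as a Gene Ontology term, contains more of the selected differential genes than expected by chance. The minimal sketch below assumes that test; the function names enrichment_p_value and category_enrichment and the toy gene sets are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe: int, n_category: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap), where X counts how many of
    n_selected genes drawn without replacement from n_universe genes fall in a
    category containing n_category genes."""
    if not (0 <= n_category <= n_universe and 0 <= n_selected <= n_universe):
        raise ValueError("category and selection sizes must lie within the universe")
    total = comb(n_universe, n_selected)
    upper = min(n_category, n_selected)
    # Sum the hypergeometric probability mass over overlaps >= n_overlap.
    # math.comb returns 0 when k > n, so impossible terms vanish automatically.
    tail = sum(
        comb(n_category, i) * comb(n_universe - n_category, n_selected - i)
        for i in range(max(n_overlap, 0), upper + 1)
    )
    return tail / total


def category_enrichment(selected: set[str], category: set[str], universe: set[str]) -> float:
    """Enrichment p-value of one functional category among the selected genes,
    with both sets restricted to the measured gene universe."""
    selected = selected & universe
    category = category & universe
    return enrichment_p_value(len(universe), len(category), len(selected), len(selected & category))


# Hypothetical example: 20 measured genes, a 5-gene category, 4 selected genes.
universe = {f"g{i}" for i in range(20)}
category = {"g0", "g1", "g2", "g3", "g4"}
selected = {"g0", "g1", "g2", "g10"}
print(category_enrichment(selected, category, universe))
```

Using the measured genes as the background, rather than the whole genome, is one reasonable design choice because genes that could never appear in the selected list should not count toward the expected overlap. With SciPy, the same tail probability is scipy.stats.hypergeom.sf(k - 1, M, n, N), where M is the universe size, n the category size, N the number of selected genes, and k the observed overlap.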