ABSTRACT

Given the ease with which data can be collected and statistical tests can be performed with computational statistics packages, the problem in contemporary molecular biology is that we often have too many statistical tests that we might want to do. For example, after a high-throughput screen we are typically in the situation where we have obtained a list of genes from the experiment, and we have no idea what those genes have to do with our experiment. We can try to come up with a hypothesis or two, but it’s often more convenient to do systematic gene set enrichment analysis. We test for all possible enrichments of anything we can get our hands on: gene ontology (GO) categories, Munich Information Center for Protein Sequences (MIPS) functions, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, ChIP-seq compendia, phosphoproteomics databases, … you get the picture. In bioinformatics jargon, these types of functional assignments to genes are known as “annotations.” Table 3.1 shows some examples of functional annotations.
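To make the scale of the problem concrete, here is a minimal sketch of systematic enrichment testing. It assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) for each annotation. The function names, gene names and annotation sets are hypothetical and chosen only for illustration. The key point is that each annotation yields its own p-value, so a database with thousands of GO categories means thousands of tests on the same gene list.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, of which K are annotated."""
    total = comb(N, n)
    # Sum the probabilities of seeing k or more annotated genes among the n hits.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


def enrichment_tests(hit_genes, annotations, universe):
    """Hypothetical helper: run one enrichment test per annotation.

    Returns a dict mapping each annotation name to its p-value.
    """
    hits = set(hit_genes) & universe
    N, n = len(universe), len(hits)
    results = {}
    for name, members in annotations.items():
        members = set(members) & universe  # only count genes that were measured
        K = len(members)
        k = len(hits & members)
        results[name] = hypergeom_upper_tail(k, n, K, N)
    return results


# Toy example with made-up genes g1..g20 and two made-up annotation sets.
universe = {f"g{i}" for i in range(1, 21)}
hits = [f"g{i}" for i in range(1, 6)]
annotations = {
    "termA": ["g1", "g2", "g3", "g4", "g10"],
    "termB": [f"g{i}" for i in range(15, 21)],
}
p_values = enrichment_tests(hits, annotations, universe)
print(len(p_values), "tests performed")
print(p_values)
```

In practice the `annotations` dictionary would be filled from GO, MIPS, KEGG or similar resources, and the loop would then produce one p-value per category.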