ABSTRACT

Microarrays allow the expression of almost every gene in the human genome to be measured simultaneously. Understanding the relationship between a particular genetic location and its expression is fundamental to elucidating the relationships among genes, transcripts of other genes, and the proteins translated from those transcripts. For example, in the nuclear hormone receptor superfamily, variants of the Peroxisome Proliferator-Activated Receptor Gamma (PPARγ) protein have different effects on the production of a large group of transcripts (Bush et al., 2007). Currently, few statistical approaches to analyzing mRNA transcript experiments use available biological information beyond the expression data themselves. However, incorporating additional data or knowledge is expected to provide a better understanding of gene expression signatures. To help improve current approaches and to clarify how additional biological information can be used in expression studies, we discuss the use of Gene Ontology (GO) data in expression trait mapping via genotype data. A key feature of our investigation is the analysis of genes whose correlation is a function of GO distance. In this sense, we examine the incorporation of “biological distance” into expression trait loci (ETL) mapping. Several authors have proposed incorporating GO data in expression studies, for example in clustering expression profiles (Pan, 2006). However, most do not consider the possible adverse effects of using GO data, especially GO information based on weak evidence, and in some cases the GO information is incorporated rather informally. Here we return to more basic principles and issues, focusing on a proof-of-concept investigation of how to include GO information and the circumstances under which doing so may be helpful. Simulations are used to compare ETL mapping with and without the use of GO data.