ABSTRACT

An organism's functional capabilities, that is, the range of biological functions it can implement, are a critical component of its basic biology. Increasingly, biologists face the challenge of inferring these capabilities in silico from genome sequence. Moreover, comparative genome analysis requires automated analysis of these capabilities across arbitrary panels of related and unrelated organisms. Finally, similar analyses are required to assess the functional capabilities of specific cell types from gene expression microarray and proteomic data. To address these challenges, we have developed a set of methodologies that take as input the set of sequenced genes from a specific query organism together with data from comprehensive publicly available databases. From this input, comprehensive biological pathways are reconstructed for the query organism. For each pathway we calculate two scores, connectedness and completeness, to estimate in silico whether the pathway is actually operational in the query organism. The set of operational pathways then represents the functional capabilities of the organism. The completeness and connectedness scores also facilitate rapid comparison of capabilities across sets of different organisms. These methodologies are being extended to increase confidence in the conclusions and are also being applied to data from gene expression and proteomic analyses.
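
The abstract names the two pathway scores but does not define them. As a purely illustrative sketch, one plausible reading is assumed here: completeness as the fraction of a pathway's steps whose enzyme is found in the query organism, and connectedness as the fraction of steps in the largest group of present steps linked by shared metabolites. These definitions, the function names, the step layout, and the toy pathway with enzymes E1 to E4 are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict

# Hypothetical toy pathway A -> B -> C -> D -> E; the enzyme labels are placeholders.
TOY_PATHWAY = [
    {"enzyme": "E1", "substrates": ["A"], "products": ["B"]},
    {"enzyme": "E2", "substrates": ["B"], "products": ["C"]},
    {"enzyme": "E3", "substrates": ["C"], "products": ["D"]},
    {"enzyme": "E4", "substrates": ["D"], "products": ["E"]},
]


def present_steps(pathway_steps, organism_enzymes):
    """Return the pathway steps whose enzyme is annotated in the organism."""
    return [s for s in pathway_steps if s["enzyme"] in organism_enzymes]


def completeness(pathway_steps, organism_enzymes):
    """Assumed definition: fraction of pathway steps present in the organism."""
    if not pathway_steps:
        return 0.0
    return len(present_steps(pathway_steps, organism_enzymes)) / len(pathway_steps)


def connectedness(pathway_steps, organism_enzymes):
    """Assumed definition: size of the largest group of present steps linked by
    shared metabolites, as a fraction of all steps in the pathway."""
    present = present_steps(pathway_steps, organism_enzymes)
    if not pathway_steps or not present:
        return 0.0

    # Two present steps are adjacent when a product of one is a substrate of the other.
    adjacency = defaultdict(set)
    for i, a in enumerate(present):
        for j, b in enumerate(present):
            if i == j:
                continue
            if set(a["products"]) & set(b["substrates"]):
                adjacency[i].add(j)
                adjacency[j].add(i)

    # Depth-first search to find the largest connected component.
    seen = set()
    largest = 0
    for start in range(len(present)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        largest = max(largest, size)
    return largest / len(pathway_steps)


if __name__ == "__main__":
    organism = {"E1", "E2", "E4"}  # hypothetical organism lacking E3
    print(completeness(TOY_PATHWAY, organism))
    print(connectedness(TOY_PATHWAY, organism))
```

Under these assumed definitions, an organism missing the third enzyme keeps three of the four steps but only two of them form an unbroken chain, so the two scores diverge. That divergence is the kind of signal that could separate a pathway with a single gap from one that is merely well represented by scattered enzymes.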