ABSTRACT

The genetic information for life is stored in nucleic acids, while proteins are the workhorses responsible for transforming this information into physical reality. Proteins are the macromolecules that perform most of the important tasks in organisms, such as the catalysis of biochemical reactions, the transport of nutrients, and the recognition and transmission of signals. The many aspects of the role of any particular protein are collectively referred to as its function. The genome (DNA) sequences of over 180 organisms, including a draft sequence of the human genome [1,2], have now been completed. For over 105 of these, the data are publicly available and contribute about 413,000 protein sequences, that is, about one-fourth of all currently known protein sequences [3-5]. The number of entirely sequenced genomes is expected to continue growing exponentially for at least the next few years. With the availability of genome sequences of entire organisms, we are for the first time in a position to understand the expression, function, and regulation of the entire set of proteins encoded by an organism. This information will be invaluable for understanding how complex biological processes occur at the molecular level, how they differ among cell types, and how they are altered in disease states [6]. Identifying protein function is a major step toward understanding diseases and identifying novel drug targets [7]. However, determining protein function experimentally remains a laborious task that requires enormous resources. For example, more than a decade after its discovery, we still do not know the precise and complete functional role of the prion protein [8]. The rate at which expert annotators enter experimental information into the more or less controlled vocabularies of databases is slower still.
This has left a huge and rapidly widening gap between the number of sequences deposited in databases and the experimental characterization of the corresponding proteins [9,10]. Bioinformatics plays a central role in bridging this sequence-function gap through the development of tools for faster and more effective prediction of protein function [11-13].