ABSTRACT

I. INTRODUCTION

In the postgenomic era, in which whole genomes can be sequenced within months, a wealth of DNA sequence information is available. Genome sequences have been determined for organisms ranging from simple bacteria, such as Escherichia coli, Bacillus subtilis, and pathogenic strains, through model organisms such as Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster, to Homo sapiens. Bioinformatic analysis usually reveals an enormous number of putative open reading frames (ORFs), whose functions are most often unknown. Generally, sequence and structural homology to previously characterized genes is the key to identifying the function of novel ORFs. A severe limitation of this approach is that, because function is assigned and predicted through homology, no putative function can be assigned to genes that share no homology with known genes. Between 31% (Helicobacter pylori) and 50% (Saccharomyces cerevisiae) of the putative ORFs in these organisms have no identified function when analyzed with homology-based approaches [1,2].