ABSTRACT

All viruses carry genetic information encoded in their genomic sequences. Automatic annotation of virus genomes rests on two main principles: the detection of coding sequences from their oligonucleotide frequencies, and the homology-based transfer of features and functional classifications from annotated genomes to newly sequenced ones. In principle, comparative genomics of viral protein sequences relies on the same sequence-bioinformatics toolbox as that used for cellular genomes. Virus-specific orthologous groups are provided by the eggNOG database, which includes thousands of well-annotated virus genomes. The recently developed VOGDB contains all virus genomes from NCBI RefSeq and groups their protein sequences into protein families using a multistep approach based on orthology and remote homology. Early studies in comparative viral genomics proposed grouping protein families into several categories according to how they occur across genomes.
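The first principle, detecting coding sequences from their oligonucleotide frequencies, can be illustrated with a minimal Python sketch. It scores a candidate sequence by summing the log-odds of its k-mers under a "coding" versus a "non-coding" frequency model, so a positive score means the sequence's oligonucleotide composition resembles the coding training data more than the non-coding data. This is a deliberately simplified stand-in, assuming plain k-mer frequencies with add-one pseudocounts. Practical gene finders typically use richer models such as frame-specific Markov chains. The function names kmer_freqs and coding_score and the toy training sequences are hypothetical and for illustration only.

```python
import math
from collections import Counter
from itertools import product


def kmer_freqs(seqs, k):
    """Relative k-mer frequencies over all 4^k DNA k-mers, with add-one pseudocounts."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Pseudocounts keep every k-mer probability non-zero, so log-odds are defined.
    total = sum(counts[m] for m in kmers) + len(kmers)
    return {m: (counts[m] + 1) / total for m in kmers}


def coding_score(seq, coding, noncoding, k):
    """Sum of log(P_coding(kmer) / P_noncoding(kmer)) over the k-mers of seq."""
    score = 0.0
    for i in range(len(seq) - k + 1):
        m = seq[i:i + k]
        if m in coding:  # skip k-mers containing ambiguous bases such as N
            score += math.log(coding[m] / noncoding[m])
    return score


# Hypothetical toy training data: a GC-rich "coding" sequence and an AT-rich "non-coding" one.
K = 3
coding_model = kmer_freqs(["ATG" + "GCT" * 6 + "AAA" + "TAA"], K)
noncoding_model = kmer_freqs(["TTATATTTAAATATTTATAAATTTA"], K)

score_gc = coding_score("GCTGCTGCTGCT", coding_model, noncoding_model, K)
score_at = coding_score("TATATTTATA", coding_model, noncoding_model, K)
print(f"{score_gc:.2f} {score_at:.2f}")
```

In this toy setup the GCT-repeat candidate receives a positive score and the AT-rich candidate a negative one. With real data, both models would be trained on many annotated coding regions and intergenic sequences, and scores would usually be length-normalized before thresholding.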