ABSTRACT

Biological sequence comparison studies the relationships between DNA or protein sequences. A pairwise alignment algorithm matches fragments of a query, a sequence of unknown function, to similar fragments of a reference sequence drawn from a large database. Biologically relevant matches may represent genes, structural domains, regulatory elements, or other sequence features that provide clues to the biochemical function and structure of the query. Biosequence databases, such as GenBank from the U.S. National Center for Biotechnology Information (NCBI), provide annotated collections of reference sequences that form the basis for comparative analysis. High-throughput comparison is widely used to annotate functional elements in newly sequenced genomes, to assemble sequence reads against a reference genome, to compare related genomes, and to analyze sequence reads from microbial communities.