ABSTRACT

Molecular Systematics: Access to Nucleotide Sequence Information

The first sequence database, the Los Alamos Sequence Database, was established in 1979 at Los Alamos National Laboratory, and this effort culminated in 1982 in the creation of the public GenBank initiative. From 1989 to 1992, GenBank was transferred to the newly created National Center for Biotechnology Information (NCBI), and in the mid-1990s it became part of the International Nucleotide Sequence Database Collaboration (INSD). The INSD comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank. Together, they maintain an annotated collection of all publicly available DNA sequence data. Each database has its own submission and retrieval tools, but the three exchange data daily, so all of them contain the same set of sequences. Since 1982, the numbers of base pairs (bp) and of sequences submitted have grown exponentially: in 1982 GenBank contained 680,338 bp in 606 sequences, whereas in 2008 it held roughly 9.9 × 10¹⁰ bp in 9.8 × 10⁷ sequences (information retrieved from NCBI and Wikipedia). This rapid accumulation of sequence data in GenBank (Fig. 1.2.1) makes it possible to combine data for the same sets of genes from different studies.
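To make "grown exponentially" concrete, the two GenBank snapshots quoted above can be turned into an implied doubling time. The sketch below assumes, purely for illustration, smooth exponential growth between 1982 and 2008; real growth was not perfectly constant, and the 2008 figures are approximate. The function name `doubling_time` is hypothetical.

```python
import math

def doubling_time(start: float, end: float, years: float) -> float:
    """Doubling time (in years) implied by constant exponential growth from start to end."""
    rate = math.log(end / start) / years  # continuous growth rate per year
    return math.log(2) / rate

YEARS = 2008 - 1982  # interval between the two GenBank snapshots

# Figures from the text: 1982 vs. approximate 2008 values
bp_doubling = doubling_time(680_338, 9.9e10, YEARS)
seq_doubling = doubling_time(606, 9.8e7, YEARS)

print(f"base pairs: doubling roughly every {bp_doubling:.1f} years")
print(f"sequences:  doubling roughly every {seq_doubling:.1f} years")
```

Under this assumption, both base pairs and sequences double roughly every one and a half years, which is what makes pooling data for the same genes across studies increasingly practical.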