ABSTRACT

Genomics is the determination of the entire genomic sequence of an organism or species (Figure 22.1). It usually includes annotation (identification and labeling) of the genes, or of as many of them as can be determined. For virus genomes, this was usually done manually. However, computational methods had to be developed for the larger genomes of bacteria, archaea, and eukarya (Figure 22.2). Eukaryotic genomes required additional strategies because of their larger sizes (from a few million to more than one hundred billion base pairs per genome), their organization into multiple chromosomes, their many introns, their large amounts of non-protein-coding sequence, and the organellar genomes that accompany the nuclear genome. Some of the smallest eukaryotic genomes were sequenced first: Saccharomyces cerevisiae (brewer's yeast, 12.5 Mb), Caenorhabditis elegans (nematode, 100 Mb), Drosophila melanogaster (arthropod, 132 Mb), and Arabidopsis thaliana (plant, 157 Mb).

Conveniently, eukaryotic genomes are subdivided into chromosomes. Although chromosome sizes vary widely, each chromosome is far smaller than the whole genome, and many are within about an order of magnitude of the size of a bacterial genome, so each can be treated as a separate, more manageable sequencing target. Much of the early sequencing of eukaryotic genomes relied on bacterial artificial chromosomes (BACs). Large sections of a eukaryotic genome were cloned into BACs, which were then used to transform bacteria so that each cloned section was replicated in large amounts. Each BAC was then broken into smaller subclones, and each subclone was sequenced. The subclone sequences were assembled into the sequence of each BAC, the overlapping BACs were assembled into the sequence of each chromosome, and finally the entire genome was assembled. Each of these projects required enormous effort from many laboratories.
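The clone-by-clone strategy described above can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions, not how the actual projects were carried out: the "chromosome" is a random, repeat-free sequence, the fragments are error-free and tile it exactly, and a simple greedy overlap merge stands in for the far more sophisticated assembly software real projects used. The function names (`fragment`, `overlap`, `greedy_assemble`, `clone_by_clone`) and the sizes (320-base "BACs", 50-base "subclones", a 20-base minimum overlap) are hypothetical and scaled far below real BAC inserts.

```python
import random


def fragment(seq: str, size: int, step: int) -> list[str]:
    """Tile seq into overlapping pieces of length `size`, one starting every `step` bases.

    Assumes (len(seq) - size) is a multiple of step, so the last piece ends
    exactly at the end of seq (a hypothetical simplification).
    """
    assert (len(seq) - size) % step == 0, "choose sizes so the tiling is exact"
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap within a
        if start == -1:
            return 0
        if b.startswith(a[start:]):  # the rest of a must match the start of b
            return len(a) - start
        start += 1


def greedy_assemble(pieces: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap until none remain."""
    contigs = list(pieces)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # gaps remain: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


def clone_by_clone(chromosome: str, rng: random.Random) -> str:
    """Toy hierarchical strategy: chromosome -> BACs -> subclones -> BACs -> chromosome."""
    min_len = 20
    bacs = fragment(chromosome, size=320, step=200)  # large, overlapping "BAC" inserts
    assembled_bacs = []
    for bac in bacs:
        subclones = fragment(bac, size=50, step=30)  # small pieces that are "sequenced"
        rng.shuffle(subclones)                       # the assembler never sees their order
        contigs = greedy_assemble(subclones, min_len)
        assert len(contigs) == 1, "BAC did not assemble into a single contig"
        assembled_bacs.append(contigs[0])
    rng.shuffle(assembled_bacs)                      # BAC order is also unknown
    contigs = greedy_assemble(assembled_bacs, min_len)
    assert len(contigs) == 1, "chromosome did not assemble into a single contig"
    return contigs[0]


if __name__ == "__main__":
    rng = random.Random(1)
    chromosome = "".join(rng.choice("ACGT") for _ in range(1120))  # 5 BACs of 10 subclones each
    print(clone_by_clone(chromosome, rng) == chromosome)
```

The greedy merge succeeds here only because, in a short random sequence, no 20-base stretch is expected to occur twice; repeated sequences in real eukaryotic genomes are one reason that assembling each BAC separately, before joining the BACs, was useful.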