ABSTRACT

For one such as the author, who was trained in classical plant breeding, the dramatic progress that has been made in revealing, understanding, and manipulating the hereditary blueprint of complex crop genomes continues to instill a sense of wonder. Only 20 years ago, the botanical research community was just gaining its first glimpse into plant genomes, often in the form of Southern blots segregating for restriction fragment length polymorphism (RFLP) loci. Today, at least ten of the genomes that we studied by what now seem rather crude techniques are fully sequenced, with many more slated for sequencing as described elsewhere in this volume (see Chapters 6-10 for details). Technological progress in many areas, such as dramatically accelerated sequencing technologies (Margulies et al. 2005; Shendure et al. 2005), targeted approaches for identification of mutants useful to determine functions of specific genes (Henikoff et al. 2004), and high-throughput methods to identify functional elements in genomic sequence (Birney et al. 2007), sets the stage for choosing organisms for study based on their intrinsic ecological or evolutionary interest rather than because they are facile for genomics. This is not to downplay the importance of botanical models, in which the groundwork has been laid for many pan-taxon goals such as determining the functions of many thousands of genes (Alonso et al. 2003) and deducing the macro-evolutionary history of angiosperms (Bowers et al. 2003; Paterson et al. 2004). However, major gaps remain, for example, in relating genetic mechanisms to evolutionary outcomes, and in understanding how this relationship is mediated by ecological factors. Genomic models, selected for small genomes and short life cycles, present a biased picture of genome structure and evolution, and have intrinsic limitations as whole-organism-level study systems. Such gaps in knowledge will increasingly need to be filled by study of plants that are not traditionally viewed as botanical models. The ~200 crops that sustain humanity and our livestock will play a singularly important role in rounding out understanding of botanical diversity, since they combine economic importance with one or more attributes that distinguish them as a botanical model for some specific aspect of growth and development, such as the single-celled, seed-borne epidermal fiber of cotton, the subterranean pod containing oil-rich seeds of peanut, or the remarkable biomass productivity of Miscanthus (Heaton et al. 2004).

Andrew H. Paterson

Plant Genome Mapping Laboratory,

University of Georgia, Athens GA, USA

E-mail: paterson@plantbio.uga.edu

There is good reason to expect that the genomes of most of our major crops will be fully sequenced early in the 21st century. Because plant genome sizes vary by nearly 2,000-fold, from 1C = 63 Mbp for the carnivorous plant Genlisea margaretae (Greilhuber et al. 2006) to 124,852 Mbp for the lily Fritillaria assyriaca (Bennett and Smith 1991), the decision to sequence one is presently a complex equation that integrates genome size with scientific/economic/social impact, phylogenetic distance from previously sequenced plants (i.e. new information yield), relevant information from prior studies (such as genetic/physical maps or ESTs), sequencing/assembly/annotation costs, and the persuasiveness of individual (or groups of) investigators. In the aggregate, the genomes of 70 crops for which I found estimates of genome size total 1.48 x 10^11 bp of DNA (Paterson 2006). Anticipating that these are representative of the remainder of the ~200 domesticates, to fully sequence only one genotype for each using present whole-genome shotgun or BAC-based technology (each assuming 8x redundancy of sequence coverage) would involve about 3.4 x 10^12 bp of raw sequence, more than 72x the 4.9 x 10^10 bp archived in GenBank as of this writing. Given that new technologies (Shendure et al. 2004) together with ongoing efficiencies promise to sustain the sequence growth rates of about 60% per year that have been realized since the 1980s, the complete sequencing of these 200 domesticates would be predicted to take a remarkably short 14 years.
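
As a rough check on two of the figures quoted above, the genome-size range and the raw-sequence estimate can be recomputed from the numbers given in the text. This is a minimal sketch, not the author's own calculation: the function and variable names are hypothetical, and it assumes that the 70 surveyed crops have the same average genome size as the remaining domesticates, as the text anticipates.

```python
# Illustrative recomputation of two figures quoted in the text.
# All names are hypothetical; the inputs are the values stated above.

SMALLEST_GENOME_MBP = 63        # Genlisea margaretae, 1C
LARGEST_GENOME_MBP = 124_852    # Fritillaria assyriaca, 1C
SURVEYED_CROPS = 70             # crops with genome-size estimates
SURVEYED_TOTAL_BP = 1.48e11     # combined genome size of those crops
ALL_DOMESTICATES = 200          # approximate number of domesticates
COVERAGE = 8                    # assumed redundancy of sequence coverage


def fold_range(smallest, largest):
    """Ratio of the largest to the smallest genome size."""
    return largest / smallest


def raw_sequence_bp(surveyed_total_bp, surveyed_crops, all_crops, coverage):
    """Raw sequence needed to cover one genotype per crop.

    Assumes the surveyed crops are representative, so the mean genome
    size of the surveyed set is applied to every domesticate.
    """
    mean_genome_bp = surveyed_total_bp / surveyed_crops
    return mean_genome_bp * all_crops * coverage


fold = fold_range(SMALLEST_GENOME_MBP, LARGEST_GENOME_MBP)
raw = raw_sequence_bp(SURVEYED_TOTAL_BP, SURVEYED_CROPS,
                      ALL_DOMESTICATES, COVERAGE)

print(f"Genome size range: about {fold:,.0f}-fold")
print(f"Raw sequence required: about {raw:.2e} bp")
```

Under these assumptions the range comes out just under 2,000-fold and the raw-sequence requirement is close to 3.4 x 10^12 bp, consistent with the values stated in the text.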