ABSTRACT

The majority of DNA sequence and expressed gene sequence data generated today comes from next- or second-generation sequencing (NGS/2GS) technologies. Compared with Sanger sequencing, NGS technologies produce vast quantities of short sequence reads at relatively low cost and in a short time. Genomics is undergoing a revolution, driven by advances in DNA sequencing technology, and this data flood is having a major impact on approaches and strategies for crop improvement. NGS technologies have been applied to sequence the genomes of a number of cereal crop species, including rice, Sorghum and maize. A high-quality sequence of rice that covers 95% of the 389 Mb genome has been produced [1]. The Sorghum bicolor (L.) Moench genome has been assembled at a size of 730 megabases, placing ~98% of genes in their chromosomal context [2]. The draft nucleotide sequence of the 2.3-gigabase genome of maize has also been improved [3]. One of the challenges encountered by researchers is to translate this abundance of data into improved crops in the field. There remains a gap between genome data production and next-generation crop improvement
strategies, but this gap is being rapidly closed by far-sighted companies and individuals able to combine genomic data mining with practical crop-improvement skills. Bioinformatics can be defined as the structuring of biological information to enable logical interrogation, and databases are a key part of the bioinformatics toolbox. Numerous databases have been developed for genomic data, on a range of platforms and to suit a variety of purposes (see Table 1 for examples). These range from generic DNA sequence or molecular marker databases to those hosting a variety of data for specific species.