ABSTRACT

Owing to rapid technological advances in recent years, large-scale omics data from comprehensive genomics, transcriptomics, metabolomics and proteomics studies have been accumulating rapidly and are stored in Web databases. The nucleotide sequence data (entries) in the International Nucleotide Sequence Databases (INSD) (Brunak et al., 2002) are maintained by the members of the International Nucleotide Sequence Database Collaboration, namely DDBJ (Kaminuma et al., 2011), EMBL (Cochrane et al., 2009) and GenBank (Benson et al., 2011), and the number of entries is steadily increasing. Although the first version of DDBJ was released in July 1987, approximately 40% of the bases and entries in the latest version were released from 2007 onward. About 110 billion bases in more than 112 million DNA sequences are recorded in DDBJ (Release 80.0 as of December 2009). This rapid increase in genome and transcriptome sequence data has been facilitated by innovations in multicapillary sequencing and next-generation sequencing (NGS) technologies. Further improvements in experimental methods such as NGS will accelerate the growth in the volume of omics data.