ABSTRACT

In the last two decades, advances in high-throughput technology have revolutionized biology and turned it into an information-based science. The first spike in data generation occurred in the early 1990s, when large-scale sequencing became available through the use of Sanger technology for expressed sequence tag (EST) and bacterial artificial chromosome (BAC) sequencing (Adams et al. 1991; Shizuya et al. 1992). As with many other non-commodity crops, researchers would have to wait until nearer the end of the decade before funds were available to generate these resources for Prunus species. In 2001, the first Prunus ESTs became available in the National Center for Biotechnology Information (NCBI) dbEST repository (https://www.ncbi.nlm.nih.gov/nucest/), rising to over 100,000 Sanger-sequenced ESTs by 2011. The same high-quality sequencing technology would be used to generate the peach genome sequence, released publicly on April 1, 2010 (https://www.rosaceae.org/peach/genome). In the last few years, the advent of next-generation technologies, such as 454 and Illumina, has significantly enhanced the ability to generate large-scale transcriptome and genome sequence at a fraction of the cost of Sanger sequencing. Similar advances in DNA arrays, nuclear magnetic resonance (NMR), Fourier transform infrared spectroscopy, Fourier transform ion cyclotron resonance mass spectrometry, high-performance liquid chromatography, and mass spectrometry have generated large-scale gene expression, proteome and metabolome data for many species. Other data types include molecular marker data along with genetic mapping data and/or large-scale genotyping data for many varieties. More recently, high-throughput phenotypic and genotypic data are also being generated to study the relationship between genotype, phenotype and environment as well as for breeding purposes. All of these large-scale data require proper analysis, storage and integration to enhance our understanding of biology and to be utilized in further research. Bioinformatics tools and methodologies, therefore, have become an essential and integral part of biological research.

1 Department of Horticulture and Landscape, Washington State University, Pullman, WA 99164, USA.
* Corresponding authors: sook_jung@wsu.edu; dorrie@wsu.edu