ABSTRACT

Decoding the genomes of species has become one of the most intriguing and sophisticated tasks for biologists and bioinformaticians in recent decades. Because a large amount of information is carried by the DNA sequences embedded in a genome, knowing the genome sequence is essential for understanding its biological implications. The early genome sequencing projects mainly targeted viruses and bacteria (Sanger et al. 1977; Fleischmann et al. 1995), which have small and compact genomes, because the experimental and assembly procedures were very expensive and time-consuming. With the development and optimization of the Sanger sequencing method (Hunkapiller et al. 1991) at the beginning of the 1990s, more and more eukaryotic organisms with larger and more complex genomes have been successfully sequenced. The first of these to be sequenced, S. cerevisiae, C. elegans, D. melanogaster and A. thaliana, are regarded as model organisms. Because these organisms are usually amenable to experimental manipulations such as cultivation, transformation and inbreeding, they are well suited for researchers exploring the abundant information hidden in their genomes.