ABSTRACT

Advances in genomics are driven by genome sequencing projects, whose goal is to determine the genome sequence of an organism. Sequencing machines, however, can directly produce only short sequences of up to 1000 base pairs (bp), whereas genomes are far larger: bacterial genomes are a few million base pairs (Mb) in size, animal genomes can reach a few billion base pairs (Gb), and plant genomes can reach tens of Gb. Long genome sequences therefore have to be constructed from these short sequences, a process called fragment assembly or genome assembly.