ABSTRACT

The human genome is over 3 billion base pairs (bp) long. Current sequencing technology can read only 500–700 bp at a time, from templates generated from pieces of the genome. Assembling a genome is the process of deciphering its sequence from the sequences of these pieces, which are orders of magnitude smaller than the genome, together with any additional information available about the genome. When a genome varies between individuals (e.g. the human genome, where even the genomes of monozygotic twins differ after somatic and/or epigenetic DNA changes) and the DNA for sequencing is usually taken from a very small number of individuals, an assembled sequence can represent only some of the variation present in the population. However, because most of the genome is shared between any two individuals in a population (e.g. ≈ 99% between two humans), the assembled genome still serves as a reference for the population.