ABSTRACT

Although biologists are aware that a draft genome assembly is likely to contain mistakes, there is a lack of computer software to automatically detect those mistakes or to assign confidence scores to different regions of the assembly. The problem, referred to herein as the genome validation problem, is particularly important owing to the wide application of massively parallel sequencing technologies, which often generate short reads and thus increase the chance of misassembly.

T&F Cat # C6847 Chapter: 8 page: 163 date: August 5, 2009