Extreme-Scale De Novo Genome Assembly

doi:10.1201/b21930-18

ABSTRACT

Genomes mutate between generations, and even within individuals as they grow; some of these mutations can drive cells to proliferate and migrate inappropriately, leading to diseases such as cancer. With exascale computing architectures expected within the next few years, efficiently porting the HipMer de novo assembly pipeline to larger and more complex systems raises many challenges. Architectural trends indicate that the degree of parallelism within each node will increase considerably compared with contemporary supercomputing systems. This chapter addresses these challenges by developing parallel algorithms for de novo assembly designed to scale to massive concurrencies. Obtaining the scalable pipeline required several new parallel algorithms and distributed data structures that take advantage of a global address space model of computation on distributed-memory hardware, remote atomic memory operations, and novel synchronization protocols.