ABSTRACT

This chapter explains the foundations of reference-based alignment and shows how to use BWA-MEM for this purpose. A high-quality human genome assembly is essential for medical genomics and translational research. The human reference genome files can be obtained from several sources. One option is the genome build of the Genome Reference Consortium (GRC). This build may be more up to date, but it has the disadvantage that sequences are identified by accession numbers rather than "plain" chromosome names, so users may need to convert those names prior to downstream analysis. The GRCh38 assembly not only corrected single-nucleotide errors but also updated the overall structure of the genome assembly through changes such as gap filling. As a result, genomic coordinates differ between assemblies; they can be converted from one assembly to the other using UCSC's liftOver tool. Many current alignment algorithms rely on indices related to the Burrows–Wheeler Transform (BWT) to map reads to the reference genome.
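
As a toy illustration of the BWT-based indexing mentioned above, the following Python sketch builds the transform by sorting all rotations of a short sequence and then counts exact matches of a pattern with backward search. It is a minimal sketch for intuition only, not the way BWA-MEM is implemented: the function names `bwt` and `count_occurrences` are hypothetical, the `$` end marker is an assumed convention, and real aligners use suffix arrays and sampled occurrence tables so that whole genomes can be indexed efficiently.

```python
def bwt(text: str) -> str:
    """Return the Burrows-Wheeler transform of `text`.

    A '$' end marker (assumed to sort before every other symbol)
    is appended so that the transform is invertible.
    """
    if "$" in text:
        raise ValueError("text must not contain the end marker '$'")
    s = text + "$"
    # Naive construction: sort all rotations, keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of `pattern` using backward search."""
    # first_row[c]: number of symbols in the text that sort before c.
    first_row = {}
    total = 0
    for c in sorted(set(bwt_str)):
        first_row[c] = total
        total += bwt_str.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]; a real index precomputes this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current interval of sorted rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "GATTACAGATTACA"  # made-up toy sequence
    index = bwt(reference)
    print(index)
    print(count_occurrences(index, "TTAC"))
```

The backward search processes the pattern from its last base to its first and narrows an interval of sorted rotations at each step, which is why the cost of a lookup depends on the read length rather than on the size of the reference.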