ABSTRACT

This chapter explores the Genome Analysis Toolkit (GATK). GATK is probably the most widely used variant caller, but it is also a large software package with major functionality for improving the alignment and the base quality scores in the BAM file before final variant calling. This stage is often referred to as post-alignment processing. The Picard Tools suite is designed to work with GATK. It contains numerous tools for manipulating NGS data and formats such as SAM/BAM and VCF, and it focuses on the post-alignment processing tasks needed to get raw alignments ready for variant calling with GATK. GATK has excellent, comprehensive documentation covering a very wide range of use cases. BWA-MEM aligns each read independently, so reads that overlap true insertions or deletions (indels) can be misaligned. This tends to produce erroneous single nucleotide variant (SNV) calls near those indels. The main cause is that alignment algorithms penalize mismatches less than gaps, so near the end of a read, one or two mismatches can outscore the gap that would represent the indel correctly.
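The effect of mismatch versus gap penalties can be made concrete with a small scoring sketch. It assumes BWA-MEM's default scoring (match 1, mismatch penalty 4, gap open 6, gap extension 1, where a gap of length k costs open + k × extension). The reference and read are hypothetical, and the sketch ignores soft-clipping and the rest of BWA-MEM's local alignment machinery. It compares two ways of aligning a read that carries a true 1-bp deletion one base before its end.

```python
# Assumed BWA-MEM-like default scores for illustration (-A 1 -B 4 -O 6 -E 1).
MATCH = 1
MISMATCH = 4
GAP_OPEN = 6
GAP_EXTEND = 1


def ungapped_score(read: str, ref: str, start: int) -> int:
    """Score a read aligned base-for-base to ref beginning at `start`."""
    segment = ref[start:start + len(read)]
    return sum(MATCH if a == b else -MISMATCH for a, b in zip(read, segment))


def deletion_score(read: str, ref: str, start: int, del_pos: int, del_len: int) -> int:
    """Score a read aligned with a deletion of `del_len` reference bases placed
    after the first `del_pos` read bases (affine gap cost: open + len * extend)."""
    left = ungapped_score(read[:del_pos], ref, start)
    right = ungapped_score(read[del_pos:], ref, start + del_pos + del_len)
    return left + right - (GAP_OPEN + del_len * GAP_EXTEND)


# Hypothetical reference and a read carrying a true 1-bp deletion of ref[17] ('A').
ref = "ACGTTGCAAGCTTACGGATC"
read = ref[:17] + ref[18:19]

gapped = deletion_score(read, ref, start=0, del_pos=17, del_len=1)
ungapped = ungapped_score(read, ref, start=0)
print("with deletion:", gapped)    # 18 matches - 7 = 11
print("with mismatch:", ungapped)  # 17 matches - 4 = 13
```

Under these assumed scores the ungapped alignment wins (13 versus 11), so the read is reported with a T mismatch where the reference has A. That mismatch would surface as a false SNV next to the real deletion, which is the kind of artifact post-alignment processing aims to reduce.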