ABSTRACT

The Sequence Alignment/Map (SAM) format is a tab-delimited text format that aims to be a universal format for storing alignments of NGS reads to a reference genome. BAM (Binary Alignment/Map) is the compressed binary equivalent of SAM. A CIGAR string consists of a series of operation lengths, each followed by an operation code, which together describe exactly how a read has been aligned to the reference sequence. The MAPQ (MAPping Quality) score reflects the probability that the read is aligned to the wrong position in the genome; it is Phred-scaled (MAPQ = -10 log10 of that probability), so higher values indicate a more confident placement. The current SAM format defines 12 bit flags, each of which takes a value of 1 (yes, true) or 0 (no, false). An arbitrary number of optional fields may follow the 11 mandatory fields. The predefined NM tag takes an integer value (type i). The MD tag provides additional, reference-centered information about the alignment. The RG tag indicates the read group of the read, e.g., RG:Z:rg1. The AS tag indicates the alignment score generated by the aligner.
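The pieces described above can be illustrated with a minimal Python sketch that splits one SAM alignment line into its 11 mandatory fields and optional tags, decodes the bit flags, parses the CIGAR string, and converts MAPQ back to an error probability. This is an illustration under stated assumptions, not a replacement for an established parser: the function names, the short flag labels, and the example record are all invented here, and only the optional-field types i, f, and Z are converted.

```python
import re

# Short labels for the 12 flag bits, lowest bit (0x1) first.
# The labels are invented for this sketch; the bit meanings follow the SAM format.
FLAG_NAMES = [
    "paired", "proper_pair", "unmapped", "mate_unmapped",
    "reverse", "mate_reverse", "read1", "read2",
    "secondary", "qc_fail", "duplicate", "supplementary",
]

CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")   # whole-string check
CIGAR_STEP = re.compile(r"(\d+)([MIDNSHP=X])")    # one length/operation pair


def decode_flag(flag):
    """Return the labels of all bits set in the FLAG integer."""
    return [name for bit, name in enumerate(FLAG_NAMES) if flag & (1 << bit)]


def parse_cigar(cigar):
    """Turn e.g. '5M1I4M' into [(5, 'M'), (1, 'I'), (4, 'M')]."""
    if cigar == "*":          # '*' means the CIGAR is unavailable
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_STEP.findall(cigar)]


def mapq_to_error_prob(mapq):
    """Invert the Phred scaling: P(wrong position) = 10 ** (-MAPQ / 10)."""
    if mapq == 255:           # 255 means the mapping quality is unavailable
        return None
    return 10 ** (-mapq / 10)


def parse_optional(fields):
    """Parse TAG:TYPE:VALUE fields; other types are left as strings."""
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


def parse_sam_line(line):
    """Parse one alignment line (not a header line starting with '@')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("an alignment line needs 11 mandatory fields")
    flag = int(cols[1])
    return {
        "qname": cols[0],
        "flag": flag,
        "flags": decode_flag(flag),
        "rname": cols[2],
        "pos": int(cols[3]),
        "mapq": int(cols[4]),
        "cigar": parse_cigar(cols[5]),
        "seq": cols[9],
        "qual": cols[10],
        "tags": parse_optional(cols[11:]),
    }


# Invented example record: 10 bases aligned with a single inserted base.
EXAMPLE = "\t".join([
    "read1", "99", "chr1", "100", "60", "5M1I4M", "=", "300", "210",
    "ACGTAACGTA", "IIIIIIIIII", "NM:i:1", "MD:Z:9", "AS:i:8", "RG:Z:rg1",
])

record = parse_sam_line(EXAMPLE)
```

For the invented record, `record["flags"]` lists the four bits set in FLAG 99 (paired, properly paired, mate on the reverse strand, first read in the pair), and `record["tags"]["RG"]` holds the read group `rg1`. For real data, a maintained library such as pysam is the safer choice, since it also handles BAM, header lines, and the remaining optional-field types.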