ABSTRACT

Next-Generation Sequencing (NGS) technologies are now widely used. Several file formats have been defined to facilitate NGS data analysis and data transfer. This chapter gives an overview of the commonly used formats. It briefly describes the NGS data analysis process and illustrates how the formats relate to one another. The Sequence Alignment/Map (SAM) format is a generic format for storing alignments of NGS reads, and the Binary Alignment/Map (BAM) format is its binary equivalent. SAM uses a 1-based coordinate system, whereas BAM uses a 0-based coordinate system. The Variant Call Format (VCF) is designed specifically to store the genomic variations generated by variant callers. The information stored in a VCF file includes genomic coordinates, the reference allele, and the alternate alleles, among other fields. The chapter also includes exercises on the NGS file formats.
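
The difference between the SAM and BAM coordinate systems can be illustrated with a minimal Python sketch. It assumes a tab-delimited SAM alignment line in which the fourth field (POS) holds the 1-based leftmost mapping position. The function names and the example record are hypothetical and serve only as an illustration.

```python
def sam_pos_to_bam_pos(sam_pos: int) -> int:
    """Convert a 1-based SAM POS to the 0-based position stored in BAM.

    A SAM POS of 0 (unmapped read) becomes -1 in the 0-based system.
    """
    if sam_pos < 0:
        raise ValueError("SAM POS must be non-negative")
    return sam_pos - 1


def parse_sam_position(line: str) -> tuple[str, int]:
    """Return (reference name, 1-based POS) from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    # Mandatory SAM fields: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...
    return fields[2], int(fields[3])


# Hypothetical alignment record, invented for illustration only.
record = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
rname, pos = parse_sam_position(record)
print(rname, pos, sam_pos_to_bam_pos(pos))
```

Here the same alignment start is written as 100 in the SAM text and as 99 in the 0-based system used by BAM.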