ABSTRACT

Coupling bisulfite modification with next-generation sequencing (BS-Seq) provides epigenetic information about cytosine methylation at single-base resolution across the genome, and it requires bioinformatics pipelines capable of handling the resulting large-scale data analysis. Because of the cytosine conversions introduced by bisulfite treatment, bioinformatics tools must be developed that are specifically suited to BS-Seq data and to the volume in which they are generated. First, given the methylation sequencing data, the derived sequences must be mapped back to the reference genome, and the methylation status of each cytosine residue must then be determined. To date, several BS-Seq alignment tools have been developed. BS-Seq alignment algorithms are used to estimate the percentage methylation at specific CpG sites (methylation calls), and they also provide the ability to call single-nucleotide and small indel variants as well as copy number and structural variants. In this chapter, we focus on the challenges presented by the alignment of methylation sequencing reads and the determination of methylation status. There are two basic strategies for methylation sequencing alignment: (1) wild-card matching approaches, such as BSMAP, and (2) three-letter alignment algorithms, such as Bismark. Three-letter alignment is one of the most popular approaches described in the literature. It involves converting all cytosine residues to thymine on the forward strand, and all guanine residues to adenine on the reverse strand. This conversion is applied to both the reference genome and the short reads, and the converted reads are then mapped to the converted genome using a short-read aligner such as Bowtie. Either gapped or ungapped alignment can be used, depending on the underlying short-read alignment tool.
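
As a minimal sketch of the three-letter idea described above, the following Python converts a read and a reference in silico, maps the converted read, and then recovers methylation calls by comparing the original sequences. It assumes exact, ungapped matching on the forward strand only, standing in for a real short-read aligner such as Bowtie. The function names (c_to_t, g_to_a, map_read, call_methylation) and the toy sequences are hypothetical illustrations and are not taken from Bismark or any other tool.

```python
def c_to_t(seq: str) -> str:
    """Forward-strand conversion: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """Reverse-strand conversion: every G becomes A."""
    return seq.upper().replace("G", "A")


def map_read(read: str, reference: str) -> int:
    """Return the first 0-based position at which the converted read matches
    the converted reference exactly, or -1 if there is no match.

    Assumption: exact string search replaces a real short-read aligner.
    """
    return c_to_t(reference).find(c_to_t(read))


def call_methylation(read: str, reference: str, pos: int) -> dict:
    """Compare the ORIGINAL read with the ORIGINAL reference at each
    reference cytosine covered by the read.

    A retained C is called methylated; a C read as T is called unmethylated.
    """
    calls = {}
    for offset, base in enumerate(read.upper()):
        ref_index = pos + offset
        if reference[ref_index].upper() != "C":
            continue
        if base == "C":
            calls[ref_index] = "methylated"
        elif base == "T":
            calls[ref_index] = "unmethylated"
    return calls


# Toy example (hypothetical sequences).
reference = "AACGTTCGGA"
# Bisulfite-treated read covering reference positions 1-8:
# the C at position 2 was methylated (kept), the C at position 6 was not (now T).
read = "ACGTTTGG"

position = map_read(read, reference)
calls = call_methylation(read, reference, position)
print(position, calls)
```

Because both sequences are reduced to a three-letter alphabet before matching, the read aligns regardless of which cytosines were converted; the methylation information is recovered only afterwards, from the unconverted read and reference.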