ABSTRACT

Some of the new technologies are known to produce characteristic systematic errors, such as difficulty estimating the length of homopolymer regions (i.e., single-base repeats) and increasing base call errors toward the 3′ ends of reads. All of the aforementioned sequencing technologies include an estimate of the confidence in each base call as a quality value. These characteristics of NGS data, which can be summarized as escalated ambiguity during assembly and an exponential increase in the sheer amount of information to be processed downstream, pose computational challenges for NGS data analysis pipelines. More computationally intensive algorithms are needed to cope with the ambiguity, together with more computing power and ample storage space to accommodate sequence and other types of genomic data. The computational challenges of NGS data analysis have recently been reviewed (Li et al., 2011; Pop, 2009).
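As an illustration of how a per-base quality value can be turned into an error estimate, the sketch below assumes the widely used Phred convention, Q = -10 log10(p), and a FASTQ-style quality string with an ASCII offset of 33. Neither the scale nor the encoding is stated in the text above, so both are assumptions, and the function names are hypothetical.

```python
# Minimal sketch, assuming Phred-scaled quality values (Q = -10 * log10(p))
# and an ASCII offset of 33; older encodings use a different offset.
PHRED_OFFSET = 33


def decode_quality_string(qual, offset=PHRED_OFFSET):
    """Convert a quality string into a list of integer quality scores."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q):
    """Return the base call error probability implied by a Phred score."""
    return 10 ** (-q / 10)


def mean_error_prob(qual, offset=PHRED_OFFSET):
    """Average the per-base error probabilities over one read."""
    scores = decode_quality_string(qual, offset)
    if not scores:
        raise ValueError("quality string is empty")
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)
```

Under these assumptions, a score of 20 corresponds to an error probability of 0.01 and a score of 40 to 0.0001. Averaging probabilities rather than raw scores avoids understating the contribution of low-quality bases, which matters when quality declines toward the 3′ end of a read.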