ABSTRACT

RNA viruses, such as HIV, HCV, influenza virus, and SARS-CoV-2, are of major concern to public health. One of the most basic problems is that low-frequency single-nucleotide variants (SNVs) are difficult to distinguish from technical errors, such as those introduced by polymerase chain reaction (PCR) amplification and by the sequencing protocol. At the local scale, one considers consecutive genomic regions short enough to be covered by a read of average length, so that local haplotypes can be estimated. Methods that do not take a reference genome as input may either generate a consensus sequence from all input reads and use it as a reference in later steps, or forgo a reference entirely. The structure of hidden Markov models (HMMs) makes it possible to account explicitly for recombination events in the model. Software engineering aspects also affect how easy a tool is to install and use, and thus how useful it is in practice to other researchers attempting global haplotype reconstruction.