ABSTRACT

The advent of Next Generation Sequencing (NGS) is transforming the landscape of biomedical research, ranging from disease gene discovery to the clinical application of genomic medicine. NGS enables low-cost, high-throughput sequencing for a wide variety of genome-scale analyses of the genome, epigenome and transcriptome. However, this vast quantity of data confronts us with unprecedented technical challenges in the quality assurance of computational analysis pipelines. In this chapter, we review current approaches to bioinformatics validation and quality control in whole genome sequencing analysis for genomic medicine applications. We further discuss how state-of-the-art software testing techniques can be used to establish strong quality assurance measures in genome-scale bioinformatics.

1. Introduction

Bioinformatics is the application of computational, mathematical and statistical techniques to solve problems in biology and medicine. Arguably, the main research focus so far has been on the computational and statistical basis of the algorithms. Surprisingly, much less effort has been devoted to the validation and quality assurance of the tools that implement these algorithms, even though the correct design and implementation of an algorithm is at least as important as the algorithm itself. Incorrectly computed results may lead to wrong biological conclusions and subsequently misguide downstream experiments. The widespread problem of errors in, or misuse of, scientific computing in biology and medicine is highlighted by recent news and commentary articles in top-tier journals such as Nature and Science on this