ABSTRACT

As a core technology of genomics, DNA sequencing has developed continuously since the advent of massively parallel sequencing (Shendure et al. 2008, Koboldt et al. 2013). The falling cost of sequencing has greatly broadened its scope, making it possible to address biological questions at a genome-wide scale. For instance, the 1000 Genomes Project (1000GP) has already sequenced 1092 individuals with the aim of establishing the most detailed catalogue of human genetic variation, while other ongoing projects bring sequencing into clinical diagnostics to identify human disease genes and study complex traits (Mardis et al. 2011, Rehm et al. 2013), or investigate the genetic information of other species. Such extensive sequencing generates huge volumes of genome data, which intensifies the pressure on storage devices. The savings from cheaper sequencing do not offset the cost of storing, transferring and analysing these massive genome datasets, which poses a serious challenge for biologists. Data compression is an effective way to ease this tension.