ABSTRACT

Recent advances in sequencing technologies have drastically reduced the cost of nucleotide sequencing [1,2], and these technologies are rapidly becoming powerful tools for quantifying a growing list of cellular properties, including sequence variation, RNA expression levels, protein-DNA/RNA interaction sites, and chromatin methylation [3-8]. An expensive step in the sequencing process is sample preparation, in which time-consuming procedures such as library preparation must be applied to each sample individually. This greatly limits the utility of a sequencer for sequencing a small genomic region in many individuals, because the per-sample preparation cost offsets the efficiency of the sequencer. In fact, the sequencing capacity of the sequencer, measured as the number of reads it generates, is often much higher than the application requires. This motivates the development of multiplexing strategies that process multiple samples in a single sample preparation step. Such strategies require additional sequencing capacity, but in several practical scenarios they reduce the overall cost.

One such multiplexing scheme is the use of overlapping pools [9-11]. In this scheme, subsets of samples are mixed together into pools, and a single sample preparation is performed for each pool. Typically, a barcoding technique is applied during this preparation so that each read generated from a pool can be identified as originating from that pool. By combining the sequencing results with the information on which samples appeared in which pool, the mixed information from each pool can be “decoded” to obtain information on the sequence of each sample.
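To make the decoding idea concrete, the following is a toy sketch in Python. It assumes a simplified setting that the text does not state: a single rare variant carried by at most one sample, and noiseless detection of the variant in every pool that contains the carrier. The binary-code pool assignment and the function names `design_pools`, `observe`, and `decode_single_carrier` are hypothetical illustrations, not the pooling design used in this work.

```python
from typing import List, Optional, Set


def design_pools(num_samples: int) -> List[List[int]]:
    """Hypothetical design: give each sample a unique, non-empty set of pools.

    Sample i is placed in pool j whenever bit j of (i + 1) is set, so every
    sample has a distinct pooling signature. Returns pools[j] = sample indices.
    """
    # Codes run from 1 to num_samples, so this many bits (pools) suffice.
    num_pools = max(1, num_samples.bit_length())
    pools: List[List[int]] = [[] for _ in range(num_pools)]
    for sample in range(num_samples):
        code = sample + 1  # offset by one so no sample has the empty signature
        for j in range(num_pools):
            if (code >> j) & 1:
                pools[j].append(sample)
    return pools


def observe(pools: List[List[int]], carriers: Set[int]) -> List[bool]:
    """Simulate noiseless sequencing: a pool shows the variant if any member carries it."""
    return [any(s in carriers for s in pool) for pool in pools]


def decode_single_carrier(positive: List[bool], num_samples: int) -> Optional[int]:
    """Recover the carrier from the set of positive pools (single-carrier assumption)."""
    # Reassemble the binary signature from the pools in which the variant appeared.
    code = sum(1 << j for j, hit in enumerate(positive) if hit)
    if code == 0 or code > num_samples:
        return None  # no carrier, or a signature that matches no sample
    return code - 1


if __name__ == "__main__":
    n = 8
    pools = design_pools(n)  # 4 pool preparations instead of 8 sample preparations
    print(pools)
    print(decode_single_carrier(observe(pools, {5}), n))
```

In this sketch, 8 samples are covered by 4 pools, so only 4 sample preparations are needed. The binary design decodes only one carrier: with several carriers, the positive pools form the union of their signatures, which is generally ambiguous, so designs meant to handle multiple carriers or sequencing errors would need more pools per sample.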