ABSTRACT

When committees advising on the sequencing of complex genomes met in the late 1980s, they estimated a wait of 12 to 15 years before the entire human genome sequence could be made available. While the genome to be sequenced was estimated at around 3 billion bp, rough estimates placed the number of genes at a few tens of thousands, spanning only about 3% of the genome [1]. Because the gene space is relatively small, researchers began seeking alternative sequencing methods that could quickly discover it without waiting for the entire genome project to be completed. Invented in the early 1990s [1], high-throughput cDNA sequencing is one such technique. In it, double-stranded DNA molecules called complementary DNA (cDNA) are synthesized from messenger RNA (mRNA) isolated from the cells of living tissues. These cDNA clones are then read in a single pass from either end, yielding a subsequence of the original clone called an Expressed Sequence Tag (EST). As with any sequencing technology, this procedure is prone to errors. Nevertheless, its simplicity has been instrumental in its widespread adoption, and it has provided a cost-effective means of high-throughput sequencing of the transcribed portions of the genome.