ABSTRACT

In this chapter, the authors describe their experience in handling Cap Analysis of Gene Expression (CAGE) data once sequencing is complete, including extraction of the tags from the reads. Linkers can contain deoxyribonucleic acid (DNA) barcodes that can be used to determine the original ribonucleic acid (RNA) source when samples are mixed. The authors describe strategies for extracting CAGE tags from among possibly contaminating sequences. They examine how to assess library quality in terms of sequencing quality and how to detect potential contamination by sequences that are not derived from RNA. Several indicators of read quality can be used for pre-extraction filtering to increase the mapping rate and specificity and to reduce computing time. In 454 CAGE libraries, polymerase chain reaction (PCR) errors are naturally eliminated from the linker sequences, because concatenation of tags requires a perfectly matching restriction enzyme site.