ABSTRACT

The detection and quantification of transcriptional activity on the genome by Cap Analysis of Gene Expression (CAGE) requires reliable, high-throughput tag mapping. This chapter discusses the basic considerations of sequence tag mapping and how they apply to CAGE. Mapping to a reference genome is a fundamental step in the analysis of high-throughput sequence tag data. This computationally intensive process typically requires a matching algorithm and several pre- and post-processing scripts, integrated to form an automated pipeline. If matches are allowed to be too short or to contain too many errors, a tag becomes very likely to map to multiple genomic locations and is therefore difficult to interpret. The tag set is first formatted, for example by clustering identical tags, and each tag is then aligned to the reference genome using the chosen matching algorithm and minimum match requirements.
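
The formatting and alignment steps summarized above can be illustrated with a minimal Python sketch. It is not the pipeline described in this chapter: it assumes exact, forward-strand matching by plain string search against an uppercase reference held in memory, whereas a real pipeline would use a dedicated matching algorithm that tolerates a limited number of errors. The function names and the min_length default are hypothetical and chosen only for illustration.

```python
from collections import Counter


def collapse_tags(tags):
    """Cluster identical tag sequences, keeping a count for each unique tag."""
    return Counter(tag.upper() for tag in tags)


def find_exact_hits(tag, genome):
    """Return every 0-based forward-strand start position where tag occurs exactly."""
    hits = []
    start = genome.find(tag)
    while start != -1:
        hits.append(start)
        # Step by one so that overlapping occurrences are also reported.
        start = genome.find(tag, start + 1)
    return hits


def map_tags(tags, genome, min_length=20):
    """Map each unique tag and flag whether it has exactly one genomic location.

    min_length is an illustrative minimum match requirement (an assumption,
    not a value taken from the chapter); shorter tags are skipped because
    they are likely to match many locations.
    """
    results = {}
    for tag, count in collapse_tags(tags).items():
        if len(tag) < min_length:
            continue
        hits = find_exact_hits(tag, genome)
        results[tag] = {"count": count, "hits": hits, "unique": len(hits) == 1}
    return results
```

Collapsing identical tags first means each distinct sequence is searched only once, while its count preserves the expression signal. The "unique" flag separates tags with a single location from those that map to several places or to none.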