ABSTRACT

This chapter describes issues worth reflecting on when analyzing Cap Analysis of Gene Expression (CAGE) data. It discusses three particular challenges inherent in the data that need to be addressed in future studies: the sampling depth problem, the difficulty of assessing noise, and the question of how to cluster CAGE tags in a meaningful way. The clusters lay the foundation for many downstream analysis protocols and are consequently essential to most CAGE experiments. Decisions made during clustering therefore influence the results of other analyses and need to be considered carefully. In the CAGE context, the basal cluster is usually referred to as a tag cluster; it extends the classical core promoter to a framework with multiple transcription start sites. CAGE appears to be a straightforward technology; for the unwary researcher, however, there are several caveats that may lead to false conclusions.