ABSTRACT

Transcript sampling approaches that are independent of a set of predefined assay probes, such as Massively Parallel Signature Sequencing, Serial Analysis of Gene Expression or Expressed Sequence Tag sequencing, are powerful in that they offer broad transcript coverage and provide insight into novel transcript expression events. Discoveries that have resulted from transcriptomics, the study of ribonucleic acid (RNA) expressed from the genome, have depended upon the availability of technologies that can consistently capture a representation of expressed RNA. Given that Cap Analysis of Gene Expression libraries are neither normalized nor subtracted, integration across different libraries to obtain a higher-level representation of transcription start site utilization and expression strength is straightforward, provided all libraries are fully annotated with a hierarchical vocabulary or ontology. The integration of expression information across heterogeneous data sources comes with a number of caveats.