ABSTRACT

Transcriptomic analysis addresses two questions: which parts of the genome are transcribed, and how actively they are transcribed. In the past, these questions were mostly answered with microarrays, which rely on hybridization of RNA samples to DNA probes that are specific to individual gene-coding regions. With this hybridization-based approach, the repertoire of probes, which are designed from the current annotation of the genome, determines which genes or genomic regions are analyzed, and regions that have no probe coverage are invisible. A next-generation sequencing (NGS)-based approach, on the other hand, does not depend on the current annotation of the genome. Because it relies on sequencing the entire RNA population, hence the term RNA-Seq, this approach makes no assumption as to which parts of the genome are transcribed. After sequencing, the generated reads are mapped to the reference genome to locate their origin in the genome. The total number of reads mapped to a particular genomic region represents the level of transcriptional activity at that region: the more transcriptionally active a region is, the more copies of RNA transcripts it produces, and the more reads it generates. RNA-Seq data analysis is therefore essentially based on counting the reads generated from different regions of the genome.
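
The read-counting idea described above can be illustrated with a minimal sketch. It assumes that reads have already been mapped and that each read is reduced to a chromosome and a single mapped position. The region names, coordinates, reads, and the function `count_reads` are all hypothetical and serve only as an illustration; practical analyses use dedicated alignment and counting tools rather than code like this.

```python
# Hypothetical genomic regions: (name, chromosome, start, end),
# with 1-based inclusive coordinates. These are made-up values.
REGIONS = [
    ("geneA", "chr1", 100, 500),
    ("geneB", "chr1", 800, 1200),
    ("geneC", "chr2", 50, 300),
]


def count_reads(mapped_reads, regions):
    """Count the reads whose mapped position falls inside each region.

    mapped_reads: iterable of (chromosome, position) tuples.
    regions: iterable of (name, chromosome, start, end) tuples.
    Returns a dict mapping region name to read count.
    """
    counts = {name: 0 for name, _, _, _ in regions}
    for chrom, pos in mapped_reads:
        for name, region_chrom, start, end in regions:
            if chrom == region_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


# Hypothetical mapped reads: (chromosome, mapped position).
reads = [
    ("chr1", 120),
    ("chr1", 450),
    ("chr1", 900),
    ("chr2", 10),   # falls outside every listed region
    ("chr1", 130),
]

read_counts = count_reads(reads, REGIONS)
for name, n in read_counts.items():
    print(name, n)
```

In this toy input, the region that receives the most reads is the one interpreted as the most transcriptionally active. The read on chr2 that lies outside every listed region is simply not counted here, although in a real RNA-Seq analysis such reads may point to transcription outside the current annotation.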