ABSTRACT

Transcriptome assembly differs from genome assembly. In genome assembly, read coverage is usually fairly uniform (apart from biases introduced by library preparation and sequencing technology), so deviations from uniform sequencing depth indicate the presence of repeats. With RNA-seq data, in contrast, expression levels can vary by several orders of magnitude between genes, and different isoforms of the same gene can be expressed at different levels. Although these differences can be exploited in transcript assembly to detect and reconstruct distinct isoforms, large abundance differences between genes also introduce challenges: representing lowly expressed genes and rare events requires greater sequencing depth. Wet-laboratory procedures for library normalization exist to reduce abundance differences between genes. Describing such methods is beyond the scope of this book, but it is worth keeping in mind that assembly quality depends on both the data and the computational methods. Because sequencing only converts the content of an RNA-seq library into digital form, library preparation is key to obtaining good-quality data. "Garbage in, garbage out" applies to both sequencing and assembly, so quality control of the data should be performed before any assembly.
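To make the sequencing-depth argument concrete, here is a minimal sketch under a deliberately simplified assumption: reads are sampled independently, so the read count of a transcript that makes up a fraction f of the library is approximately Poisson-distributed with mean N·f for N total reads (transcript length and library biases are ignored). The function names, the threshold of 10 reads, and the 95% target are hypothetical choices for illustration, not part of any assembler.

```python
import math


def expected_reads(total_reads: int, fraction: float) -> float:
    """Expected read count for a transcript making up `fraction` of the library.

    Simplifying assumption: reads are drawn in proportion to abundance only.
    """
    return total_reads * fraction


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def min_depth(fraction: float, k: int = 10, target: float = 0.95,
              step: int = 1_000_000) -> int:
    """Smallest depth (in multiples of `step`) at which a transcript at the
    given fraction is expected to receive >= k reads with probability >= target.

    The defaults (k=10, target=0.95) are hypothetical illustration values.
    """
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    depth = step
    while prob_at_least(k, expected_reads(depth, fraction)) < target:
        depth += step
    return depth


if __name__ == "__main__":
    # A 10-fold drop in abundance requires roughly 10-fold more reads.
    for frac in (1e-4, 1e-5, 1e-6):
        print(f"fraction={frac:g}: ~{min_depth(frac):,} reads needed")
```

Under this toy model, the required depth scales inversely with abundance, which is why rare transcripts and isoforms that differ by orders of magnitude in expression can demand far deeper sequencing than the dominant ones.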