ABSTRACT

High-throughput RNA sequencing projects after the turn of the century revealed large numbers of long, low-abundance, often multi-exonic RNAs in animals and plants that have no protein-coding potential. These RNAs, termed ‘lncRNAs’ (long non-coding RNAs), are expressed intronically, ‘intergenically’ and antisense to protein-coding genes, as well as from thousands of pseudogenes and the 3′UTRs of mRNAs. The data also showed that most of the genome in eukaryotes is transcribed in highly complex interlacing and overlapping patterns, substantially from both DNA strands, challenging the conception of genes as discrete entities. Although initially suspected to be noise, lncRNAs were found to be dynamically expressed during differentiation and development, mostly in highly cell type-specific patterns (far more so than protein-coding RNAs), and to be associated with subnuclear and cytoplasmic organelles, chromatin-modifying proteins and/or chromatin domains. The genetic signatures of sequence variation in lncRNAs are subtler than those of protein-coding genes, but many lncRNAs have been implicated in the etiology of cancer and of developmental, autoimmune, neurodegenerative and neuropsychiatric disorders. Large numbers of lncRNAs have been shown to have biological functions in, for example, DNA damage repair, cell fate determination and reprogramming, mesoderm and endoderm differentiation, retinal, skeletal, muscle and brain development, memory and behavior, neuronal differentiation, hematopoietic and immunological differentiation, inflammation and hormone production, among many others.