ABSTRACT

Building on the initial findings that individual RNAs can act as guides, allosteric regulators, scaffolds or decoys, RNA is being shown to have fundamental and highly versatile roles in all aspects of gene regulation in plants and animals, frequently in conjunction with repetitive sequences. RNA regulates chromosome structure through interaction with transposon-derived elements (TEs). Chromatin-remodeling proteins, sometimes referred to as ‘pioneer’ transcription factors, have little or no sequence specificity but bind RNA and address different loci in different cells at different developmental stages. The largest class of sequence-specific transcription factors, those containing zinc-finger motifs, also addresses target loci differentially and binds RNA as well as DNA, and many of its members have higher affinity for RNA-DNA hybrids than for double-stranded DNA. RNA-DNA hybrids and RNA-DNA-DNA triplexes are common in eukaryotic chromatin. Half of the C2H2 zinc-finger proteins encoded in the human genome, many of them primate-specific, contain KRAB domains and bind TEs. DNA methylation has been known for decades to be RNA-guided. Histone-modifying proteins also have no intrinsic sequence specificity but bind RNA ‘promiscuously’, i.e., they bind to many different RNAs. Enhancers express RNAs in the cells in which they are active, and enhancer RNAs are required for enhancer action, which involves chromatin ‘looping’ to form transcriptional hubs. Enhancers have all the signatures of genes, except that they do not encode proteins, and the number of enhancers is approximately the same as the number of lncRNAs expressed from the human genome, which resolves the G-value enigma. Most proteins involved in regulating gene expression in plants and animals, including Hox proteins, transcription factors and histone modifiers, contain ‘intrinsically disordered regions’ (IDRs), the fraction of which increases with developmental complexity.
IDRs interact with RNAs to form biomolecular condensates, such as nucleoli and paraspeckles, which are deployed widely to organize specialist subnuclear and cytoplasmic domains, and likely topologically associating domains in chromatin. RNA-IDR interactions may comprise the third dimension of the ancestral protocell. LncRNAs have a modular and highly alternatively spliced structure, with many domains derived from TEs. LncRNAs appear to act as both scaffolds and guides for ribonucleoprotein complexes, a highly efficient and flexible system that, like RNAi and CRISPR, uses RNA signals to direct generic protein effectors to their sites of action, thereby programming development and underpinning adaptive radiation.