ABSTRACT

MicroRNAs (miRNAs) are short (18–23 nt), non-coding RNAs that play central roles in cellular regulation by modulating the post-transcriptional expression of messenger RNA (mRNA) transcripts. It has previously been estimated that 60–90% of all mammalian mRNAs may be targeted by miRNAs. Given this biological importance, the ability to accurately predict miRNA sequences is highly valuable. Computational prediction of miRNAs is based either on genomic sequence alone (de novo) or on the analysis of transcriptomic data from next-generation sequencing (NGS) experiments. Here we review the state of the art in both forms of miRNA prediction. Our analysis concludes that existing methods of de novo miRNA prediction often fail when applied to non-model species and are not well suited to genome-scale data sets. Furthermore, there is an opportunity to create new methods that incorporate all known lines of evidence for miRNA prediction, rather than focusing strictly on either sequence-based or expression-based features. Lastly, given the increasing data sizes arising from rapidly advancing NGS instrumentation, miRNA prediction is ultimately a “big data” problem. Parallel computing and deep networks are discussed as ways to address and leverage this situation, respectively.