ABSTRACT

Molecular processes underlying cellular behavior may be understood through the analysis of gene expression profiles. This analysis is complex: it involves identifying which genes are expressed at any given time and how their products interact in so-called gene regulatory networks (GRNs). High-throughput technologies, such as DNA microarrays and next-generation sequencing, are the methods of choice for quantifying the gene expression data used to model GRNs. Mathematical models aim to infer the structure of GRNs, possibly identifying which genes relate to which other genes. Among such models, Granger causality allows the direction of the edges of a GRN to be identified through the analysis of time series gene expression data. The intuition underlying Granger causality is that an effect never occurs before its cause. This concept was introduced by Norbert Wiener in 1956, but it was Clive Granger who, in 1969, proposed a statistical method to identify Granger causality between two time series. In 1982, John Geweke generalized Granger’s idea to a multivariate form, a methodology better suited to the biological data sets generated by high-throughput technologies. In this chapter, we review Granger causality concepts and describe recently obtained results that use a generalization of multivariate Granger causality to identify Granger causality between gene clusters. We give detailed descriptions of the concept and of the algorithms used to identify and statistically test Granger causality between sets of time series.