ABSTRACT

With the rapid decrease in costs, high-throughput gene expression data are accumulating at an exponential rate in large public repositories. Nevertheless, usually only a few replicates are available in each experiment, so differential gene expression detection suffers from small sample sizes. On the other hand, multiple similar studies conducted by different groups are now accessible. The standard algorithms for detecting differentially expressed genes from microarray data are mostly designed for analyzing a single dataset. Analyzing each study separately may fail to detect key genes that show small but consistent fold changes across all studies. In contrast, jointly modeling all data allows one to borrow information across studies to improve statistical inference. However, the simple concordance model, which assumes that differential expression occurs either in all studies or in none of them, fails to capture study-specific differentially expressed genes. Conversely, a model that naively enumerates and analyzes all possible differential patterns across studies can handle study specificity and allow information pooling, but the complexity of its parameter space grows exponentially as the number of studies increases.
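
The gap between the two modeling extremes can be made concrete with a minimal sketch. It is an illustration only, not the method proposed in this paper: each gene's status in a study is assumed to be binary (1 = differentially expressed, 0 = not), and the function names `all_patterns` and `concordance_patterns` are hypothetical. Under that assumption, the concordance model admits two patterns regardless of the number of studies, whereas full enumeration admits 2^D patterns for D studies.

```python
from itertools import product


def all_patterns(num_studies):
    """Enumerate every binary differential-expression pattern across studies.

    Each pattern is a tuple with one entry per study:
    1 = differentially expressed in that study, 0 = not (illustrative assumption).
    """
    return list(product((0, 1), repeat=num_studies))


def concordance_patterns(num_studies):
    """Patterns allowed by a concordance model: none of the studies, or all of them."""
    return [(0,) * num_studies, (1,) * num_studies]


if __name__ == "__main__":
    # Compare how the number of patterns scales with the number of studies.
    for d in (2, 3, 10):
        print(d, len(concordance_patterns(d)), len(all_patterns(d)))
```

A study-specific pattern such as `(1, 0, 0)` appears in the full enumeration but not in the concordance set, which is the trade-off described above: the richer model captures study specificity at the cost of a pattern space that doubles with every added study.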