ABSTRACT

High-dimensional regression problems with multiple outcomes are increasingly common in real-world settings. For example, biomedical studies can involve from a handful to over twenty thousand responses, such as gene expression levels. We will focus on context-driven Bayesian variable selection approaches that can leverage complex dependence structures across the outcomes, while coherently conveying uncertainty. In particular, we will discuss the relative merits of diverse sparse priors in the context of hierarchically-related responses, and consider adaptations when the number of responses is large. We will also touch on aspects of accuracy and scalability of inference for such models, as well as algorithmic enhancements to best handle posterior multimodality. Genetic association problems involving molecular phenotypes – in particular on the regulation of gene expression – will provide us with a series of compelling case studies to illustrate the advantages of multiple-response hierarchical models tailored to large and complex datasets. We will conclude by highlighting some open statistical and computational challenges.