ABSTRACT

Willem Talloen, Hinrich W. H. Göhlmann, Bie Verbist, Nolen Joy Perualila, Ziv Shkedy, Adetayo Kasim and the QSTAR Consortium

11.1 Introduction
11.2 Drug Discovery
11.2.1 Historic Context
11.2.2 Current Context
11.2.3 Collaborative Research
11.3 Data Properties
11.3.1 High-Dimensional Data
11.3.2 Complex and Heterogeneous Data
11.3.2.1 Patient Segmentation
11.3.2.2 Targeted Therapy
11.3.2.3 Compound Differentiation
11.4 Data Analysis: Exploration versus Confirmation
11.5 QSTAR Framework
11.5.1 Introduction
11.5.2 Typical Data Structure
11.5.3 Main Findings
11.6 Inferences and Interpretations
11.7 Conclusion

In this chapter, we focus on the implications of high-dimensional biology for data analysis in drug discovery and early development. We first provide the historical context of data-driven biomedical research and the current context, in which data dimensionality continues to grow exponentially. In the next section, we illustrate that these datasets are not only big but also often heterogeneous and complex. Identifying relatively small, interesting subparts therefore requires specific analysis approaches. As we will show, biclustering approaches can add value in many exploratory phases of drug development, to a degree that is still underappreciated. We conclude with some reasons for this underappreciation and some potential solutions to bring biclustering more into the spotlight of exploratory research on big biomedical data.