High-Dimensional Biomarkers in Drug Discovery: QSTAR Framework

doi:10.1201/9781315372662-22

Luc Bijnens, Willem Talloen, Bie Verbist, Hinrich W.H. Göhlmann

Janssen Pharmaceutica, Belgium

Adetayo Kasim

Durham University, United Kingdom

QSTAR Consortium

16.1 Introduction: From a Single Trial to a High-Dimensional Setting

16.2 The QSTAR Framework and Surrogacy

16.3 Data

16.3.1 The ROS1 Project

16.3.2 The EGFR Project

16.4 Graphical Interpretation (I): Association between a Gene and Bioactivity Accounting for the Effect of a Fingerprint Feature

16.5 Modeling Approach

16.5.1 The Joint Model

16.5.2 Inference

16.5.3 Graphical Interpretation (II): Adjusted Association and Conditional Independence

16.6 Analysis of the EGFR and the ROS1 Projects

16.6.1 Application to the EGFR Project

16.6.2 Application to the ROS1 Project

16.7 The R Package IntegratedJM

16.7.1 Identification of Biomarkers

16.7.2 Analysis of One Gene Using the gls Function

16.8 The IntegratedJM Shiny App

16.9 Concluding Remarks

In contrast with the analyses presented in previous chapters, which focused on data obtained from clinical trials, this chapter focuses on drug discovery experiments. Our aim is to find genetic biomarkers for phenotypic data for a set of compounds under development. The data for the analysis consist of (1) an m × n gene expression matrix (X) that contains the expression measurements of m genes for n compounds, (2) an n × 1 vector of phenotypic data (Y), and (3) an n × 1 vector describing chemical structure (Z). Figure 16.1 illustrates the relationship among the three variables. Our goal is to model the relationship between gene expression and the phenotypic data, taking into account that the chemical structure of a compound may or may not influence both variables. This modeling approach is called QSTAR (Quantitative Structure-Transcription-Assay Relationship) and is discussed further in Section 16.2. The connection between the QSTAR framework and the surrogacy framework is illustrated in Section 16.4.
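To make the data layout concrete, the following minimal sketch simulates a small X, Y, and Z and compares, gene by gene, the plain correlation between gene expression and the phenotype with the correlation that remains once the effect of the structure variable has been removed from both. Several assumptions apply: Z is taken to be a binary indicator, the data are simulated, all function and variable names are hypothetical, and the residual correlation is only a simple stand-in for the idea of an association adjusted for chemical structure. It is not the joint model of Section 16.5, nor the API of the IntegratedJM package, and it is written in Python purely for illustration.

```python
import random


def residualize(values, z):
    """Remove the effect of a binary feature z by subtracting group means."""
    groups = {}
    for v, g in zip(values, z):
        groups.setdefault(g, []).append(v)
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    return [v - means[g] for v, g in zip(values, z)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / (var_a * var_b) ** 0.5


def adjusted_association(X, y, z):
    """For each gene (row of X), correlate gene and y after removing the z effect."""
    y_res = residualize(y, z)
    return [pearson(residualize(row, z), y_res) for row in X]


# Simulated example (assumption): n compounds, m genes, binary structure feature z.
rng = random.Random(1)
n, m = 40, 3
z = [i % 2 for i in range(n)]                      # n x 1 structure indicator (Z)
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]       # n x 1 phenotype (Y), shifted by z
X = [                                              # m x n expression matrix (X)
    [1.5 * zi + rng.gauss(0, 1) for zi in z],      # gene 0: linked to y only through z
    [0.8 * yi + rng.gauss(0, 0.3) for yi in y],    # gene 1: tracks y directly
    [rng.gauss(0, 1) for _ in z],                  # gene 2: pure noise
]

unadjusted = [pearson(row, y) for row in X]
adjusted = adjusted_association(X, y, z)

for gene, (r_raw, r_adj) in enumerate(zip(unadjusted, adjusted)):
    print(f"gene {gene}: unadjusted r = {r_raw:.2f}, adjusted r = {r_adj:.2f}")
```

In this simulation, gene 0 is related to the phenotype only because both respond to the structure variable, so its association would be expected to shrink after adjustment, whereas gene 1 follows the phenotype directly and should keep a strong adjusted association.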