ABSTRACT

Data to Explain the Mechanism of Action
12.2 Data
    12.2.1 CMap Gene Expression Data
        Preprocessing Raw Gene Expression Data
    12.2.2 Target Prediction Data
        Target Prediction Algorithm
        Predicted Protein Binding Probability Scores
        Data Binarization: Target Prediction Scores
12.3 Integrative Data Analysis Steps
    12.3.1 Clustering of Compounds
        Similarity Matrix
        Target Prediction-Based Clustering of Compounds
    12.3.2 Feature Selection
    12.3.3 Pathway Analysis
        Overlapping Pathway Search Using KEGG and GO Databases
        Gene Set Analysis Using Mean Log p-value (MLP) Analysis
12.4 Biclustering with FABIA
12.5 Data Analysis Using the R Package IntClust
    12.5.1 Step 1: Calculation of Similarity Scores
    12.5.2 Step 2: Target Prediction-Based Clustering
    12.5.3 Step 3: Feature Selection
    12.5.4 Biclustering Using fabia()
12.6 Discussion

In this chapter, we discuss two approaches to analyzing multisource drug discovery data in order to gain insight into the compounds' mechanism of action (MoA). The analysis is done within the QSTAR setting presented in Chapter 1. The first approach is a two-step integrative analysis; the second is a biclustering analysis based on FABIA. In contrast to biclustering methods, which find a subset of genes with similar expression profiles across a subset of compounds, the first approach first finds subsets of compounds that share similar predicted protein targets (via clustering) and then links them to a subset of genes by testing for differential expression. The first approach is discussed in Ravindranath et al. (2015).
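
The two-step idea can be illustrated with a minimal sketch. It is written in Python with the standard library only, not with the R packages used in this chapter, and it rests on several assumptions that are not taken from the text: Tanimoto similarity on binarized target sets, single-linkage grouping at a fixed threshold of 0.5, and a Welch t-statistic as a stand-in for the differential expression test. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of predicted targets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_compounds(targets, threshold=0.5):
    """Step 1: group compounds linked by target similarity >= threshold.

    Assumption: single-linkage clustering cut at a fixed similarity,
    implemented as connected components via union-find.
    """
    parent = {c: c for c in targets}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    for a, b in combinations(targets, 2):
        if tanimoto(targets[a], targets[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for c in targets:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


def welch_t(x, y):
    """Welch t-statistic; each sample needs at least two values."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def rank_genes(expression, cluster):
    """Step 2: rank genes by |t|, cluster compounds versus all others."""
    inside = set(cluster)
    scores = {}
    for gene, values in expression.items():
        x = [v for c, v in values.items() if c in inside]
        y = [v for c, v in values.items() if c not in inside]
        scores[gene] = welch_t(x, y)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Toy data, invented for illustration only.
targets = {
    "cpd1": {"T1", "T2"},
    "cpd2": {"T1", "T2", "T3"},
    "cpd3": {"T7"},
    "cpd4": {"T7", "T8"},
    "cpd5": {"T9"},
}
expression = {
    "geneA": {"cpd1": 2.1, "cpd2": 1.9, "cpd3": 0.1, "cpd4": -0.1, "cpd5": 0.0},
    "geneB": {"cpd1": 0.2, "cpd2": -0.3, "cpd3": 0.1, "cpd4": 0.3, "cpd5": -0.2},
}

clusters = cluster_compounds(targets)
top_genes = rank_genes(expression, clusters[0])
```

Here `clusters[0]` holds the two compounds that share targets T1 and T2, and `rank_genes` orders the genes by how strongly their expression separates that cluster from the remaining compounds. A real analysis would replace the ranking with a formal test and a multiplicity correction.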