Chemoinformatics: Structure- and Property-Activity Relationship Development

doi:10.1201/b16023-26

ABSTRACT

Department of Psychiatry and Psychotherapy, University Hospital of Erlangen-Nuremberg, Erlangen, Germany

19.1 Introduction
19.2 Example Workflow
19.3 Importing the Example Set
19.4 Preprocessing of the Data
19.5 Feature Selection
19.6 Model Generation
19.7 Validation
19.8 Y-Randomization
19.9 Results
19.10 Conclusion/Summary
Acknowledgment
Bibliography

The process of drug design, from the biological target to the drug candidate and subsequently to the approved drug, has become increasingly expensive. Therefore, strategies and tools that reduce costs have been investigated to improve the effectiveness of drug design [1][2]. Among the most time-consuming and cost-intensive steps are the selection, synthesis, and experimental testing of drug candidates. Numerous attempts have been made to reduce the number of potential drug candidates that require experimental testing. Several methods that rank compounds by their likelihood of acting as an active drug have been developed and applied with variable success [3]. In silico methods that support the drug design process by narrowing the set of promising drug candidates are collectively known as virtual screening methods [4]. If the structure of the target protein is known and available, this information can be used in the screening process. Docking is an appropriate method for structure-based virtual screening [5][6][7]. Ligand-based virtual screening employs several known active ligands to rank other drug candidates using physicochemical property pre-filtering [8][9], pharmacophore screening [10][11], and similarity search [12] methods. All of these methods share the common goal of reducing the number of drug candidates subjected to biological testing and increasing the efficacy of the drug design process [13][14].

In this chapter, we demonstrate an in silico method for predicting biological activity based on RapidMiner workflows. The chapter builds on our previous experience using RapidMiner for chemoinformatic predictions [15][16][17]. In a companion chapter, we demonstrate how to process the chemoinformatic descriptors generated in PaDEL.