Predictive Quantitative Structure–Activity Relationships Modeling: Data Preparation and the General Modeling Workflow

doi:10.1201/9781420082999-10


ABSTRACT


In this and the next chapter, we shall consider modern approaches to developing statistically robust and externally predictive quantitative structure–activity relationship (QSAR) models. We discuss the general QSAR model development and validation workflow, which should be followed irrespective of the specifics of any particular QSAR modeling routine. We deliberately refrain from discussing specific model optimization algorithms, because such details can be found in many original publications. This chapter focuses on the initial steps of QSAR modeling, namely input data preparation and curation, and also introduces the general workflow for developing validated and predictive models. The next chapter, in turn, addresses the general data modeling and model validation procedures that constitute important elements of this workflow.
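As a rough illustration of the data-partitioning step at the core of such a workflow, the sketch below first sets aside an external evaluation set that plays no part in model building, then divides the remaining modeling set into training and test sets. It is a minimal sketch under stated assumptions: compounds are represented only by integer indices, the helper names `split_indices` and `partition_dataset` are hypothetical, the fractions are arbitrary defaults, and plain random selection is used purely for simplicity; it is not the chapter's own division procedure, which may select compounds differently.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_indices(
    indices: Sequence[int], fraction: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Randomly move `fraction` of the indices into a held-out list.

    Returns (kept, held_out). Random selection is an illustrative
    assumption, not a recommended division method.
    """
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_held = round(len(shuffled) * fraction)
    return sorted(shuffled[n_held:]), sorted(shuffled[:n_held])


def partition_dataset(
    n_compounds: int,
    external_fraction: float = 0.15,  # hypothetical default
    test_fraction: float = 0.25,      # hypothetical default
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Hypothetical helper: external evaluation set first, then training/test."""
    rng = random.Random(seed)  # seeded so the partition is reproducible
    all_idx = range(n_compounds)
    # 1. Set aside an external evaluation set, never used while building models.
    modeling, external = split_indices(all_idx, external_fraction, rng)
    # 2. Divide the remaining modeling set into training and test sets.
    training, test = split_indices(modeling, test_fraction, rng)
    return {"training": training, "test": test, "external": external}


if __name__ == "__main__":
    sets = partition_dataset(100)
    for name, idx in sets.items():
        print(name, len(idx))
```

The ordering is the design point: the external evaluation set is removed before any training/test division takes place, so none of its compounds can influence model selection.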