ABSTRACT

This prompted various authors to propose extending QSAR models to database mining. It was also emphasized [76] that such applications, which propose potentially promising chemical structures rather than deliver good statistical models, best meet the expectations of medicinal chemists. However, as remarked by Hillebrecht and Klebe [702], all models (except the k-nearest neighbors [514]) must be built on a training set, and their predictive ability is limited to a structural space involving no more than a reasonable structural extrapolation (see, e.g., Ref. [703]). QSAR models therefore cannot be applied to data mining in huge databases of wide structural diversity. They seem better suited to the screening of focused databases or of reduced sets of compounds after initial filtering. It may be noted, however, that for such uses prediction accuracy is not crucial, since the problem is only to sort chemicals into a few classes (highly, moderately, or weakly active, or inactive). Rather than the classical correlation coefficient (generally used to examine the quality of a QSAR), Spearman’s rank correlation coefficient better characterizes the ability of a model to rank compounds according to their activity.
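The difference between the two coefficients can be illustrated with a minimal sketch. The activity values below are hypothetical and chosen only for illustration, and the helper names (pearson, average_ranks, spearman) are not taken from the cited works. Spearman's coefficient is computed here as the classical (Pearson) correlation coefficient applied to the ranks of the values, with tied values receiving their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Classical (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observed and predicted activities (illustrative values only).
# The predictions preserve the order of the compounds but distort the scale.
observed = [9.1, 8.3, 7.0, 6.2, 5.5, 4.1]
predicted = [6.0, 5.9, 5.8, 5.2, 5.1, 3.0]

r = pearson(observed, predicted)
rho = spearman(observed, predicted)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In this example the predicted values are numerically poor, so the classical coefficient falls below 1, yet the compounds are ranked in exactly the observed order, so Spearman's coefficient equals 1. For a screening application where only the ordering (or the assignment to a few activity classes) matters, the rank coefficient is arguably the more relevant figure.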