ABSTRACT

Descriptor selection is a widely used technique in many fields of science and engineering and is an essential part of any quantitative structure-activity relationship (QSAR) study. The larger the number of generated descriptors, the higher the computational cost of selecting the few that are incorporated in the final models. Managing the molecular descriptors is therefore a critical stage in the modeling procedure. Where appropriate, the generated parameters may be rescaled before the descriptor selection step in QSAR analyses. Two strategies are employed for feature selection: Wrapper methods and Filter methods. Generally speaking, Wrapper methods select a subset of descriptors by optimizing a fitness function (sometimes called an objective function), which may be linear or nonlinear in nature. In contrast to Wrapper methods, which rely on linear or nonlinear regression models, Filter methods eliminate insignificant descriptors based on statistical criteria such as low variance and high pairwise correlation between descriptors.
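
As an illustration of the Filter strategy described above, the following minimal Python sketch drops near-constant descriptors and then removes one member of each highly correlated pair. The function names, the descriptor names, the toy values, and the two thresholds (min_variance, max_abs_corr) are assumptions chosen for this example only; they are not taken from the study, and appropriate cut-offs depend on the data set and on whether the descriptors have been rescaled.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def filter_descriptors(columns, min_variance=1e-3, max_abs_corr=0.95):
    """Return the names of descriptors that survive a simple two-step filter.

    columns: dict mapping descriptor name -> list of values (one per molecule).
    Both thresholds are illustrative assumptions, not recommended values.
    """
    # Step 1: discard descriptors whose variance is at or below the threshold.
    candidates = [name for name, values in columns.items()
                  if pvariance(values) > min_variance]

    # Step 2: greedy pass in input order; keep a descriptor only if it is not
    # highly correlated with any descriptor that has already been kept.
    selected = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[kept])) <= max_abs_corr
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical toy data: four descriptors computed for four molecules.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0],
    "logP_scaled": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with logP
    "n_rings": [1.0, 1.0, 1.0, 1.0],      # constant, zero variance
    "tpsa": [4.0, 1.0, 3.0, 2.0],
}
selected = filter_descriptors(descriptors)
print(selected)
```

In this toy case the constant descriptor is removed in the first step and the redundant rescaled copy in the second. The greedy pass keeps whichever member of a correlated pair appears first, so the result depends on descriptor order; a Wrapper method would instead evaluate candidate subsets by fitting a regression model and scoring it with the fitness function.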