ABSTRACT

The rapid growth of data in the era of information technology demands fast access to data for the computational and statistical processing that extracts useful information through data mining. Quantitative structure-activity relationship (QSAR) modeling is a chemoinformatic technique that relies on data mining and has proved helpful in accelerating drug design and discovery. A QSAR model is a mathematical equation that correlates the biological activities of compounds with their structural features. Every QSAR analysis follows a workflow of several computational and statistical steps: data collection and preparation, calculation and preprocessing of molecular parameters (descriptors), generation of training and test sets, descriptor selection, model building, internal and external validation, and model development. Several important rules govern the development of a QSAR model capable of predicting the activity, property, or toxicity of a new compound that falls within the applicability domain of the model.
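
The workflow steps listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this work: the data are synthetic random numbers standing in for a curated set of compounds and computed descriptors, the model is ordinary multiple linear regression, descriptor selection is a simple correlation ranking, internal validation is leave-one-out cross-validation, and the applicability domain is approximated by the common leverage rule h* = 3(p + 1)/n. The helper names (select_descriptors, fit_mlr, predict, loo_q2) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data collection (ASSUMPTION: synthetic stand-in, not real compounds).
# 60 "compounds" x 8 "descriptors"; activity depends on descriptors 0 and 3.
n_compounds, n_descriptors = 60, 8
X = rng.normal(size=(n_compounds, n_descriptors))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.3, size=n_compounds)

# Data set generation: random 80/20 split into training and test sets.
idx = rng.permutation(n_compounds)
train, test = idx[:48], idx[48:]

# Preprocessing: autoscale with training-set statistics only.
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sd


def select_descriptors(X_tr, y_tr, k):
    """Keep the k descriptors with the largest |correlation| to activity."""
    corr = np.array([abs(np.corrcoef(X_tr[:, j], y_tr)[0, 1])
                     for j in range(X_tr.shape[1])])
    return np.sort(np.argsort(corr)[-k:])


def fit_mlr(X_tr, y_tr):
    """Least-squares fit; returns [intercept, slope_1, ..., slope_p]."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return coef


def predict(coef, X_new):
    return coef[0] + X_new @ coef[1:]


def loo_q2(X_tr, y_tr):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(y_tr)):
        keep = np.arange(len(y_tr)) != i
        c = fit_mlr(X_tr[keep], y_tr[keep])
        press += (y_tr[i] - predict(c, X_tr[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y_tr - y_tr.mean()) ** 2)


# Descriptor selection and model building on the training set.
selected = select_descriptors(Xs[train], y[train], k=2)
X_tr, X_te = Xs[train][:, selected], Xs[test][:, selected]
coef = fit_mlr(X_tr, y[train])

# Internal validation (leave-one-out) and external validation (test set).
q2 = loo_q2(X_tr, y[train])
y_pred = predict(coef, X_te)
r2_ext = 1.0 - (np.sum((y[test] - y_pred) ** 2)
                / np.sum((y[test] - y[test].mean()) ** 2))

# Applicability domain via leverage: h_i = a_i (A^T A)^-1 a_i^T.
A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
A_te = np.column_stack([np.ones(len(X_te)), X_te])
leverage = np.einsum("ij,jk,ik->i", A_te, np.linalg.inv(A_tr.T @ A_tr), A_te)
h_star = 3 * (len(selected) + 1) / len(X_tr)
in_domain = leverage <= h_star

print("selected descriptors:", selected)
print("Q2 (LOO):", round(float(q2), 3), "R2 (external):", round(float(r2_ext), 3))
print("test compounds inside the domain:", int(in_domain.sum()), "of", len(in_domain))
```

In this sketch the descriptors are selected once, outside the cross-validation loop, so the leave-one-out Q^2 is somewhat optimistic; a stricter protocol would repeat the selection within each fold.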