ABSTRACT

Machine learning techniques have become central to the in silico assessment of biological activity in modern drug discovery. A diverse range of statistical techniques is available in the literature for both regression and classification analysis. Depending on the methodology used, either quantitative (regression) or categorical (classification) prediction is feasible, even for large and diverse chemical data sets. Although correlation models far outnumber classification models in computer-aided drug design, the significance of classification models for the development of potential therapeutic agents should not be underestimated. Classification techniques partition chemical space into regions of higher and lower “fitness,” i.e., each molecule is assigned to a predefined class. This enables the identification of a “fitness landscape” with the potential to yield desired outcomes. This work briefly reviews the basic methodology and applications of three classification techniques: decision tree (DT), random forest (RF), and moving average analysis (MAA).