ABSTRACT

Chemoinformatics is the representation and manipulation of chemical information that describes molecular fragments, molecules, and compound libraries. The correlation of chemoinformatic data to biological response data, such as activity toward a target receptor, molecular or cellular toxicity, and the pharmaceutically relevant processes of absorption, distribution, metabolism, and excretion (collectively called ADME properties), can be achieved via data-mining techniques. Data-mining techniques assume that a pattern exists among a population of molecules, their molecular properties, and their biological behavior. Data-mining algorithms use different strategies to uncover or “mine” these patterns. High-throughput methods geared toward pharmaceutical application and therapeutic-target research, such as combinatorial chemistry (CC) and high-throughput screening (HTS), produce large populations of molecules. The associated data overwhelm traditional quantitative structure-activity relationship (QSAR) and computational modeling techniques. Thus, high-throughput computational modeling techniques, such as data mining, have become a necessary and natural complement to the present high-throughput combinatorial age. Prioritization of tractable chemical libraries from a large virtual chemical space of synthesizable compounds is a key step in using high-throughput methods to search for biological activity against a given receptor target. Prioritization of biological screening of existing compound libraries can be based on computational model scores of similarity to known active compounds or on predictions of activity, selectivity, and other relevant properties. Strategic application of data-mining techniques aims to focus high-throughput synthesis and screening resources efficiently on biologically relevant or enriched libraries at the “virtual” stage, thus enabling a higher return on effort.
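As an illustration of prioritizing a library by similarity to known active compounds, the following is a minimal sketch, not the method of this chapter. It assumes each molecule is already encoded as a set of "on" fingerprint bit indices and uses the Tanimoto coefficient as the similarity score. The function names, compound names, and bit sets are hypothetical toy values chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumed encoding: indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def prioritize(
    library: Dict[str, Fingerprint], actives: List[Fingerprint]
) -> List[Tuple[str, float]]:
    """Rank library compounds by their best similarity to any known active.

    Assumes `actives` is non-empty.
    """
    scored = [
        (name, max(tanimoto(fp, active) for active in actives))
        for name, fp in library.items()
    ]
    # Highest score first: these compounds would be screened first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: two known actives and a three-compound library.
actives = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5, 8})]
library = {
    "cmpd_A": frozenset({1, 2, 3, 4, 9}),
    "cmpd_B": frozenset({10, 11, 12}),
    "cmpd_C": frozenset({2, 3, 5}),
}

ranking = prioritize(library, actives)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Scoring each compound by its maximum similarity to any active is one simple design choice. A real workflow would typically compute fingerprints with a cheminformatics toolkit and might combine similarity with predicted activity, selectivity, or ADME scores.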