ABSTRACT

The rapid development of combinatorial chemistry and high-throughput screening techniques in recent years has provided a powerful alternative to traditional approaches to lead generation and optimization. In traditional medicinal chemistry, these processes frequently involve the purification and identification of bioactive ingredients of natural, marine, or fermentation products, or the random screening of synthetic compounds. This is often followed by a painstaking series of chemical modifications or total syntheses of promising lead compounds, which are then tested in appropriate bioassays. In contrast, combinatorial chemistry involves the systematic assembly of a set of "building blocks" to generate a large library of chemically distinct molecules that are screened simultaneously in various bioassays (1,2). In targeted library design, lead identification and optimization then become the task of generating libraries of structurally diverse compounds that are similar to a lead compound; the underlying assumption is that structurally similar compounds should exhibit similar biological activities. Conversely, structurally dissimilar compounds should exhibit very diverse biological activity profiles; thus, the goal of diverse library design is to generate libraries of maximally dissimilar compounds.
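As a minimal illustration of the building-block idea, the Python sketch below enumerates a full combinatorial library as the Cartesian product of per-position building-block sets. The scaffold positions (R1, R2, R3), the reagent labels, and the helper `enumerate_library` are hypothetical placeholders; a real enumeration would operate on chemical structures with a cheminformatics toolkit, which this sketch does not attempt.

```python
from itertools import product

# Hypothetical building-block sets for a three-position scaffold.
# Labels are placeholders, not real reagents.
building_blocks = {
    "R1": ["A1", "A2", "A3"],
    "R2": ["B1", "B2"],
    "R3": ["C1", "C2", "C3", "C4"],
}

def enumerate_library(blocks):
    """Return every combination of one building block per position (a full combinatorial library)."""
    positions = list(blocks)
    return [
        dict(zip(positions, combo))
        for combo in product(*(blocks[p] for p in positions))
    ]

library = enumerate_library(building_blocks)
print(len(library))  # 3 * 2 * 4 = 24
print(library[0])    # {'R1': 'A1', 'R2': 'B1', 'R3': 'C1'}
```

Because the library size is the product of the set sizes, it grows multiplicatively with every added position or building block, which is one reason exhaustive synthesis and testing quickly become impractical.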

In many practical cases, the exhaustive synthesis and evaluation of combinatorial libraries is prohibitively expensive, time-consuming, or redundant (4). Theoretical analysis of the available experimental information about the biological target, or about pharmacologically active compounds capable of interacting with it, can significantly enhance the rational design of targeted chemical libraries. Often, the number of compounds with known biological activity is large enough to develop viable quantitative structure-activity relationship (QSAR) models for such data sets. These models can then be used to select virtual library compounds (or actual compounds from existing databases) with high predicted biological activity. Alternatively, if a variable selection method was employed in developing a QSAR model, restricting the descriptors to only the selected variables can improve the performance of rational library design or database mining methods based on similarity to a probe compound. Using only the selected variables in descriptor-space similarity searches is analogous to the traditional use of chemical pharmacophores in database mining.
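The descriptor-subset similarity search described above can be sketched as follows, under stated assumptions: compounds are represented by numeric descriptor vectors, each descriptor is z-score standardized, and similarity is measured by Euclidean distance. These choices, the helper names `standardize` and `rank_by_similarity`, and the toy descriptor values are illustrative assumptions rather than the specific method of this work. The `selected` argument stands in for the descriptor indices that a QSAR variable selection procedure would retain.

```python
import math

def standardize(rows):
    """Z-score each descriptor column so that no single descriptor dominates the distance."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

def rank_by_similarity(probe, database, selected=None):
    """Rank database compounds (by index) from most to least similar to the probe.

    `selected` is a hypothetical list of descriptor indices, e.g. those kept by
    QSAR variable selection; None means all descriptors are used.
    """
    # Standardize probe and database together (an assumption of this sketch).
    scaled = standardize([probe] + database)
    probe_s, db_s = scaled[0], scaled[1:]
    cols = range(len(probe)) if selected is None else selected
    dists = [
        math.sqrt(sum((row[j] - probe_s[j]) ** 2 for j in cols))
        for row in db_s
    ]
    return sorted(range(len(database)), key=lambda i: dists[i])

# Toy data: hypothetical descriptor vectors, not real compounds.
probe = [0.0, 5.0, 0.0]
database = [
    [0.0, 1.0, 0.0],
    [4.0, 5.0, 4.0],
    [0.0, 3.0, 0.0],
]
print(rank_by_similarity(probe, database))                # [2, 0, 1]
print(rank_by_similarity(probe, database, selected=[1]))  # [1, 2, 0]
```

In the toy data, using all three descriptors ranks compound 2 as the nearest neighbor, whereas restricting the search to descriptor 1 (the hypothetical "selected" variable) ranks compound 1 first, because it matches the probe exactly on that descriptor. This mirrors the pharmacophore analogy: only the features deemed relevant to activity drive the similarity measure.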