ABSTRACT

A principal objective of molecular diversity analysis is to devise computational methods that cover as much of chemical space as possible in the search for bioactive molecules. Diversity is normally quantified using techniques derived from those developed for similarity searching in chemical databases, which measure the degree of structural similarity (or dissimilarity) between two molecules by comparing the sets of descriptors that characterize them (1). There has thus been much interest in measures of structural similarity, comprising both the descriptors used to characterize molecules and the coefficients used to quantify the degree of resemblance between two molecules' descriptor sets, and in the ways such measures can be used in diversity analyses (2,3), in particular in methods for selecting compounds so as to maximize their structural diversity (4).
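
As a rough illustration of the two ingredients mentioned above, a similarity coefficient computed over descriptor sets and a selection procedure that maximizes diversity, the following minimal sketch may help. It rests on assumptions that are not part of this abstract: each molecule is represented as a set of binary descriptors, the Tanimoto coefficient stands in for the similarity coefficient, and a greedy MaxMin rule stands in for the selection method. The function names, molecule names, and fragment labels are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient of two descriptor sets (assumed coefficient)."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_diverse(descriptors: Dict[str, FrozenSet[str]], k: int) -> List[str]:
    """Greedy MaxMin selection (assumed method, not taken from the source).

    Starts from the alphabetically first molecule, then repeatedly adds the
    molecule whose smallest dissimilarity (1 - Tanimoto) to the molecules
    already selected is largest.
    """
    names = sorted(descriptors)
    if k <= 0 or not names:
        return []
    selected = [names[0]]
    while len(selected) < min(k, len(names)):
        remaining = [n for n in names if n not in selected]
        best = max(
            remaining,
            key=lambda n: min(
                1.0 - tanimoto(descriptors[n], descriptors[s]) for s in selected
            ),
        )
        selected.append(best)
    return selected


# Hypothetical toy data: fragment-style descriptors for three molecules.
toy = {
    "mol_a": frozenset({"OH", "C=O", "ring"}),
    "mol_b": frozenset({"OH", "C=O", "NH2"}),
    "mol_c": frozenset({"Cl", "Br"}),
}
picked = select_diverse(toy, 2)
```

In this toy set, mol_a and mol_b share two of four distinct descriptors, while mol_c shares none with either, so a diversity-driven selection of two compounds seeded with mol_a prefers mol_c over mol_b.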