ABSTRACT

Cheminformatics is an interdisciplinary science at the interface between chemistry and computer and information sciences. Its goal is to design new molecules that meet societal needs. Of the many fields in which it is used, the design of new drugs (medicines) has seen the greatest application of cheminformatics. The goal of drug discovery is to find the optimal molecule that binds to a biological target, typically a protein. The set of theoretically possible molecules from which to find the optimal molecule, known as chemical space, is infinite. This chemical space can be reduced to a finite druglike chemical space (estimated to contain between 10^12 and 10^180 molecules) by eliminating molecules that are unlikely to be usable as drugs. The Chemical Abstracts Service (CAS), whose objective is “to find, collect, and organize all publicly disclosed substance information,” currently (as of December 29, 2015) contains only approximately 105 million molecules, a small fraction of the druglike chemical space. In practice, the quest for a new molecule starts from lists of existing molecules. Cheminformatics techniques are used to filter these lists to generate a subset of molecules that are tested experimentally against the biological target using high-throughput screening. Molecules that bind to the target are called hits. From the list of hits, further filtering identifies leads and, from these, a candidate that enters preclinical development. Perhaps because of its strong connection with medicinal chemistry, cheminformatics is sometimes seen as being related to bioinformatics. However, a key distinction in the underlying algorithmic techniques is that bioinformatics focuses on sequence data (e.g., DNA sequences), whereas cheminformatics focuses on the structure of small molecules represented as graphs. Excellent introductions and surveys of the field include References 1–4.
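To make the distinction between sequence data and graph data concrete, the following minimal Python sketch stores a small molecule as a graph, with atoms as nodes and bonds as edges. The example molecule (ethanol, heavy atoms only), the helper name build_molecular_graph, and the adjacency-list layout are illustrative assumptions rather than anything prescribed by the text; production toolkits use richer data structures.

```python
from collections import defaultdict


def build_molecular_graph(atoms, bonds):
    """Return an adjacency list: atom index -> list of (neighbor index, bond order).

    Hypothetical helper for illustration only; atoms is a list of element
    symbols and bonds is a list of (i, j, bond_order) tuples.
    """
    adjacency = defaultdict(list)
    for i, j, order in bonds:
        # Bonds are undirected, so record the edge in both directions.
        adjacency[i].append((j, order))
        adjacency[j].append((i, order))
    # Include every atom, even one with no bonds.
    return {i: adjacency[i] for i in range(len(atoms))}


# Ethanol with implicit hydrogens: C0 - C1 - O2, both bonds single.
ETHANOL_ATOMS = ["C", "C", "O"]
ETHANOL_BONDS = [(0, 1, 1), (1, 2, 1)]

graph = build_molecular_graph(ETHANOL_ATOMS, ETHANOL_BONDS)
for index, neighbors in graph.items():
    labels = [f"{ETHANOL_ATOMS[j]}{j}" for j, _ in neighbors]
    print(f"{ETHANOL_ATOMS[index]}{index} is bonded to {labels}")
```

A DNA sequence, by contrast, can be held in a plain string such as "ACGT". Branched and cyclic molecules have no single natural linear order, which is one reason graph algorithms rather than string algorithms dominate in cheminformatics.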