ABSTRACT

Partitional clustering algorithms are widely used in both bioinformatics and drug discovery. Variations of K-means continue to appear in the literature, largely in an effort to cluster ever larger data sets efficiently. Jarvis-Patrick was an early, popular method in cheminformatics for clustering large-scale data sets of binary fingerprints, but it has largely been supplanted by K-means-like and sampling algorithms. Spectral clustering, popular in the imaging community, has had some adherents for medium-sized data sets in cheminformatics. Self-organizing maps of various forms, which originated in the engineering literature, have become popular in bioinformatics. They have an analog computing flavor, with numerous parameters that vary throughout the clustering process, but it can be shown that their overall behavior is quite similar to that of K-means. Their popularity is due in part to the fact that they provide a convenient way to visualize the results as part of the process and output.