ABSTRACT

The goal of unsupervised learning is to identify or simplify the structure of data. Unsupervised learning is of growing importance in a number of fields; examples include grouping breast cancer patients by their genetic markers, shoppers by their browsing and purchase histories, and movie viewers by the ratings they assign. Unsupervised learning problems can be further divided into clustering, association, and anomaly detection. Principal component analysis is an important and useful unsupervised learning tool for dimensionality reduction in drug design and discovery. The anticancer activity patterns of 112 ellipticine analogs were analyzed using a hierarchical clustering algorithm, and a striking coherence between molecular structures and their activity patterns emerged from the cluster tree. The global clustering coefficient can be used as a metric for measuring network modularity. Unsupervised learning is a critical foundation of supervised learning: any application of supervised learning must involve some form of unsupervised learning.