Chapter 20
26 Pages

Semi-Supervised Clustering

By Anil Jain, Rong Jin, Radha Chitta

Abstract
20.1 Introduction
    20.1.1 Acquisition and Expression of Side-Information
    20.1.2 Incorporation of Side-Information
        20.1.2.1 Solution Space Restriction
        20.1.2.2 Distance Metric Learning
20.2 Semi-Supervised Clustering Algorithms
    20.2.1 Semi-Supervised Kernel K-Means
    20.2.2 BoostCluster
    20.2.3 Active Spectral Clustering
20.3 Applications
20.4 Conclusions and Open Problems
References

Clustering is an unsupervised learning problem whose objective is to find a partition of the given data. A major challenge, however, is defining an objective function whose optimal partition is actually useful to the user. To facilitate data clustering, it has been suggested that the user provide some supplementary information about the data (e.g., pairwise relationships between a few data points), which, when incorporated into the clustering process, could lead to a better data partition. Semi-supervised clustering algorithms attempt to improve clustering performance by utilizing this supplementary information. In this chapter, we present an overview of semi-supervised clustering techniques and describe some prominent algorithms from the literature. We also present several applications of semi-supervised clustering.