
CONTENTS

25.1 Introduction
25.2 k-Means Clustering
    25.2.1 Compression via Clustering
    25.2.2 Kernel k-Means
    25.2.3 k-Means Limitations
25.3 EM Clustering
25.4 Hierarchical Agglomerative Clustering
25.5 Spectral Clustering
25.6 Constrained Clustering
25.7 Applications to Astronomy Data
25.8 Summary
25.9 Glossary
Acknowledgments
References

25.1 INTRODUCTION

On obtaining a new data set, the researcher immediately faces the challenge of forming a high-level understanding of the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11], as new instruments push into new frontiers of parameter space through improvements in temporal, spatial, and spectral resolution, and by the desire to “fuse” observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon.