Data clustering is a highly interdisciplinary field, the goal of which is to divide a set of objects into homogeneous groups such that objects in the same group are similar and objects in different groups are quite distinct. Thousands of theoretical papers and a number of books on data clustering have been published over the past 50 years. However, few books exist to teach people how to implement data clustering algorithms. This book was written for anyone who wants to implement or improve their data clustering algorithms.

Using object-oriented design and programming techniques, Data Clustering in C++ exploits the commonalities of all data clustering algorithms to create a flexible set of reusable classes that simplifies the implementation of any data clustering algorithm. Readers can follow the development of the base data clustering classes and several popular data clustering algorithms. Additional topics such as data pre-processing, data visualization, cluster visualization, and cluster interpretation are briefly covered.
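The idea of a shared base class for clustering algorithms can be illustrated with a minimal sketch. This is not the book's actual class hierarchy: the names `ClusteringAlgorithm`, `KMeans`, `Point`, `Dataset`, and `squaredDistance` are hypothetical, chosen for illustration only, and the k-means shown is a plain version seeded with the first k objects.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// All names here are hypothetical; they are not taken from the book's library.
using Point = std::vector<double>;
using Dataset = std::vector<Point>;

// Common interface: an algorithm maps a dataset to one cluster label per object.
class ClusteringAlgorithm {
public:
    virtual ~ClusteringAlgorithm() = default;
    virtual std::vector<std::size_t> cluster(const Dataset& data) = 0;
};

// Squared Euclidean distance between two points of equal dimension.
inline double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Plain k-means, seeded with the first k objects (an assumption made for
// determinism; practical implementations usually seed randomly).
class KMeans : public ClusteringAlgorithm {
public:
    explicit KMeans(std::size_t k, std::size_t maxIterations = 100)
        : k_(k), maxIterations_(maxIterations) {}

    std::vector<std::size_t> cluster(const Dataset& data) override {
        assert(k_ > 0 && data.size() >= k_);
        Dataset centers(data.begin(),
                        data.begin() + static_cast<std::ptrdiff_t>(k_));
        std::vector<std::size_t> labels(data.size(), 0);
        const std::size_t dim = data.front().size();

        for (std::size_t iter = 0; iter < maxIterations_; ++iter) {
            // Assignment step: attach each object to its nearest center.
            bool changed = false;
            for (std::size_t i = 0; i < data.size(); ++i) {
                std::size_t best = 0;
                double bestDist = std::numeric_limits<double>::max();
                for (std::size_t c = 0; c < k_; ++c) {
                    const double d = squaredDistance(data[i], centers[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            if (iter > 0 && !changed) break;  // assignments are stable

            // Update step: move each center to the mean of its members.
            Dataset sums(k_, Point(dim, 0.0));
            std::vector<std::size_t> counts(k_, 0);
            for (std::size_t i = 0; i < data.size(); ++i) {
                for (std::size_t j = 0; j < dim; ++j) sums[labels[i]][j] += data[i][j];
                ++counts[labels[i]];
            }
            for (std::size_t c = 0; c < k_; ++c) {
                if (counts[c] == 0) continue;  // keep the old center if empty
                for (std::size_t j = 0; j < dim; ++j)
                    centers[c][j] = sums[c][j] / static_cast<double>(counts[c]);
            }
        }
        return labels;
    }

private:
    std::size_t k_;
    std::size_t maxIterations_;
};
```

Client code in such a design would hold a `ClusteringAlgorithm&` and call `cluster(data)`, so another algorithm can be swapped in without changing the caller.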

This book is divided into three parts:

  • Data Clustering and C++ Preliminaries: A review of basic concepts of data clustering, the unified modeling language, object-oriented programming in C++, and design patterns

  • A C++ Data Clustering Framework: The development of data clustering base classes

  • Data Clustering Algorithms: The implementation of several popular data clustering algorithms

A key to learning a clustering algorithm is to implement it and experiment with it. Complete listings of classes, examples, unit test cases, and GNU configuration files are included in the appendices of this book as well as in the downloadable resources. The only requirements to compile the code are a modern C++ compiler and the Boost C++ libraries.

part I|2 pages

Data Clustering and C++ Preliminaries

chapter 1|26 pages

Introduction to Data Clustering

chapter 2|12 pages

The Unified Modeling Language

chapter 3|16 pages

Object-Oriented Programming and C++

chapter 4|20 pages

Design Patterns

chapter 5|24 pages

C++ Libraries and Tools

part II|2 pages

A C++ Data Clustering Framework

chapter 6|12 pages

The Clustering Library

chapter 7|16 pages


chapter 8|8 pages


chapter 9|10 pages

Dissimilarity Measures

chapter 10|12 pages

Clustering Algorithms

chapter 11|22 pages

Utility Classes

part III|2 pages

Data Clustering Algorithms

chapter 12|32 pages

Agglomerative Hierarchical Algorithms

chapter 13|12 pages


chapter 14|12 pages

The k-means Algorithm

chapter 15|14 pages

The c-means Algorithm

chapter 16|10 pages

The k-prototypes Algorithm

chapter 17|14 pages

The Genetic k-modes Algorithm

chapter 18|12 pages

The FSC Algorithm

chapter 19|16 pages

The Gaussian Mixture Algorithm

chapter 20|16 pages

A Parallel k-means Algorithm

chapter A|2 pages

Exercises and Projects

chapter B|136 pages

Listings

chapter C|8 pages