chapter  6
Grid-Based Clustering
ByWei Cheng, Wei Wang, Sandra Batista
Pages 21

Grid-based clustering algorithms are efficient in mining large multidimensional data sets. These algorithms partition the data space into a finite number of cells to form a grid structure and then form clusters from the cells in the grid structure. Clusters correspond to regions that are more dense in data points than their surroundings. Grids were initially proposed by Warnekar and Krishna [30] to organize the feature space, e.g., in GRIDCLUS [25], and increased in popularity after STING [28], CLIQUE [1], and WaveCluster [27] were introduced. The great advantage of grid-based clustering is a significant reduction in time complexity, especially for very large data sets. Rather than clustering the data points directly, grid-based approaches cluster the neighborhood surrounding the data points represented by cells. In most applications since the number of cells is significantly smaller than the number of data points, the performance of grid-based approaches is significantly improved. Grid-based clustering algorithms typically involve the following five steps [9, 10]:

Creating the grid structure, i.e., partitioning the data space into a finite number of cells.

Calculating the cell density for each cell.

Sorting of the cells according to their densities.

Identifying cluster centers.

Traversal of neighbor cells.