ABSTRACT

Clustering is one of the most popular tasks in knowledge discovery and is applied in various domains (e.g., data mining, pattern recognition, computer vision, etc.). Clustering methods seek to organize a set of items into clusters such that items within a given cluster have a high degree of similarity, whereas items belonging to different clusters have a high degree of dissimilarity. Partitioning clustering methods ([114, 148, 175]) aim to obtain a single partition of the input data into a fixed number of clusters. Such methods often look for a partition that optimizes (usually locally) an adequacy criterion function. Clustering techniques have been widely studied across several disciplines, but only a few of the techniques developed scale to support clustering of very large time-changing data streams (Madjid & Norwati 2010 [220]) (Mahdiraji 2009 [221]). The major challenge in clustering of evolving data is to handle cluster evolution: new clusters may appear, old ones may disappear, merge or split over time.