ABSTRACT

Roughly speaking, clustering is the process of grouping objects into different groups, such that the common properties of data in each subset are high, and between different subsets are low. Clustering methods are widely used in data mining. They are either used to get insight into data distribution or as a preprocessing step for other algorithms. The most common approaches use distance between examples as similarity criteria. These approaches require space that is quadratic in the number of observations, which is prohibitive in the data stream paradigm.