ABSTRACT

This chapter discusses clustering methods based on the k-means or k-medians methodology. It explores density-based methods for stream clustering, describes probabilistic algorithms for clustering data streams, and discusses high-dimensional streaming algorithms. It also outlines methods for discrete and categorical stream clustering and considers methods for clustering text streams. In stream processing, temporal locality is also important: the underlying patterns in the data may evolve, so clusters found in the past history may no longer be relevant to the future. A variety of stream clustering algorithms account for such temporal issues through snapshot-based methods, decay-based techniques, and windowing. Because stream data imposes a one-pass constraint on algorithm design, it is more difficult for conventional algorithms to provide this flexibility in computing clusters over different time horizons.
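To make the decay-based idea concrete, the following is a minimal sketch, not the chapter's own algorithm. It shows how a one-pass cluster summary can discount older points. It assumes a summary made of a weight, a per-dimension linear sum, and a per-dimension squared sum, all faded by the factor 2^(-lam * dt), which is a common choice in decay-based stream clustering. The class name `DecayedMicroCluster` and the parameter `lam` are hypothetical names used only for illustration.

```python
import math
from typing import List, Sequence


class DecayedMicroCluster:
    """Hypothetical one-pass cluster summary with exponential decay.

    Keeps a weight, a linear sum and a squared sum per dimension. Before
    each update, all three are multiplied by 2 ** (-lam * dt), so the
    influence of old points fades over time without storing them.
    """

    def __init__(self, dim: int, lam: float = 0.5) -> None:
        self.lam = lam                  # assumed decay rate; larger forgets faster
        self.weight = 0.0               # decayed count of absorbed points
        self.ls: List[float] = [0.0] * dim  # decayed linear sum
        self.ss: List[float] = [0.0] * dim  # decayed squared sum
        self.last_t = 0.0               # time of the most recent fade

    def fade(self, t: float) -> None:
        # Apply the decay accumulated since the last update.
        factor = 2.0 ** (-self.lam * (t - self.last_t))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.last_t = t

    def insert(self, x: Sequence[float], t: float) -> None:
        # One pass: fade the summary, then absorb the new point additively.
        self.fade(t)
        self.weight += 1.0
        for i, xi in enumerate(x):
            self.ls[i] += xi
            self.ss[i] += xi * xi

    def centroid(self) -> List[float]:
        return [v / self.weight for v in self.ls]

    def radius(self) -> float:
        # Root of the summed per-dimension variance around the centroid.
        var = [s / self.weight - (l / self.weight) ** 2
               for s, l in zip(self.ss, self.ls)]
        return math.sqrt(max(0.0, sum(var)))


mc = DecayedMicroCluster(dim=2, lam=1.0)
mc.insert([0.0, 0.0], t=0.0)
mc.insert([4.0, 4.0], t=1.0)
print(mc.weight, mc.centroid())
```

With `lam=1.0`, the first point's weight halves after one time unit, so the centroid moves to about (2.67, 2.67) rather than the unweighted mean (2, 2). The summary is biased toward recent data while each point is still read only once.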