ABSTRACT

Many of today's data sources produce data continuously and at high speed. Examples include TCP/IP traffic, GPS data, mobile calls, emails, sensor networks, and customer click streams. These sources generate huge amounts of data from nonstationary distributions. Storing, maintaining, and querying data streams has posed new challenges for the database and data mining communities. The database community has developed Data Stream Management Systems (DSMS) that support continuous queries, compact data structures (sketches and summaries), and sublinear algorithms for analyzing massive datasets. In this chapter, we discuss relevant issues and illustrative techniques from stream processing that may be useful for data stream mining.