ABSTRACT

Synopsis data structures are substantially smaller than their base data sets and support queries to massive data sets while minimizing or avoiding disk accesses. Because they are too small to maintain a full characterization of their base data sets, they must summarize the data, and the responses they provide to queries are typically approximate. Sampling methods are among the simplest approaches to synopsis construction in data streams because they retain the same multi-dimensional representation as the original data points. Sketch-based methods derive their inspiration from wavelet techniques. Query estimation is possibly the most widely used application of synopsis structures. In sketching, significant research has focused on developing compact data structures. The chapter describes the fingerprint synopsis data structure, which maintains a small footprint while representing the underlying data accurately. Wavelets are used to capture crucial information about the data: the higher-order coefficients reflect broad trends, and the lower-order coefficients reflect local characteristics.