CONTENTS
28.1 Introduction
28.2 Pattern Recognition Approaches
  28.2.1 Pattern Recognition Algorithms
  28.2.2 Semi-Supervised Learning
  28.2.3 Unsupervised Learning
  28.2.4 Time-Series Representations and Symbolic Aggregate Approximation
  28.2.5 Similarity Measure
    28.2.5.1 Shape-Based Similarity
    28.2.5.2 Experimental Results
    28.2.5.3 Structural Similarity
    28.2.5.4 Bag-of-Patterns (BOP) Representation
28.3 Astronomical Applications: Current and Future
28.4 Summary
References
28.1 INTRODUCTION

Time series are perhaps the most commonly encountered data type, touching almost every aspect of human life, including astronomy. One obvious problem in handling time-series databases is their typically massive size: gigabytes or even terabytes are common, and more and more databases are reaching the petabyte scale. For example, in telecommunications, large companies like AT&T produce several hundred million long-distance records per day [Cort00]. In astronomy, time-domain surveys are relatively new; these are surveys that cover a significant fraction of the sky with many repeat observations, thereby producing time series for millions or billions of objects. Several such time-domain sky surveys are now completed, such as the MACHO [Alco01], OGLE [Szym05], SDSS Stripe 82 [Bram08], SuperMACHO [Garg08], and Berkeley’s Transients Classification Pipeline (TCP) [Star08] projects. The Pan-STARRS project is an active sky survey; it began in 2010 as a 3-year survey covering three-fourths of the sky with ∼60 observations of each field [Kais04]. The Large Synoptic Survey Telescope (LSST) project proposes to survey 50% of the visible sky repeatedly, approximately 1000 times over a 10-year period, creating a 100-petabyte image archive and a 20-petabyte science database (https://www.lsst.org/). The LSST science database will include time series of over 100 scientific parameters for each of approximately 50 billion astronomical sources. This will be the largest data collection (and certainly the largest time-series database) ever assembled in astronomy, and it rivals any other discipline’s massive data collections for sheer size and complexity.