CONTENTS

28.1 Introduction
28.2 Pattern Recognition Approaches
    28.2.1 Pattern Recognition Algorithms
    28.2.2 Semi-Supervised Learning
    28.2.3 Unsupervised Learning
    28.2.4 Time-Series Representations and Symbolic Aggregate Approximation
    28.2.5 Similarity Measure
        28.2.5.1 Shape-Based Similarity
        28.2.5.2 Experimental Results
        28.2.5.3 Structural Similarity
        28.2.5.4 Bag-of-Patterns (BOP) Representation
28.3 Astronomical Applications: Current and Future
28.4 Summary
References

28.1 INTRODUCTION

Time series are perhaps the most commonly encountered data type, touching almost every aspect of human life, including astronomy. One obvious problem in handling time-series databases is their typically massive size: gigabytes or even terabytes are common, and more and more databases are reaching the petabyte scale. For example, in telecommunications, large companies like AT&T produce several hundred million long-distance records per day [Cort00]. In astronomy, time-domain surveys are relatively new; these are surveys that cover a significant fraction of the sky with many repeat observations, thereby producing time series for millions or billions of objects. Several such time-domain sky surveys are now completed, such as the MACHO [Alco01], OGLE [Szym05], SDSS Stripe 82 [Bram08], SuperMACHO [Garg08], and Berkeley’s Transients Classification Pipeline (TCP) [Star08] projects. The Pan-STARRS project is an active sky survey: it began in 2010 as a 3-year survey covering three-fourths of the sky with ∼60 observations of each field [Kais04]. The Large Synoptic Survey Telescope (LSST) project proposes to survey 50% of the visible sky repeatedly, approximately 1000 times over a 10-year period, creating a 100-petabyte image archive and a 20-petabyte science database (https://www.lsst.org/). The LSST science database will include time series of over 100 scientific parameters for each of approximately 50 billion astronomical sources. This will be the largest data collection (and certainly the largest time-series database) ever assembled in astronomy, and it will rival any other discipline’s massive data collections in sheer size and complexity.