Rezarta Islamaj Dogan
University of Maryland at College Park and National Center for Biotechnology Information
Lise Getoor
University of Maryland at College Park
W. John Wilbur
National Center for Biotechnology Information
ABSTRACT
Many real-world data mining problems involve data that is best represented as sequences. Sequence data comes in many forms, including 1) human communication, such as speech, handwriting, and printed text; 2) time series, such as stock market prices, temperature readings, and web-click streams; and 3) biological sequences, such as DNA, RNA, and proteins. In every domain, sequence data contains useful “signals,” or features, that can be used to build classification algorithms.