ABSTRACT

Rezarta Islamaj Dogan

University of Maryland at College Park and National Center for Biotechnology Information

Lise Getoor

University of Maryland at College Park

W. John Wilbur

National Center for Biotechnology Information

Many real-world data mining problems involve data best represented as sequences. Sequence data comes in many forms, including (1) human communication such as speech, handwriting, and printed text; (2) time series such as stock market prices, temperature readings, and web-click streams; and (3) biological sequences such as DNA, RNA, and proteins. Sequence data in all domains contains useful “signals,” or features, that enable the construction of classification algorithms.