ABSTRACT

This chapter discusses the basics of sequence data and then presents three major classes of sequence features: traditional pattern-based sequence features, general pattern-based features, and sequence features that are not defined by patterns. It describes several ways in which sequence patterns can be used as sequence features, gives an overview of sequence pattern types with brief discussions of how to mine them, and considers the factors that are important for selecting patterns as features. The chapter focuses on feature engineering for symbolic sequences, which are the most basic of the various sequence data types; many other sequence types can be converted into symbolic sequences through preprocessing. The chapter illustrates frequent sequence patterns with a supermarket example in which customers finish by paying at the checkout, then take the goods they bought to their car, and then return the shopping cart to a designated place.