ABSTRACT

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong

Jiawei Han

Department of Computer Science, University of Illinois at Urbana-Champaign

Xifeng Yan

Department of Computer Science, University of California at Santa Barbara

Philip S. Yu

Department of Computer Science, University of Illinois at Chicago

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.2 DDPMine: Direct Discriminative Pattern Mining . . . . . . . . . . . . . . . 42

5.2.1 Branch-and-Bound Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.2.2 Training Instance Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5.2.2.1 Progressively Shrinking FP-Tree . . . . . . . . . . . 46 5.2.2.2 Feature Coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.2.3 Efficiency Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.3 Harmony: Efficiently Mining The Best Rules For Classification 49 5.3.1 Rule Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.3.2 Ordering of the Local Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 5.3.3 Search Space Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.4 Performance Comparison Between DDPMine and Harmony . . . . 55 5.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.5.1 MbT: Direct Mining Discriminative Patterns via Model-based Search Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.5.2 NDPMine: Direct Mining Discriminative Numerical Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.5.3 uHarmony: Mining Discriminative Patterns from Uncertain Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.5.4 Applications of Discriminative Pattern Based Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.5.5 Discriminative Frequent Pattern Based Classification vs. Traditional Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

This chapter considers the efficient direct mining of discriminative patterns. Here, by discriminative patterns we mean those patterns that can be used as features by some classification algorithms, to produce highly accurate classifiers. The direct mining approach pushes the constraints implied by “to serve as feature set” for classification into the mining process. As will be discussed below, direct mining algorithms can significantly outperform two-step algorithms (which first mine patterns and then select a subset of the mined patterns as discriminative patterns). More specifically, this chapter will discuss two direct mining algorithms, in addition to giving an overview of some recent results in this direction.