ABSTRACT

Simple rules are preferable to non-linear distance or kernel functions for classifying gene expression profiles or other types of medical data, because rules help us understand the application in addition to yielding an accurate classification. In this chapter, we use emerging pattern (EP) mining algorithms to discover novel rules that describe the gene expression profiles of more than six subtypes of childhood acute lymphoblastic leukemia (ALL). We also describe an EP-based classifier, named PCL, that makes effective use of these rules for the subtype classification of leukemia patients. PCL is highly accurate in this application and also handles multiple parallel classifications. The method is evaluated on 327 heterogeneous ALL samples. Its test error rate is competitive with that of support vector machines, and is 71% better than that of C4.5, 50% better than that of Naive Bayes, and 43% better than that of the k-nearest neighbor classifier. Experimental results on another independent dataset are also presented to show the strength of PCL. This chapter is adapted from Ref. [248], © 2003, with permission from Oxford University Press.