ABSTRACT

Different data mining tasks, such as regression, classification, association rule mining, and clustering, are widely discussed in the literature. This chapter addresses classification and focuses on the state of the art in rule-based learning. From a user perspective, a classification model should be comprehensible and justifiable (that is, intuitively correct and in accordance with domain knowledge), and should provide correct predictions (Martens et al., 2011). The last requirement means that the model must generalize well, that is, provide correct predictions on new, unseen data instances. This generalization behavior is typically measured by the percentage of correctly classified test instances (PCC). Other commonly used measures include sensitivity and specificity, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC).
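
As an illustration of the measures named above, the following minimal sketch computes PCC, sensitivity, specificity, and AUC for a binary classifier. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for this example and do not come from the chapter. AUC is computed here through its rank interpretation: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def pcc(y_true, y_pred):
    """Percentage of correctly classified instances."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true, y_pred):
    """Fraction of actual positives that are predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Fraction of actual negatives that are predicted negative."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical test set: true labels and classifier scores (assumed values).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("PCC:", pcc(y_true, y_pred))
print("Sensitivity:", sensitivity(y_true, y_pred))
print("Specificity:", specificity(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

PCC, sensitivity, and specificity depend on the chosen threshold, whereas AUC is computed from the scores directly and so summarizes performance across all thresholds.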