ABSTRACT

FIGURE 4.15: The PR (left) and cumulative recall (right) curves of AdaBoost.M1 with self-training, together with those of the ORh and standard AdaBoost.M1 methods.

The main goal of this chapter was to introduce the reader to a new class of data mining problems: outlier ranking. In particular, we used a dataset that enabled us to tackle this task from different perspectives, namely with supervised, unsupervised, and semi-supervised approaches. The application addressed in this chapter can be regarded as an instance of the general problem of finding unusual observations of a phenomenon when only a limited amount of resources is available. Several real-world applications map onto this general framework, such as detecting fraud in credit card transactions, telecommunications, and tax declarations. In the area of security, there are also several applications of this general concept of outlier ranking.
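The general framework described above, ranking cases by an outlier score and inspecting only as many as the available resources allow, can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the code used in this chapter. The function name `precision_recall_at_budget`, the scores, and the labels are all hypothetical; any method that produces an outlier score per case could supply the input.

```python
def precision_recall_at_budget(scores, is_outlier, budget):
    """Inspect the `budget` highest-scoring cases and return (precision, recall).

    scores     -- outlier scores, higher meaning more suspicious (hypothetical input)
    is_outlier -- true labels, known only for evaluation purposes
    budget     -- number of cases the available resources allow us to inspect
    """
    # Rank case indices from most to least suspicious.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    inspected = ranked[:budget]

    hits = sum(1 for i in inspected if is_outlier[i])
    total_outliers = sum(1 for flag in is_outlier if flag)

    precision = hits / budget if budget else 0.0
    recall = hits / total_outliers if total_outliers else 0.0
    return precision, recall


# Toy data, invented for this illustration only.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
is_outlier = [True, False, False, False, True, False]

for budget in (2, 3):
    p, r = precision_recall_at_budget(scores, is_outlier, budget)
    print(f"budget={budget}: precision={p:.2f}, recall={r:.2f}")
```

Evaluating the recall for every possible budget, from one case up to the whole dataset, gives the points of a cumulative recall curve of the kind shown in Figure 4.15.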