Chapter 17
Rare Class Learning
By Charu C. Aggarwal
Pages 23

The problem of rare class detection is closely related to outlier analysis [2]. In unsupervised outlier analysis, no supervision is used for the anomaly detection process, and many of the anomalies found correspond to noise. It has been observed [35,42,61] in diverse applications such as system anomaly detection, financial fraud, and Web robot detection that the nature of the anomalies is often highly specific to particular kinds of abnormal activity in the underlying application. In such cases, unsupervised outlier detection methods often discover noise that is not specific to that activity and is therefore of little interest to an analyst. The goal of supervised outlier detection and rare class detection is to incorporate application-specific knowledge into the outlier analysis process, so that learning methods yield more meaningful anomalies. The rare class detection problem may therefore be considered the closest connection between the problems of classification and outlier detection. In fact, while classification may be considered the supervised analogue of the clustering problem, the rare class version of the classification problem may be considered the supervised analogue of the outlier detection problem. This is not surprising, since outliers may be considered rare “unsupervised groups” in a clustering application.