Measuring Classifier Performance: On the Incoherence of the Area under the ROC Curve and What to Do about It

doi:10.1201/b11429-14

Chapter

ABSTRACT

Supervised classification problems appear in many guises: in medical diagnosis, in epidemiological screening, in creditworthiness classification, in speech recognition, in fault and fraud detection, in personnel classification, and in a host of other applications. Such problems share the same abstract structure: given a set of objects, each of which has a known class membership and for each of which a descriptive set of measurements is given, construct a rule that will allow one to assign a new object to a class solely on the basis of its descriptive measurement vector. Because such problems are so widespread, they have been investigated by several different (though overlapping) intellectual communities, including statistics, pattern recognition, machine learning, and data mining, and a large number of different techniques have been developed (see, for example, Hand 1997 [154], Hastie et al. 2001 [162], and Webb 2002 [314]). The existence of this large number of distinct approaches prompts the question of how to choose between them. That is, given a particular classification problem, which of the many possible tools should be adopted?
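The abstract structure described above (labelled objects with measurement vectors, a constructed rule, and the assignment of a new object using only its measurements) can be made concrete with a minimal sketch, assuming numeric measurement vectors. The nearest-class-mean rule and the names `fit_nearest_mean` and `error_rate` are hypothetical illustrations chosen for brevity; they are not a method taken from this chapter, and the misclassification rate computed at the end is only one of many possible yardsticks for comparing rules.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Rule = Callable[[Vector], str]


def fit_nearest_mean(training: Sequence[Tuple[Vector, str]]) -> Rule:
    """Construct a classification rule from objects with known class labels.

    Illustrative choice (an assumption, not the chapter's method): assign a new
    object to the class whose mean measurement vector is nearest in squared
    Euclidean distance.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in training:
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        for j, value in enumerate(x):
            sums[label][j] += value
        counts[label] += 1
    # Per-class mean measurement vectors.
    means = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

    def rule(x: Vector) -> str:
        # The rule sees only the new object's measurement vector x.
        return min(
            means,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, means[lab])),
        )

    return rule


def error_rate(rule: Rule, test: Sequence[Tuple[Vector, str]]) -> float:
    """Fraction of labelled test objects that the rule misclassifies."""
    wrong = sum(1 for x, label in test if rule(x) != label)
    return wrong / len(test)


if __name__ == "__main__":
    training = [
        ((1.0, 2.0), "good"), ((1.5, 1.8), "good"),
        ((5.0, 8.0), "bad"), ((6.0, 9.0), "bad"),
    ]
    rule = fit_nearest_mean(training)
    print(rule((1.2, 1.9)))  # good
    print(rule((5.8, 8.7)))  # bad
```

Comparing several candidate rules would then amount to fitting each on the same training data and scoring each on the same held-out objects; the question the abstract raises is precisely which such score should be trusted.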