Supervised classification problems appear in many guises: in medical diagnosis, in epidemiological screening, in creditworthiness classification, in speech recognition, in fault and fraud detection, in personnel classification, and in a host of other applications. All such problems share the same abstract structure: given a set of objects, each with a known class membership and a descriptive vector of measurements, construct a rule that assigns a new object to a class solely on the basis of its measurement vector. Because such problems are so widespread, they have been investigated by several different (though overlapping) intellectual communities, including statistics, pattern recognition, machine learning, and data mining, and a large number of different techniques have been developed (see, for example, Hand 1997 [154], Hastie et al. 2001 [162], and Webb 2002 [314]). The existence of so many distinct approaches prompts the question of how to choose among them. That is, given a particular classification problem, which of the many possible tools should be adopted?