Unicorns and Horses: Classification | 6 | Data Science and Analytics w

ABSTRACT

Classification is the task of systematically arranging objects into appropriate groups or categories according to the characteristics that define those groupings. It is important to emphasise that the groups are pre-defined according to established criteria. This chapter discusses how classification algorithms are used and scored. A convenient way to evaluate a classifier is a table, commonly called a confusion matrix, that summarises the predictions of the algorithm against the known labels in the data provided. The accuracy of a classifier is the ratio of correctly classified instances to the total number of data points. The Receiver Operator Characteristic (ROC) is a quantitative analysis technique used in binary classification. In a ROC curve, the True Positive Rate is plotted as a function of the False Positive Rate for different cut-off points, or thresholds, of the classifier. With the aid of cross-validation it is possible to obtain several ROC curves, one for each split of the data, and so see how the performance of the classifier varies.
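The quantities mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It is not taken from the chapter: the function names `confusion_counts`, `accuracy` and `roc_points`, the toy labels and scores, and the thresholds are all hypothetical, and the code assumes binary labels coded as 0 and 1 with both classes present in the data.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels coded as 0/1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Ratio of correctly classified instances to the total number of points."""
    tp, _fp, tn, _fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def roc_points(
    y_true: Sequence[int], scores: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return one (FPR, TPR) pair per threshold.

    Assumes y_true contains at least one positive and one negative,
    otherwise the rates below would divide by zero.
    """
    points = []
    for thr in thresholds:
        # A point is predicted positive when its score reaches the threshold.
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, y_pred)
        tpr = tp / (tp + fn)  # True Positive Rate
        fpr = fp / (fp + tn)  # False Positive Rate
        points.append((fpr, tpr))
    return points


# Hypothetical toy data: known labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Predictions at a single cut-off of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_counts(y_true, y_pred))  # (TP, FP, TN, FN)
print(accuracy(y_true, y_pred))
print(roc_points(y_true, scores, [1.0, 0.5, 0.0]))
```

Sweeping the threshold from high to low moves the point from (0, 0), where nothing is predicted positive, towards (1, 1), where everything is. Plotting these pairs gives the ROC curve. Repeating the calculation on each fold of a cross-validation split would give one such curve per fold.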