This chapter considers some of the issues that arise when testing and evaluating an experimental technique. It discusses receiver operating characteristic (ROC) curves, which make it possible to compare different classification techniques directly by removing any specific threshold from consideration. In effect, ROC analysis considers all possible thresholds simultaneously, so the threshold is no longer a variable in the comparison. This is particularly useful when comparing experimental techniques. Precision-recall (PR) curves are an alternative to ROC analysis for experimental data. There are many connections between PR and ROC curves, but in certain cases PR curves might be preferred. Both kinds of curve are useful when comparing results from different experiments. In particular, one can use the area under the ROC curve (AUC), the partial AUC (AUCp), or the area under the PR curve (AUC-PR) to quantify the differences between experimental results. A significant imbalance in the sizes of the experimental datasets affects the measured accuracy, so it must be taken into account when setting a threshold.
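To make the idea of considering all thresholds at once concrete, the following is a minimal sketch in plain Python rather than the chapter's own implementation. The function names `sweep_thresholds` and `trapezoid_area`, and the six labels and scores, are hypothetical and chosen only for illustration. It assumes binary labels (1 for positive, 0 for negative) and scores where a higher value means "more likely positive".

```python
def sweep_thresholds(labels, scores):
    """Sweep the threshold over every distinct score, highest first.

    Returns (roc, pr), where roc is a list of (FPR, TPR) points and
    pr is a list of (recall, precision) points.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Rank samples from highest score to lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    roc = [(0.0, 0.0)]  # threshold above every score: nothing classified positive
    pr = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

roc, pr = sweep_thresholds(labels, scores)
auc = trapezoid_area(roc)
print(round(auc, 3))  # 0.889, i.e. 8 of the 9 positive-negative pairs are ranked correctly
```

The sketch computes only the ROC area. Linear interpolation between PR points is generally not appropriate, so the PR points are returned without an area estimate.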