ABSTRACT

In this chapter, the more commonly used ratings method of acquiring ROC data is described. Compared with the binary task, which yields only one operating point, the ratings method defines the underlying ROC curve more completely and is more efficient. In this method, the observer assigns a rating to each case, where the rating is an ordered label summarizing the observer's confidence that the case is diseased. A typical ROC counts table, taken from an actual clinical dataset, is described, along with how operating points (i.e., pairs of FPF and TPF values) are calculated from it. A labeling convention for the operating points is introduced, as is notation for the observed integers in the counts table. The rules for calculating operating points are expressed as formulae and implemented in R. The ratings method is contrasted with the binary method in terms of efficiency and practicality. A theme that recurs throughout this book, namely that ratings are not numerical values but ordered labels, is illustrated with an example. A method of collecting ROC data on a 6-point scale is described that has the advantage of yielding an unambiguous single operating point. The forced choice paradigm is described. Two current controversies are addressed: one on the utility of discrete (e.g., 1–6) vs. quasi-continuous (e.g., 0–100) ratings, and the other on the applicability of a clinical screening mammography-reporting scale to ROC analyses. The author recommends the 1–6 discrete scale and favors using cancer yield to reorder clinical ratings such as BI-RADS.
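As a minimal sketch of how operating points can be derived from a ratings counts table, the Python function below cumulates counts from the highest rating downward. The chapter itself implements its rules in R; the function name operating_points, the assumption that higher ratings indicate greater confidence of disease, and the counts in the usage example are hypothetical illustrations, not the chapter's clinical dataset or its operating-point labeling convention.

```python
from typing import List, Tuple


def operating_points(fp_counts: List[int], tp_counts: List[int]) -> List[Tuple[float, float]]:
    """Return (FPF, TPF) pairs, one per cutoff between adjacent rating bins.

    Hypothetical helper. fp_counts[i] and tp_counts[i] are the numbers of
    non-diseased and diseased cases that received rating i + 1; higher
    ratings are assumed to mean greater confidence that the case is diseased.
    """
    if len(fp_counts) != len(tp_counts):
        raise ValueError("count vectors must have the same length")
    k1 = sum(fp_counts)  # total number of non-diseased cases
    k2 = sum(tp_counts)  # total number of diseased cases
    points = []
    # Cutoff r: a case is called "diseased" if its rating is >= r.
    # r = 1 would give the trivial (1, 1) point, so start at r = 2.
    for r in range(2, len(fp_counts) + 1):
        fpf = sum(fp_counts[r - 1:]) / k1  # false positive fraction
        tpf = sum(tp_counts[r - 1:]) / k2  # true positive fraction
        points.append((fpf, tpf))
    return points


if __name__ == "__main__":
    # Hypothetical 5-bin counts (not real data): 40 non-diseased, 40 diseased.
    non_diseased = [20, 10, 5, 3, 2]
    diseased = [2, 3, 5, 10, 20]
    for fpf, tpf in operating_points(non_diseased, diseased):
        print(f"FPF = {fpf:.3f}, TPF = {tpf:.3f}")
```

Each cutoff between adjacent rating bins gives one operating point, so R rating bins yield R − 1 non-trivial points, listed here from the most lenient cutoff to the strictest; the trivial (0, 0) and (1, 1) endpoints are omitted.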