ABSTRACT

Because the US has adopted a low bar for judging CAD a success, namely its use as a second reader, standalone CAD performance is rarely measured. The author analyzed a dataset from one of the few published studies in mammography where standalone performance of CAD was compared to that of a group of expert radiologists interpreting the same cases. The published analysis is extended to account for case variability. By considering the difference in performance between each radiologist and CAD, the problem reduces to the single-modality multiple-reader analysis described in Chapter 10, except that the null hypothesis (NH) is that the average performance difference is zero. The method is applicable to any scalar figure of merit that can be calculated from the data. The use of partial-area measurements in CAD research is strongly discouraged: it is shown that they yield ambiguous results and, moreover, ignore some of the data. R code is presented for fixed-case and random-case analysis, the former duplicating the analysis in the Hupse-Karssemeijer 2013 publication. As expected, allowing case to be random widens the confidence intervals and increases the p-values. The area under the LROC curve yields a p-value of 0.0349, significant at alpha = 0.05, whereas the corresponding area under the ROC curve does not yield a significant difference (p = 0.321). The difference is attributed to the greater statistical power of LROC relative to ROC, which is afforded by using location information.
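
To make the reduction to a test on reader-minus-CAD differences concrete, the following is a minimal sketch of what a fixed-case analysis could look like, assuming it amounts to a one-sample t-test of the differences against zero with readers as the only source of variability. It is written in Python rather than the R used in the chapter, and it is not the chapter's code. The function names (fixed_case_test, two_sided_p, t_pdf) and the numbers in the usage section are hypothetical. Random-case analysis, which also needs a case-resampling estimate of variability, is not covered.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the t density over [0, |t|] by Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def fixed_case_test(reader_foms, cad_fom):
    """One-sample t-test of (reader FOM - CAD FOM) against zero.

    Assumption: cases are treated as fixed, so only reader-to-reader
    variability enters the standard error. Needs at least two readers
    whose differences are not all identical.
    """
    diffs = [f - cad_fom for f in reader_foms]
    n = len(diffs)
    d_bar = mean(diffs)                   # average performance difference
    se = stdev(diffs) / math.sqrt(n)      # standard error over readers
    t = d_bar / se
    df = n - 1
    return d_bar, t, df, two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical figures of merit, NOT values from the study.
    reader_foms = [0.85, 0.88, 0.83, 0.87, 0.86]
    cad_fom = 0.82
    d_bar, t, df, p = fixed_case_test(reader_foms, cad_fom)
    print(f"mean difference = {d_bar:.4f}, t = {t:.3f}, df = {df}, p = {p:.4f}")
```

Any scalar figure of merit can be passed in, so the same function would apply whether the values are areas under ROC or LROC curves. The p-value is obtained by numerical integration only to keep the sketch free of third-party dependencies.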