ABSTRACT

With more than two raters, the determination of agreement becomes much more complicated. How should one measure agreement among three or more raters? One could use a single overall measure such as kappa to estimate perfect agreement corrected for chance, or one could estimate all pairwise agreements with kappa and then take their average. As seen in the last chapter, one could also estimate a common intraclass correlation among all the raters. Still another possibility with multiple raters is to introduce the idea of partial agreement, say among six raters, and measure the agreement among five, four, and three of them. What is the role of conditional-type kappa measures with multiple raters? When the scores are ordinal, how should the weights be defined for a weighted kappa?
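As a sketch of the averaged pairwise approach mentioned above (the notation here is illustrative and not taken from the chapter): if there are $r$ raters and $\kappa_{ij}$ denotes the chance-corrected kappa between raters $i$ and $j$, the summary is simply the mean over all $\binom{r}{2}$ pairs,
$$
\bar{\kappa} \;=\; \frac{2}{r(r-1)} \sum_{i<j} \kappa_{ij}.
$$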