ABSTRACT

While the previous chapter explored some properties of the α-agreement measure, this chapter adds comparisons with other coefficients. It sketches α’s lineage but focuses mainly on α’s relationships with other measures, comparisons that for the most part do not go beyond the simplest αs introduced in Chapter 2. Single-valued nominal agreement coefficients for predefined units are by far the most widely used and hotly debated coefficients, starting with percent agreement.

Percent agreement is widely understood and therefore frequently used. However, it is not only limited to just two observers; its scale between 0 and 1 or 100% also has no clear reliability interpretation, which is why so-called chance-corrected agreement coefficients were proposed.
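Percent agreement itself is simple to compute: the proportion of units to which two observers assign the same category. A minimal sketch, with invented judgments of ten units purely for illustration:

```python
def percent_agreement(coder1, coder2):
    """Proportion of units on which two observers agree."""
    assert len(coder1) == len(coder2), "observers must judge the same units"
    matches = sum(a == b for a, b in zip(coder1, coder2))
    return matches / len(coder1)

# Hypothetical nominal judgments of 10 units by two observers.
c1 = ["a", "a", "b", "b", "c", "c", "a", "b", "c", "a"]
c2 = ["a", "a", "b", "c", "c", "c", "a", "b", "b", "a"]
print(percent_agreement(c1, c2))  # 0.8, i.e. agreement on 8 of 10 units
```

The result, 80% here, says nothing by itself about whether that level of agreement exceeds what careless or chance behavior would produce, which is the gap the chance-corrected coefficients try to fill.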

However, in the literature, chance has no universal definition. Relating percent agreement to diverse conceptions of chance yields diverse measures that share merely their name (and their applicability to two observers categorizing a common set of units). To explore what they could say about the reliability of data, this chapter compares five prototypes: Bennett et al.’s (1954) S, whose conception of chance is related to the number of categories available; Cohen’s (1960) κ, whose conception of chance is the statistical independence of two observers; Gwet’s (2008) AC1, which aimed at avoiding Feinstein and Cicchetti’s (1990) paradoxes; and Scott’s (1955) π, which is closely related to the fifth, Krippendorff’s α in its simplest form.
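Under the usual two-observer, nominal-data conditions, each of these prototypes corrects the observed agreement A_o by a different expected agreement A_e via the common form (A_o − A_e)/(1 − A_e). The sketch below uses the standard textbook formulas for each coefficient, not the chapter’s own notation, and invented data; it is meant only to show how the differing conceptions of chance produce differing values from identical judgments:

```python
from collections import Counter

def coefficients(c1, c2):
    """Five chance-corrected readings of the same two-observer nominal data."""
    n = len(c1)
    cats = sorted(set(c1) | set(c2))
    q = len(cats)
    A_o = sum(a == b for a, b in zip(c1, c2)) / n  # observed agreement

    # Bennett et al.'s S: chance = uniform use of the q available categories.
    S = (A_o - 1 / q) / (1 - 1 / q)

    # Cohen's kappa: chance = statistical independence of the two observers,
    # using each observer's own marginal category proportions.
    p1, p2 = Counter(c1), Counter(c2)
    A_e_kappa = sum((p1[c] / n) * (p2[c] / n) for c in cats)
    kappa = (A_o - A_e_kappa) / (1 - A_e_kappa)

    # Scott's pi: chance = random pairing from the pooled category proportions.
    pooled = p1 + p2
    A_e_pi = sum((pooled[c] / (2 * n)) ** 2 for c in cats)
    pi = (A_o - A_e_pi) / (1 - A_e_pi)

    # Gwet's AC1: expected agreement weighted toward rarely used categories,
    # intended to avoid the high-prevalence paradoxes.
    A_e_ac1 = sum((pooled[c] / (2 * n)) * (1 - pooled[c] / (2 * n))
                  for c in cats) / (q - 1)
    AC1 = (A_o - A_e_ac1) / (1 - A_e_ac1)

    # Krippendorff's alpha (nominal, two observers, no missing data):
    # pi plus a small-sample correction, alpha = pi + (1 - pi) / (2n).
    alpha = pi + (1 - pi) / (2 * n)

    return {"A_o": A_o, "S": S, "kappa": kappa,
            "pi": pi, "AC1": AC1, "alpha": alpha}

# Hypothetical judgments of 10 units by two observers.
c1 = ["a", "a", "b", "b", "c", "c", "a", "b", "c", "a"]
c2 = ["a", "a", "b", "c", "c", "c", "a", "b", "b", "a"]
for name, value in coefficients(c1, c2).items():
    print(f"{name}: {value:.3f}")
```

Even on this small example the five coefficients diverge, despite starting from the same 80% observed agreement, which is precisely why the choice among them matters.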

The chapter ends with a table summarizing what each of these coefficients, plus a few others, measures or fails to respond to, and the conditions under which each happens to apply to the reliability of data.