ABSTRACT

Score reliability characterizes the degree to which scores measure “something” as opposed to “nothing” (e.g., scores that are completely random). Thompson (2003b) explained the concept of score reliability using the metaphor of a bathroom scale, noting that:

[M]any of us begin our day by stepping on a scale to measure our weight. Some days when you step on your bathroom scale you may not be happy with the resulting score. On some of these occasions, you may decide to step off the scale and immediately step back on to obtain another estimate. If the second score is half a pound lighter, you may irrationally feel somewhat happier, or if the second score is slightly higher than the first, you may feel somewhat less happy. But if your second weight measurement yields a score 25 pounds lighter than the initial measurement, rather than feeling happy, you may instead feel puzzled or perplexed. If you then measure your weight a third time and the resulting score is 40 pounds heavier, you probably will question the integrity of all the scores produced by your scale. It has begun to appear that your scale is exclusively producing randomly fluctuating scores. In essence, your scale measures “nothing.” (p. 4)

Sometimes we desire protocols that yield scores that are completely random (i.e., measure nothing, and are perfectly unreliable). For example, when we enter a casino our reasonable premise is that the dice and the roulette wheels yield scores that are perfectly unreliable.
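
The contrast between a working scale and a scale (or a pair of dice) that measures “nothing” can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it treats the correlation between two successive weighings of the same people as one common index of reliability, and the weight range, the error size, and the helper names (`pearson`, `weigh`, `broken_scale`) are hypothetical choices, not taken from Thompson (2003b).

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def weigh(true_weights, error_sd, rng):
    """Working scale: each reading is the true weight plus a small random error."""
    return [w + rng.gauss(0, error_sd) for w in true_weights]


def broken_scale(n, rng):
    """Broken scale: readings are random and unrelated to anyone's true weight."""
    return [rng.uniform(110, 250) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Assumed population: 2,000 people with true weights between 110 and 250 pounds.
true_weights = [rng.uniform(110, 250) for _ in range(2000)]

# Each person steps on the scale twice; error SD of 0.5 pounds is an assumption.
good_r = pearson(weigh(true_weights, 0.5, rng), weigh(true_weights, 0.5, rng))

# Two "weighings" on the broken scale share nothing in common.
broken_r = pearson(broken_scale(2000, rng), broken_scale(2000, rng))

print(f"working scale, weighing 1 vs. 2: r = {good_r:.3f}")
print(f"broken scale,  weighing 1 vs. 2: r = {broken_r:.3f}")
```

In this sketch the working scale's two sets of readings correlate almost perfectly, whereas the broken scale's readings correlate near zero, which is the pattern we would hope to see from fair dice or a fair roulette wheel.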