ABSTRACT

A rating scale is typically a series of hierarchical levels, with each level providing a proficiency descriptor against which learner performance is measured. Each level (or band) in the rating scale is characterized by a verbal descriptor, and the descriptors, taken together, constitute the operational definition of the construct that the test developer claims to be assessing (Fulcher, 1996: 227; Davies et al., 1999). Recent variations in performance data-based scales have included empirically derived, binary-choice, boundary definition scales (EBBs) (Upshur and Turner, 1995) and performance decision trees (Fulcher, 2010; Fulcher et al., 2011).

Rating scales can be oriented towards the examiner, the test taker or the test constructor (Alderson, 1991). They can be holistic, analytic, primary-trait or multiple-trait (Hamp-Lyons, 1991), and they can be ‘real-world’ or ‘interaction/ability’ focused (Bachman, 1990). Most rating scales are designed for use by raters (judges) who match writing or speaking performances to descriptors in order to arrive at a score. These are ‘examiner-oriented’ scales, and are the focus of this chapter. Holistic scales require the rater to make a global, holistic judgment about a performance, so that there is no counting or ‘tallying’ of particular features or errors. These scales seek to get at the general ‘quality’ of the performance, in much the same way that judges rate skating or diving performances in the Olympics (Pollitt, 1990). By contrast, an analytic scale requires the enumeration of specific features in a performance, such as the number of errors or of appropriate second parts of adjacency pairs. A primary-trait scale asks the rater to make a single judgment about the performance on a single construct, such as ‘communicative ability’. Each descriptor in the rating scale must therefore describe a level within this construct. Multiple-trait rating scales assume that multiple constructs underlie a performance, and therefore require separate scores for each trait or construct. These are most useful when assembling profiles in classroom assessment. Any of these scale types may have a real-world focus, where the descriptors attempt to describe what the learner ‘can do’ in the real world, or an ability focus, where the descriptors describe the underlying abilities that a learner needs to have acquired in order to do the test task successfully, and which are also needed for successful task completion in a defined real-world context. These dimensions may be used to describe most rating scales currently in use (Fulcher, 2003: 91).