ABSTRACT

The focus in language learning on the development of communicative competence, and the centrality of speaking and writing in second language use contexts, mean that it is common for language tests to involve the measurement of oral and written language proficiency. Such assessments typically require the involvement of trained personnel to assess the performance and, in the case of speaking assessments, to elicit it. Recruiting sufficient numbers of suitably qualified people is a challenge in itself; ensuring that they perform their required task to the same standard is another. To the test takers, it should not matter who scores their performance, or who interviews them: one examiner should be interchangeable with another. Inconsistency or lack of equivalence in scoring or eliciting test performances by its very nature affects the validity of the inferences one is able to draw about learners on the basis of their scores. Nevertheless, it is well known that unwanted variation in ratings arises from variability among and within raters and interlocutors. A considerable body of research indicates that raters differ in the way they go about the task of rating, that they follow different procedures to arrive at a score, and that they can, consciously or unconsciously, be influenced by their background when judging test taker proficiency. It has also been shown that the behaviour of interviewers and other speaking test interlocutors can be influenced by their background or orientation to their role. McNamara (1996) argues that the rating may tell us as much about the rater or the interlocutor as it does about the test taker. Given this, the need for the careful selection and training of examiners is well understood.

Rater training is an act of socialization into the standards set by the test owner (Lumley, 2002), and its purpose is to ensure that ratings derive from a consideration of the features defined within the construct and to achieve a high degree of comparability, both inter-rater and intra-rater. As far as the content and methodology of rater training are concerned, there is very little public description of the procedures adopted by specific examination boards or test developers, although a review of the research literature suggests that most follow a similar approach, loosely corresponding to the six steps proposed by Bachman and Palmer (1996: 222):

a. Read and discuss the scales together.
b. Review language samples which have been previously rated by expert raters and discuss the ratings given.
c. Practice rating a different set of language samples. Then compare the ratings with those of experienced raters. Discuss the ratings and how the criteria were applied.
d. Rate additional language samples and discuss.
e. Each trainee rates the same set of samples. Check for the amount of time taken to rate and for consistency (a way of quantifying this is sketched after the list).
f. Select raters who are able to provide reliable and efficient ratings.
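To make the consistency check in step (e) concrete, the minimal Python sketch below computes unweighted Cohen's kappa, a chance-corrected index of exact agreement, between a trainee's scores and those of an experienced reference rater on the same set of samples. The band scores shown are hypothetical, and kappa is only one of several indices a test developer might use (alongside correlations or many-facet Rasch measurement); none of this is prescribed by Bachman and Palmer.

    from collections import Counter

    def cohen_kappa(rater_a, rater_b):
        # Unweighted Cohen's kappa: chance-corrected exact agreement
        # between two raters scoring the same set of samples.
        n = len(rater_a)
        # Observed proportion of exact agreement.
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Agreement expected by chance, from each rater's marginal
        # distribution of scores over the band labels.
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        labels = set(freq_a) | set(freq_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in labels)
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical band scores (1-5) awarded by a trainee and an
    # experienced reference rater to the same ten samples.
    trainee = [3, 4, 2, 5, 3, 3, 4, 2, 5, 4]
    expert  = [3, 4, 3, 5, 3, 2, 4, 2, 5, 4]
    print(f"kappa = {cohen_kappa(trainee, expert):.2f}")  # -> kappa = 0.73

A kappa close to 1 indicates agreement well beyond chance; markedly lower values would flag a trainee for further standardization before live rating. Intra-rater consistency can be checked the same way, by having a rater re-score the same samples after an interval and comparing the two sets of scores.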