ABSTRACT

From an assessment perspective, the main criteria used to evaluate spoken responses in language tests with human scoring, namely score reliability, validity, and fairness, also apply when automated scores are used. Whereas human raters evaluate a spoken response holistically, essentially matching its characteristics to an internalized template of a particular score level, automated scoring first evaluates many detailed aspects of spoken proficiency and only in a second step combines this evidence to produce a response score. Advances in speech recognition technology, in particular the use of deep neural networks, have substantially reduced the word error rates of automated speech scoring systems, but error rates for non-native speech remain much higher than those for native speech. The chapter also presents some closing thoughts on the key concepts discussed in the preceding chapters of this book.
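
To make the contrast with holistic human scoring concrete, the sketch below illustrates the two-step structure described above: a feature-extraction step that quantifies detailed aspects of a recognized spoken response, followed by a step that combines that evidence into a single score. The feature names, weights, and the linear combining rule are hypothetical placeholders chosen for illustration, not the method of any particular system discussed in the chapter.

```python
# Minimal sketch of two-step automated speech scoring, under assumed
# features and weights (all values here are illustrative, not real).

from dataclasses import dataclass


@dataclass
class RecognizedResponse:
    words: list[str]      # word hypotheses from the speech recognizer
    duration_sec: float   # total response duration
    pause_sec: float      # total silence within the response


def extract_features(resp: RecognizedResponse) -> dict[str, float]:
    """Step 1: derive detailed aspects of spoken proficiency."""
    n_words = len(resp.words)
    return {
        "speaking_rate": n_words / resp.duration_sec,               # words per second
        "pause_ratio": resp.pause_sec / resp.duration_sec,          # fraction of silence
        "type_token_ratio": len(set(resp.words)) / max(n_words, 1), # lexical variety
    }


def combine_evidence(features: dict[str, float]) -> float:
    """Step 2: combine the feature evidence into a response score (1-4 scale)."""
    weights = {"speaking_rate": 1.2, "pause_ratio": -2.0, "type_token_ratio": 1.5}
    raw = 2.0 + sum(weights[name] * value for name, value in features.items())
    return min(max(raw, 1.0), 4.0)  # clip to the reporting scale


if __name__ == "__main__":
    response = RecognizedResponse(
        words="well I think the author argues that practice matters".split(),
        duration_sec=6.0,
        pause_sec=1.2,
    )
    print(round(combine_evidence(extract_features(response)), 2))
```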