Perceptual Underpinnings of Automatic Pronunciation Assessment

doi:10.4324/9780203937761-11

ABSTRACT

One of the exciting promises of speech recognition technology in language learning is the ability to give automatic feedback both on specific pronunciation problems and on the overall impression of nonnativeness produced by the learner’s speech. There has been a substantial amount of work on the automatic detection of pronunciation problems (e.g., Brett, 2004; Eskenazi, 1996; Eskenazi, Ke, Albornoz, & Probst, 2000; Franco, Neumeyer, Ramos, & Bratt, 1999; Herron et al., 1999; Menzel et al., 2001; Neri, de Wet, Cucchiarini, & Strik, 2004; Tokuyama & Miwa, 2004; Witt & Young, 2000) and on the automatic production of an overall assessment of nonnativeness (e.g., Cucchiarini, Strik, Binnenpoorte, & Boves, 2000; Cucchiarini, Strik, & Boves, 1998; Franco, Neumeyer, & Kim, 1997; Rypa & Price, 1999). There have also been preliminary efforts to incorporate this work into language tutors. For example, the Voice Interactive Language Training System (VILTS) developed by Rypa and Price (1999) provides overall assessments of nativeness. Using a communicative approach to elicit students’ utterances, the system logs and stores student speech as the basis for an overall pronunciation score. More recently, EduSpeak® (Franco et al., 2000), a software development toolkit, has given software developers access to speech recognition technology that can assess overall nativeness and estimate the nativeness of individual phones. EduSpeak® is described on the web (www.speechatsri.com/products/eduspeak.shtml) and has been applied in computer-based pronunciation tutors, for example, by LaRocca, Morgan, and Bellinger (2002) and by Vazulik, Aguirre, and Newman (2002).