ABSTRACT

Dialogue management technology has for several years enjoyed a level of maturity at which speech-based interactive systems are technologically feasible and spoken dialogue systems have become commercially viable. Various systems that use speech interfaces exist, ranging from call routing (Chu-Carroll and Carpenter, 1999) to information-providing systems (Aust et al., 1995; Zue, 1997; Raux et al., 2005; Sadek, 2005) and speech translation (Wahlster, 2000), not to mention the many VoiceXML-type applications that enable speech interfaces on the web (VoiceXML Forum). The common technology is based on recognizing keywords in the user utterance and then linking these to appropriate user goals and further to system actions. This allows regulated interaction on a particular topic, using a limited set of words and utterance types (see also Chapter 30, “Speech Input to Support Universal Access,” of this handbook). While spoken dialogue systems exploit the flexibility and modularity of agent-based architectures (such as DARPA Communicator; Rudnicky et al., 1999; Seneff et al., 1999), the applications are usually designed to follow a stepwise interaction script and to direct the user to produce utterances that fit into predefined utterance types, by designing clear and unambiguous system prompts. Spoken dialogue technology also uses statistical classifiers that are trained to classify the user’s natural language utterances into a set of predefined classes corresponding to the topics or problems that the user may want help with. In particular, such classifiers are deployed in the so-called How May I Help You (HMIHY) technology (Gorin et al., 1997), which also includes context-free grammars to ensure a high degree of speech recognition accuracy and word-spotting techniques to pick the requested concepts from the user input. The HMIHY-type interfaces have been influential in the development of speech-based interactive systems, although typically they do not include deep language understanding components.
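To make the keyword-based approach described above concrete, the following is a minimal sketch, not taken from the chapter or from any particular system: spotted keywords are linked to predefined user goals, and each goal selects the next system prompt. The keyword lists, goal names, and prompts are illustrative assumptions only.

```python
# Illustrative keyword-spotting dialogue step: keywords -> user goal -> system action.
# All vocabulary, goal labels, and prompts below are hypothetical examples.

KEYWORD_GOALS = {
    "timetable": "get_timetable",
    "departure": "get_timetable",
    "ticket": "buy_ticket",
    "price": "buy_ticket",
    "operator": "transfer_to_agent",
}

SYSTEM_ACTIONS = {
    "get_timetable": "Which station are you departing from?",
    "buy_ticket": "For which date would you like a ticket?",
    "transfer_to_agent": "Transferring you to an operator.",
    "unknown": "Sorry, I did not understand. Could you rephrase?",
}

def spot_goal(utterance: str) -> str:
    """Link the first spotted keyword to a user goal; fall back to 'unknown'."""
    for word in utterance.lower().split():
        if word in KEYWORD_GOALS:
            return KEYWORD_GOALS[word]
    return "unknown"

def system_prompt(utterance: str) -> str:
    """Map the inferred goal to the next, deliberately unambiguous, system prompt."""
    return SYSTEM_ACTIONS[spot_goal(utterance)]

if __name__ == "__main__":
    print(system_prompt("I'd like to know the departure times to Berlin"))
    # -> "Which station are you departing from?"
```

The stepwise, script-like character of such interfaces follows directly from this design: each recognized goal deterministically selects the next prompt, which in turn constrains what the user is likely to say.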
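The HMIHY-style statistical classification mentioned above can likewise be sketched in a few lines. The following is an assumption-laden illustration, not the classifier used in the original HMIHY system: a bag-of-words naive Bayes model trained on a handful of invented example utterances routes new input into a fixed set of problem classes.

```python
# Hypothetical call-routing classifier: utterances -> one of a few predefined classes.
# Training data, class labels, and model choice are illustrative assumptions.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set: example utterance -> predefined problem class.
train_utterances = [
    "I have a question about my last bill",
    "there is a charge on my bill I don't recognize",
    "my internet connection keeps dropping",
    "the line is dead and I cannot make calls",
    "I want to cancel my subscription",
    "please close my account",
]
train_labels = [
    "billing", "billing",
    "technical_support", "technical_support",
    "cancellation", "cancellation",
]

# Bag-of-words features feeding a naive Bayes classifier over the predefined classes.
router = make_pipeline(CountVectorizer(), MultinomialNB())
router.fit(train_utterances, train_labels)

print(router.predict(["why is my bill so high this month"])[0])
# Expected to route the caller to the 'billing' class.
```

As the chapter notes, such classifiers route the utterance to a predefined class without any deep language understanding: the decision rests on surface word statistics rather than on an analysis of what the user actually meant.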