ABSTRACT

Speech-based interactions allow users to communicate with computers or computer-related devices without the use of a keyboard, mouse, buttons, or any other physical interaction device. By leveraging a skill that is mastered early in life, speech-based interactions have the potential to be more natural than interactions using other technologies such as the keyboard. Based on the input and output channels being employed, speech interactions can be categorized into three groups: spoken dialogue systems, speech output systems, and speech recognition systems. Spoken dialogue systems include applications that utilize speech for both input and output, such as telephony systems and speech-based environment control systems with voice feedback. Speech output systems include applications that only utilize speech for output while leveraging other technologies, such as the keyboard and mouse, for input. Screen access software, which is often used by individuals with visual impairments, is an example of speech output. Speech recognition systems include applications that utilize speech for input and other modalities for output, such as speech-based cursor control in a GUI (graphical user interface) and speech-based dictation systems (see Table 30.1).

The focus of this chapter is on those interactions where speech is used to provide input to some kind of computing technology. When discussing speech-based input, potential applications can be divided into three major categories, which are most easily distinguished by the size of the vocabulary that the system recognizes. In reality, no clear dividing line separates these categories: vocabulary size is a continuous variable, with systems recognizing as few as two or as many as tens of thousands of words. In the subsequent discussion, both speech and nonspeech output are considered. Typical applications include:

• Telephony systems, which tend to use small input vocabularies as well as speech output, and environmental control applications with small input vocabularies that may support speech or nonspeech output
• Speech-based interactions with GUIs, which can support navigation, window manipulations, and various other command-based interactions with widely varying input vocabularies ranging from just a few words to several hundred
• Dictation applications, which support users as they compose e-mails, letters, and reports as well as smaller tasks such as filling in portions of forms where free-form input is allowed

From the perspective of universal access (UA), speech-based interactions should be considered one of a set of tools available to help address the goal of ensuring that information technologies are accessible by all citizens as they address a variety of tasks in diverse contexts. While UA is concerned with addressing the needs of all possible users, three populations are of particular interest: children, older adults, and individuals with disabilities. For older users, who may be experiencing age-related visual and physical impairments, the graphical interface with a keyboard and mouse for input can present a variety of challenges, while speech can offer a natural style of interaction and reduce the need for physical interactions. Educational software, toys, and various web sites use speech to provide information to children, which serves as a natural and potentially easy-to-learn input solution. Perhaps the most obvious population that could benefit from speech-based interactions is individuals with physical impairments that hinder their use of more traditional input devices, such as the keyboard and mouse. For these users, speech can provide effective, inexpensive interaction solutions.

30.1 Introduction
30.2 Speech-Based Applications
     Small Vocabulary Solutions • Large Vocabulary: Dictation Systems
30.3 Research Issues in Speech Interaction
30.5 The Future of Speech Interaction
Acknowledgments
References