ABSTRACT

Are We Building Them?

What Types of Multimodal Interfaces Exist, and What Is Their History and Current Status?

What Are the Goals and Advantages of Multimodal Interface Design?

What Methods and Information Have Been Used to Design Novel Multimodal Interfaces?

What Are the Cognitive Science Underpinnings of Multimodal Interface Design?

When Do Users Interact Multimodally?

What Are the Integration and Synchronization Characteristics of Users’ Multimodal Input?

What Individual Differences Exist in Multimodal Interaction, and What Are the Implications for Designing Systems for Universal Access?

Is Complementarity or Redundancy the Main Organizational Theme That Guides Multimodal Integration?

What Are the Primary Features of Multimodal Language?

What Are the Basic Ways in Which Multimodal Interfaces Differ From Graphical User Interfaces?

What Basic Architectures and Processing Techniques Have Been Used to Design Multimodal Systems?

What Are the Main Future Directions for Multimodal Interface Design?

Acknowledgments

References

Multimodal systems process two or more combined user input modes (such as speech, pen, touch, manual gestures, gaze, and head and body movements) in a coordinated manner with multimedia system output. This class of systems represents a new direction for computing and a paradigm shift away from conventional WIMP (windows, icons, menus, pointer) interfaces. Since the appearance of Bolt’s (1980) “Put That There” demonstration system, which processed speech in parallel with touch-pad pointing, a variety of new multimodal systems has emerged. This new class of interfaces aims to recognize naturally occurring forms of human language and behavior, and each interface incorporates at least one recognition-based technology (e.g., speech, pen, vision). The development of novel multimodal systems has been enabled by the many input and output technologies now becoming available, including new devices and improvements in recognition-based technologies. This chapter reviews the main types of multimodal interfaces, their advantages and cognitive science underpinnings, their primary features and architectural characteristics, and general research in the field of multimodal interaction and interface design.