ABSTRACT

The next generation of voice-based user interface technology enables easy-to-use automation of new and existing communication services, achieving a more natural human-machine interaction. By natural, we mean that the machine understands what people actually say, in contrast to what a system designer expects them to say. This approach differs from menu-driven or strongly prompted systems, where many users are unable or unwilling to navigate such highly structured interactions. AT&T’s ‘How May I Help You?’ (HMIHYsm) technology shifts the burden from human to machine: the system adapts to people’s language, rather than forcing users to learn the machine’s jargon. The goal of such systems is to extract meaning from a user’s natural spoken language. It is important to quantify this notion, so that we can measure the ‘semantic information content’ of a spoken utterance and, furthermore, measure our success in extracting that information. Such a theory is crucial for engineering systems that understand and act upon spoken language. The communication paradigm here involves inducing the machine to perform some action or undergo some internal transformation. A communication is deemed successful if the machine responds appropriately to the user’s input. This is in contrast to the traditional goal of a communication system, which was described by Shannon [11] as follows.