ABSTRACT

Text-to-speech synthesis and voice generation for robotics are described in terms of the phases of text normalization, prosodic analysis, and concatenation of speech segments, together with the target and concatenation cost functions used in selecting those segments. Speech recognition and understanding involve several operations: conversion of human speech to written text, application of natural language processing algorithms, context awareness, semantic parsing, and dialogue management. Speech synthesis and recognition are indispensable components of a technology that allows robots to generate human-like speech, interpret words spoken by humans, and comprehend the meaning behind those words. The salient aspects of this technology are presented to clarify how robots communicate effectively with humans through spoken language.