ABSTRACT

The aim of computer speech synthesis from either textual or conceptual input is to imitate the characteristics of the typical human speaking process well enough to produce synthetic speech that is acceptable to human listeners. Text consists of alphanumeric characters, blank spaces and possibly a variety of special characters. Besides handling these characters, the pre-processing stage will normally also detect and record instances of punctuation and other relevant formatting information, such as paragraph breaks. In a language such as English, the relationship between the spellings of words and their phonemic transcriptions is extremely complicated, so word pronunciation is normally obtained using some combination of a pronunciation dictionary and letter-to-sound rules. Information derived in the text analysis can be used to generate prosody for the utterance, including the timing pattern, overall intensity level and fundamental-frequency (pitch) contour. Text analysis may also consider morphemes, which are the minimum meaningful units of language. Some syntactic analysis is needed both to resolve pronunciation ambiguities and to determine how the utterance should be structured into phrases.
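The combination of a pronunciation dictionary and letter-to-sound rules can be illustrated with a minimal Python sketch. It assumes a dictionary-first strategy with a rule-based fallback. The lexicon entries, the ARPAbet-style phoneme symbols, the context-free rules and the function names `pronounce` and `letter_to_sound` are all hypothetical toy material invented for this example. Practical systems use much larger dictionaries and context-sensitive rules.

```python
# Minimal sketch of word pronunciation: dictionary lookup with a fallback to
# letter-to-sound rules. All lexicon entries, ARPAbet-style phoneme symbols and
# rules here are HYPOTHETICAL toy data, not a real English rule set.

LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "one": ["W", "AH", "N"],  # irregular spelling that simple rules get wrong
}

# Context-free grapheme-to-phoneme rules; two-letter rules are tried first.
LTS_RULES = {
    "ch": ["CH"], "ck": ["K"], "ee": ["IY"], "sh": ["SH"], "ss": ["S"], "th": ["TH"],
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"],
    "y": ["Y"], "z": ["Z"],
}


def letter_to_sound(word: str) -> list[str]:
    """Convert a lower-case spelling to phonemes by greedy longest-match rules."""
    phones: list[str] = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in LTS_RULES:
                phones.extend(LTS_RULES[chunk])
                i += size
                break
        else:
            i += 1  # no rule covers this character (e.g. an apostrophe): skip it
    return phones


def pronounce(word: str) -> list[str]:
    """Look the word up in the dictionary; fall back to letter-to-sound rules."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])
    return letter_to_sound(key)


print(pronounce("Speech"))     # ['S', 'P', 'IY', 'CH']  (from the dictionary)
print(pronounce("sheet"))      # ['SH', 'IY', 'T']  (from the rules)
print(letter_to_sound("one"))  # ['AA', 'N', 'EH']  (rules alone mispronounce it)
```

The last line shows why the dictionary is consulted first. An irregular spelling such as "one" cannot be recovered by simple rules, while regular words that are missing from the dictionary still receive a plausible transcription.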