Arabic speech and language technology

doi:10.4324/9781315147062-16

ABSTRACT

Models of Computational Syntax accept plain text as input and generate text tagged with grammatical function and/or phrase structure. Models of Computational Semantics generate text tagged with categorical indicators of meaning and/or logical structure, while models of Computational Pragmatics label the discourse and/or sociolinguistic functions of the words and phrases in a sentence. This chapter treats such input-output mappings as the subject of Arabic Speech and Language Technology (SLT). Every speaker of Arabic can construct valid sentences in at least two linguistic varieties: Modern Standard Arabic (MSA), learned in school, and a colloquial dialect, spoken at home. In computational phonetics, the feature functions are different descriptions of the sound of the utterance, including the pharyngealized sounds of Arabic. An Arabic script transcription of an Arabic audio file is a nearly complete specification of the phonemes contained in the file, because the Arabic script specifies the 28 consonants and the three long vowels of MSA; the short vowels are normally left unwritten unless diacritics are added.