ABSTRACT

In this chapter, we present a mapping algorithm for automated Arabic-to-IPA transcription of Quranic and Modern Standard Arabic, applied to our Boundary-Annotated Quran corpus. The initial output of this algorithm is a full-form phonemic transcription, in the International Phonetic Alphabet (IPA), of each word in the Qurʾān as pronounced out of context, i.e. as an independent unit. We then develop computational rules that subdivide each full-form transcription into a sequence of syllable tokens, each represented as a consonant-vowel (CV) pattern drawn from a discrete set specific to Arabic: {CV, CVV, CVC, CVVC, CVCC}. These CV patterns also define syllable weight: CV is light, CVV and CVC are heavy, and CVVC and CVCC are super-heavy. A particular focus of this chapter is the automatic assignment of primary stress over the full-form CV annotation tier, and the human evaluation of automatically assigned stress via an inter-annotator agreement study. In a further development, we have generated pause-form transcriptions for verse-terminal and/or pre-boundary words throughout the corpus; in these forms, a super-heavy syllable attracts primary stress if it terminates the word. Our eventual aim is fully contextualized transcription, in which phrases demarcated by boundaries are presented as uninterrupted sequences of stressed and unstressed syllables. The chapter also discusses the codification of two Quranic “tajwīd” recitation rules: prolongation (“madd”) before a pause, and emphatic delivery (“qalqalah”) of the consonantal subset {ب د ج ط ق}. Both rules involve capturing their target contexts with regular-expression patterns over Arabic Unicode text. We have found that the specificity of tajwīd rules, which govern Arabic phonemes as sacred sounds, lends itself to algorithmic formulation. We construe automated Arabic-to-IPA transcription as a form of translation, with computation as the intermediary.
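The syllabification and weight-based stress steps summarized above can be illustrated with a short Python sketch. It is not the chapter's implementation: the function names (`tokenize`, `syllabify`, `stress_index`, `transcribe`) are hypothetical, and the stress rule is a simplified, commonly cited rule for Standard Arabic (stress a word-final super-heavy syllable, otherwise a heavy penult, otherwise the antepenult). The sketch also assumes IPA input with short vowels {a, i, u}, long vowels marked with ː, emphatics marked with ˤ, jīm written as dʒ, and geminates written as doubled consonants.

```python
# Hypothetical sketch: syllabify an IPA word into CV patterns and assign primary stress.
LENGTH = "\u02D0"         # ː IPA length mark (assumed to follow long vowels only)
PHARYNGEAL = "\u02E4"     # ˤ pharyngealization mark on emphatic consonants
STRESS = "\u02C8"         # ˈ primary-stress mark
VOWELS = {"a", "i", "u"}  # assumed short-vowel inventory

# The five Arabic CV patterns and their syllable weights.
WEIGHT = {
    "CV": "light",
    "CVV": "heavy", "CVC": "heavy",
    "CVVC": "super-heavy", "CVCC": "super-heavy",
}


def tokenize(ipa: str) -> list[tuple[str, str]]:
    """Split an IPA word into (segment, label) pairs, with label in {'C', 'V', 'VV'}."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(ipa):
        ch = ipa[i]
        if ch == LENGTH:
            # The length mark turns the preceding short vowel into a long one.
            if not tokens or tokens[-1][1] != "V":
                raise ValueError(f"length mark not after a short vowel in {ipa!r}")
            tokens[-1] = (tokens[-1][0] + ch, "VV")
        elif ch == PHARYNGEAL:
            # Pharyngealization modifies the preceding consonant; it is not a new segment.
            if not tokens or tokens[-1][1] != "C":
                raise ValueError(f"pharyngealization not after a consonant in {ipa!r}")
            tokens[-1] = (tokens[-1][0] + ch, "C")
        elif ipa.startswith("d\u0292", i):
            # Treat the affricate dʒ (jīm) as a single consonant.
            tokens.append(("d\u0292", "C"))
            i += 2
            continue
        elif ch in VOWELS:
            tokens.append((ch, "V"))
        else:
            tokens.append((ch, "C"))
        i += 1
    return tokens


def syllabify(ipa: str) -> list[tuple[str, str]]:
    """Return (ipa_syllable, cv_pattern) pairs; every syllable begins with a C before a vowel."""
    tokens = tokenize(ipa)
    onsets = [k for k in range(len(tokens) - 1)
              if tokens[k][1] == "C" and tokens[k + 1][1].startswith("V")]
    if not onsets or onsets[0] != 0:
        raise ValueError(f"{ipa!r} does not begin with a CV sequence")
    bounds = onsets + [len(tokens)]
    sylls = []
    for start, end in zip(bounds, bounds[1:]):
        seg = tokens[start:end]
        pattern = "".join(label for _, label in seg)
        if pattern not in WEIGHT:
            raise ValueError(f"unexpected syllable pattern {pattern} in {ipa!r}")
        sylls.append(("".join(s for s, _ in seg), pattern))
    return sylls


def stress_index(patterns: list[str]) -> int:
    """Simplified weight-based rule (an assumption, not the chapter's exact rule set)."""
    n = len(patterns)
    if WEIGHT[patterns[-1]] == "super-heavy":
        return n - 1          # a word-final super-heavy syllable attracts stress
    if n >= 2 and WEIGHT[patterns[-2]] != "light":
        return n - 2          # otherwise a heavy (or super-heavy) penult is stressed
    return max(n - 3, 0)      # otherwise the antepenult, or the first syllable


def transcribe(ipa: str) -> str:
    """Render a word as dot-separated syllables with ˈ before the stressed one."""
    sylls = syllabify(ipa)
    k = stress_index([p for _, p in sylls])
    return ".".join(STRESS + s if i == k else s for i, (s, _) in enumerate(sylls))
```

Under these assumptions, the full form kitaːbun is rendered ki.ˈtaː.bun, while the pause form kitaːb is rendered ki.ˈtaːb, with stress on the word-final super-heavy syllable. Computing stress from the CV pattern list rather than from the segments keeps the rule independent of which consonants and vowels occur.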
Our software outputs Arabic word forms as stressed and syllabified IPA sequences, with further annotation of tajwīd prosody, for the benefit of language learners who are not native speakers of Arabic and of students of the Qurʾān.
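The regular-expression capture of tajwīd contexts can likewise be sketched in Python. The patterns below are hypothetical simplifications, not the chapter's rule set. Minor qalqalah is taken to be a qalqalah letter carrying sukūn; major qalqalah is a qalqalah letter ending a word read in pause; and madd before pause is a long vowel (fatḥa + alif, ḍamma + wāw, kasra + yāʾ) followed by one final letter. Treating U+06E1 as an Uthmani-script sukūn, the tag names, and the function name `tajwid_tags` are all assumptions.

```python
import re

# Hypothetical sketch: flag qalqalah and madd-before-pause contexts in one vocalized word.
QALQALAH = "\u0642\u0637\u0628\u062C\u062F"  # ق ط ب ج د
SUKUN = "\u0652\u06E1"                       # standard sukūn + Uthmani small-high sukūn (assumed)
MARKS = "\u064B-\u0652\u06E1"                # tanwīn, short vowels, shadda, sukūn
LETTER = "\u0621-\u064A"                     # Arabic letter block from hamza to yāʾ

# Minor qalqalah: a qalqalah letter carrying sukūn anywhere in the word.
QALQALAH_SUKUN = re.compile(f"[{QALQALAH}][{SUKUN}]")
# Major qalqalah: a qalqalah letter followed only by diacritics up to the end of the word.
QALQALAH_PAUSE = re.compile(f"[{QALQALAH}][{MARKS}]*$")
# Madd before pause: fatḥa+alif, ḍamma+wāw or kasra+yāʾ, then exactly one final letter.
MADD_PAUSE = re.compile(
    f"(?:\u064E\u0627|\u064F\u0648|\u0650\u064A)[{SUKUN}]?[{LETTER}][{MARKS}]*$"
)


def tajwid_tags(word: str, at_pause: bool) -> set[str]:
    """Return hypothetical tag names for the tajwīd contexts matched in `word`."""
    tags = set()
    if QALQALAH_SUKUN.search(word):
        tags.add("qalqalah-minor")
    if at_pause and QALQALAH_PAUSE.search(word):
        tags.add("qalqalah-major")
    if at_pause and MADD_PAUSE.search(word):
        tags.add("madd-before-pause")
    return tags
```

For instance, with `at_pause=True`, الْفَلَقِ is tagged `qalqalah-major` and الرَّحِيمِ is tagged `madd-before-pause`; read without a pause, الْفَلَقِ receives no tag. Writing the code points as escapes avoids ambiguity from bidirectional rendering of Arabic inside character classes.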