ABSTRACT

Speech is crucial for effective communication: it conveys not only content but also the context and intent behind it. Emotional speech has a variety of applications in Human Machine Interaction (HMI). This paper proposes a word-level algorithm that transforms a neutral Kannada speech sentence into a target emotional speech sentence. Speech is segmented using a dynamic thresholding technique based on Short-Time Zero Crossing Rate (STZCR), Short-Time Energy (STE), intensity, and Spectral Centroid (SC). The Gaussian Regression Model (GRM) and the Gaussian Normalization Model (GNM) are used to predict the pitch of the target emotional speech, and the Discrete Wavelet Transform (DWT) is used to modify the spectral energy of the neutral speech. Objective and subjective tests are carried out to evaluate the expressiveness of the speech produced by the proposed method. The findings indicate a noteworthy rise in the Emotion Recognition Rate (ERR) and the Mean Opinion Score (MOS). With the GRM approach, the transformations to happiness and sadness yield ERRs of 84.9% and 89.8%, respectively. With the GNM approach, the transformations to anger and fear yield ERRs of 82.87% and 84.9%, respectively. A comparison with some existing methods shows that the results obtained are satisfactory.
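As a rough illustration of the pitch-prediction step, the sketch below shows one common form of Gaussian normalization: a mean and standard deviation mapping of the pitch (F0) contour. It is a minimal sketch under stated assumptions, not the paper's GNM. The function names, the use of linear Hz rather than log F0, the convention that unvoiced frames have F0 = 0, and the target statistics are all hypothetical.

```python
from statistics import mean, pstdev


def f0_stats(f0):
    """Return (mean, population std) of the voiced frames (F0 > 0) of a contour."""
    voiced = [f for f in f0 if f > 0]
    return mean(voiced), pstdev(voiced)


def gaussian_normalize_pitch(src_f0, src_stats, tgt_stats):
    """Shift and scale voiced F0 values so they match the target mean and std.

    Assumption: unvoiced frames are marked with F0 == 0 and are left unchanged.
    """
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    if sigma_s <= 0:
        raise ValueError("source F0 standard deviation must be positive")
    scale = sigma_t / sigma_s
    return [mu_t + scale * (f - mu_s) if f > 0 else 0.0 for f in src_f0]


# Hypothetical neutral contour in Hz; zeros mark unvoiced frames.
neutral_f0 = [0.0, 110.0, 120.0, 130.0, 0.0]
neutral_stats = f0_stats(neutral_f0)

# Hypothetical target-emotion statistics (mean, std) in Hz, not taken from the paper.
target_stats = (200.0, 20.0)

converted = gaussian_normalize_pitch(neutral_f0, neutral_stats, target_stats)
```

In a word-level scheme such as the one proposed here, a mapping of this kind would presumably be applied to each segmented word with its own source and target statistics. That per-word usage is an assumption about how the pieces fit together.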