ABSTRACT

Any strategy or plan for sustainable development must include effective communication. Communication conveys ideas, opinions, thoughts, information, emotions, and feelings from a sender to a recipient, and this exchange is particularly important in business, the workplace, and social relationships. Safe and healthy communication is essential to building a solid partnership in a social environment: nobody should do or say anything that makes another person feel bad, undermines their confidence, or exerts control over them. The Indian National Domestic Violence Hotline defines social media and technology abuse, often known as digital abuse, as the use of technologies such as texting and social networking to annoy, harass, or threaten a partner. This behavior frequently involves online verbal and psychological abuse. Deep learning-based intelligent systems have been highly effective at detecting such online abuse, whether it takes the form of text-based tweets, videos, or image-only memes. However, the research community has paid less attention to the automated prediction of abuse in audio. This article focuses on the challenging task of audio abuse prediction. It presents a survey of conventional audio processing-based detection systems and of deep learning approaches for recognizing abusive or hate speech in audio. These approaches incorporate transfer learning, emotion encoding, natural language processing, and multimodal learning.