ABSTRACT

With the world relying increasingly on digital media for daily communication, it is imperative to devise methodologies for analysing and parsing the content uploaded to these platforms, so that unsafe and inappropriate material can be removed quickly and reliably. The need to filter out offensive language and to safeguard internet users from online abuse and cyberbullying is urgent. This chapter focuses on toxicity analysis and on categorizing text corpora according to the subtypes of toxicity. Along with one recurrent neural network model, namely a bidirectional LSTM, four models based on the transformer architecture are adopted, viz. BERT, DistilBERT, RoBERTa, and ALBERT. Using the most prominent and influential features, a comparative analysis of the models is presented.
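To make the task concrete, the sketch below shows one plausible way to set up a pretrained transformer checkpoint for multi-label toxicity-subtype classification with the Hugging Face `transformers` library. The six subtype labels are an illustrative assumption (they follow the commonly used Jigsaw Toxic Comment label set, which the abstract does not name), and the checkpoint identifier can be swapped for any of the compared models (DistilBERT, RoBERTa, ALBERT); this is not the chapter's exact pipeline, only a minimal illustration of the classification setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed (illustrative) toxicity subtypes; the chapter's own label set may differ.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Any of the compared checkpoints could be substituted here,
# e.g. "distilbert-base-uncased", "roberta-base", "albert-base-v2".
CHECKPOINT = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid outputs, one score per subtype
)

def predict_toxicity(text: str) -> dict:
    """Return a per-subtype probability for a single comment."""
    inputs = tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)
    return {label: float(p) for label, p in zip(LABELS, probs)}

# Example call; the model must first be fine-tuned on labelled data
# for these scores to be meaningful.
print(predict_toxicity("You are a wonderful person."))
```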