ABSTRACT

Cyberbullying continues to attract global attention owing to its socioeconomic implications, and the rise of social media networks has made mitigating it more urgent than ever, because conversations on these platforms are replete with abusive, toxic, inciting, provocative, and depressing messages. Natural language processing and machine learning have been widely employed in conceptual frameworks proposed in the literature to investigate and mitigate this trend. However, such frameworks rarely incorporate a multivariate statistical analysis, which could yield further actionable insights from primary data. This study therefore combines a process-flow feature engineering approach with an exploratory data analysis of 15,000 electioneering tweets acquired between November 2022 and January 2023 from the Nigerian Twitter cyberspace. The acquired corpus was passed through feature engineering techniques required for unstructured data, including tokenization, stopword removal, stemming, and part-of-speech tagging. Sentiment analysis was then applied to the tokenized corpus to obtain the positive, negative, neutral, and compound sentiment polarities expressed in each tweet. Together with ten other metrics, these attributes form the predictor variables used to train a random forest ensemble model. Model training is followed by an exploratory data analysis that tests the statistical characteristics of the predictor variables. Actionable insights from the multicollinearity test, as well as from the distribution of data points across the interquartile range, indicate highly volatile "vawulence" bullying, which receives retweets and likes mostly from Twitter users without blue-check verification.
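
The two statistical checks named above can be illustrated with a minimal sketch. The abstract does not say how the multicollinearity test or the interquartile-range analysis was carried out, so this sketch assumes pairwise Pearson correlation and Tukey's 1.5 x IQR rule as stand-ins. The helper names (`iqr_summary`, `collinear_pairs`), the feature names, the 0.8 threshold, and the numbers are all hypothetical, chosen for illustration and not taken from the study's data.

```python
from math import sqrt
from statistics import quantiles


def iqr_summary(values):
    """Return quartiles, IQR, and points outside Tukey's 1.5 * IQR fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(features, threshold=0.8):
    """List predictor pairs whose absolute correlation meets the threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical per-tweet predictors (illustrative values only).
features = {
    "retweets": [1, 2, 3, 4, 100],
    "likes": [2, 4, 6, 8, 200],
    "compound": [0.5, -0.5, 0.1, -0.9, 0.3],
}

summary = iqr_summary(features["retweets"])
pairs = collinear_pairs(features)
```

In this toy data, one tweet's retweet count falls far outside the upper fence, and `retweets` and `likes` are flagged as a collinear pair, which is the kind of signal that would prompt dropping or combining a predictor before training a model.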