ABSTRACT

Hate speech is offensive or threatening speech or writing that expresses bias against a specific group, typically on the basis of attributes such as religion or sexual orientation. On the internet and social media platforms, it is easy to spread hatred anonymously. Hate speech spread through online social media platforms can cause harm and distress to individuals and can trigger social disturbance beyond cyberspace. Consequently, the detection of offensive language in user-generated social media content has recently become an increasingly pressing issue. Large-scale social media platforms invest significant resources in automatically detecting and classifying hateful messages, yet with limited success. These platforms also produce a massive amount of data, which requires an effective classification model to uncover hate speech. This research will study the performance of several feature extraction techniques and classification algorithms in identifying hate speech and offensive language efficiently. Specifically, it will examine feature extraction techniques such as TF-IDF and Doc2Vec, together with polarity scores computed through sentiment analysis, in combination with different classifiers: support vector machine, random forest, naïve Bayes and logistic regression. Our purpose is to assess these classification algorithms and feature extraction techniques and to determine whether, and how much, each contributes to better classification results.
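
To make the TF-IDF feature extraction step concrete, the following is a minimal sketch and not the implementation used in this study. It assumes whitespace tokenization and the unsmoothed weighting tf(t, d) × log(N / df(t)); library implementations often use smoothed or normalized variants. The function name `tfidf` and the toy corpus are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    # Assumption: lowercase whitespace tokenization, no stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy placeholder corpus, not data from the study.
docs = ["you are great", "you are awful", "great day today"]
vectors = tfidf(docs)
```

Under this weighting, a term that appears in only one document (such as "awful" here) receives a higher weight than a term shared across documents, which is what makes the resulting vectors useful as input to a classifier.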