Automating Counterspeech

doi:10.4324/9781003377078-11

ABSTRACT

The proliferation of online hate speech is a growing problem in our digital societies, and counterspeech provides a potentially effective method for responding to statements and narratives that are offensive or toxic. While the other chapters of this book have examined counterspeech in human-to-human interactions, this chapter will consider the important task of automating counterspeech in dialogue systems. Because even human beings often find it difficult to produce effective counterspeech consistently, designing an automated system that responds well to offensive or toxic inputs is more challenging still. Nonetheless, given the increasing prevalence of online hate speech, automated methods for responding to it are ever more desirable. Such methods could be used to suggest possible responses that the human recipients of hate speech could choose to use when replying to it. In addition, virtual personal assistants such as Siri, Cortana, and Alexa are acquiring increasingly prominent roles in the daily lives of many people, and, inevitably, such systems are routinely subjected to hateful or offensive inputs. Automating counterspeech effectively will enable such systems to respond more appropriately whenever they are subjected to verbal abuse.

Prompted by these concerns, this chapter will summarize the current state of the art in research on this task. More specifically, it will motivate the need for such systems, discuss the available training data, consider the different kinds of evaluation frameworks that are needed to assess the quality of the responses produced (e.g., automated metrics and human-based approaches), and discuss some of the models that can be used for this task.