
Illustration: Aïda Amer & Eniola Odetunde/Axios
Scientists have long tried to use AI to automatically detect hate speech, which is a huge problem for social network users. And they're getting better at it, despite the difficulty of the task.
What's new: A project from UC Santa Barbara and Intel takes a big step further — it proposes a way to automate responses to online vitriol.
- The researchers cite a widely held belief that counterspeech is a better antidote to hate than censorship.
- Their ultimate vision is a bot that steps in when someone has crossed the line, reining them in and potentially sparing the target.
The big picture: Automated text generation is a buzzy frontier of the science of speech and language. In recent years, huge advances have elevated these programs from error-prone autocomplete tools to super-convincing — though sometimes still transparently robotic — authors.
- I wrote earlier this year about the potential for harm from convincing bot-generated text. It would be easy to train an AI writer to mimic hate speech, for example.
- This project shows how the technology could instead be used for good.
How it works: To build a good hate speech detector, you need some actual hate speech. So the researchers turned to Reddit and Gab, two social networks with little to no policing and a reputation for rancor.
- For maximum bile, they went straight for the "whiniest, most low-key toxic subreddits," as curated by Vice. They grabbed about 5,000 conversations from those forums, plus 12,000 from Gab.
- They passed the threads to workers on Amazon Mechanical Turk, a crowdsourcing platform, who were asked to identify hate speech in the conversations and write short interventions to defuse the hateful messages.
- The researchers trained several kinds of AI text generators on these conversations and responses, priming them to write their own responses to toxic comments; a rough sketch of that training step follows.
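The paper's exact training code isn't reproduced here, but the basic recipe it describes (pair each flagged comment with a crowd-written intervention, then fine-tune a text generator on those pairs) can be sketched in a few lines of Python. The model choice (t5-small), the file name and the field names below are illustrative assumptions, not the researchers' actual setup.

```python
# Hypothetical sketch: fine-tuning a small seq2seq model to map toxic comments
# to de-escalating responses. Model, hyperparameters and data format are
# illustrative assumptions, not the paper's actual configuration.
import json

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Assumed input: one JSON object per line, e.g.
# {"comment": "<toxic message>", "intervention": "<crowd-written response>"}
def load_pairs(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

def collate(batch):
    inputs = tokenizer(
        ["respond: " + ex["comment"] for ex in batch],
        padding=True, truncation=True, max_length=256, return_tensors="pt",
    )
    targets = tokenizer(
        [ex["intervention"] for ex in batch],
        padding=True, truncation=True, max_length=64, return_tensors="pt",
    )
    labels = targets["input_ids"]
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return inputs, labels

pairs = load_pairs("interventions.jsonl")  # hypothetical file name
loader = DataLoader(pairs, batch_size=8, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
for epoch in range(3):
    for inputs, labels in loader:
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# After training, the model can draft a response to a new toxic comment.
model.eval()
prompt = tokenizer("respond: <some toxic comment>", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=60)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```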
The results: Some of the computer-generated responses could easily pass as human written — like, "Use of the c-word is unacceptable in our discourse as it demeans and insults women" or "Please do not use derogatory language for intellectual disabilities."
- But the replies were inconsistent, and some were incomprehensible: "If you don't agree with you, there's no need to resort to name calling."
- When Mechanical Turk workers were asked to evaluate the output, they preferred human-written responses more than two-thirds of the time.
Our take: This project didn't test how effective the responses were in stemming hate speech — just how successful other people thought they might be.
- Even the most rational, empathetic response, not to mention the somewhat robotic computer-generated ones above, could flop or even backfire — especially if Reddit trolls knew they were being policed by bots.
"We believe that bots will need to declare their identities to humans at the beginning," says William Wang, a UCSB computer scientist and paper co-author. "However, there is more research needed how exactly the intervention will happen in human-computer interaction."