Machine learning can't flag false news, new studies show
Current machine learning models aren't yet up to the task of distinguishing false news reports from true ones, two new papers by MIT researchers show.
The big picture: After other researchers showed that computers can convincingly generate made-up news stories without much human oversight, some experts hoped that the same machine-learning-based systems could be trained to detect such stories. But MIT doctoral student Tal Schuster's studies show that, while machines are great at detecting machine-generated text, they can't tell whether stories are true or false.
Details: Many automated fact-checking systems are trained using a database of labeled claims called Fact Extraction and Verification (FEVER).
- In one study, Schuster and team showed that fact-checking systems trained with machine learning struggled to handle negative statements ("Greg never said his car wasn't blue") even when they recognized the corresponding positive statement as true ("Greg says his car is blue").
- The problem, say the researchers, is that the database is filled with human bias. The people who created FEVER tended to write their false entries as negative statements and their true entries as positive statements, so the computers learned to rate sentences containing negations as false.
- That means the systems were solving a much easier problem than detecting fake news. "If you create for yourself an easy target, you can win at that target," said MIT professor Regina Barzilay. "But it still doesn't bring you any closer to separating fake news from real news."
- Both studies were headed by Schuster with teams of MIT collaborators.
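The shortcut described above can be illustrated with a minimal Python sketch. It is not the researchers' method or code: the claims, the labels, the `NEGATION_WORDS` list and the function names are all hypothetical and invented for illustration. The sketch only shows how a "fact-checker" that never looks at any evidence can still score well on a dataset where false entries tend to be phrased as negations.

```python
import re

# Hypothetical toy claims, invented for illustration; NOT taken from FEVER.
# The labels mimic the bias described in the article: false entries are
# mostly written as negative statements, true entries as positive ones.
TOY_CLAIMS = [
    ("Greg says his car is blue.", "SUPPORTS"),
    ("The museum opened in 1990.", "SUPPORTS"),
    ("The river flows through the city.", "SUPPORTS"),
    ("Greg did not say his car is blue.", "REFUTES"),
    ("The museum never opened.", "REFUTES"),
    ("The river does not flow through the city.", "REFUTES"),
    ("The bridge was not demolished.", "SUPPORTS"),  # a true negative statement
]

# Assumed list of negation cues; a real analysis would need a fuller one.
NEGATION_WORDS = {"not", "never", "no"}


def has_negation(claim: str) -> bool:
    """Return True if the claim contains a negation cue such as 'not' or "wasn't"."""
    tokens = re.findall(r"[a-z']+", claim.lower())
    return any(t in NEGATION_WORDS or t.endswith("n't") for t in tokens)


def claim_only_predict(claim: str) -> str:
    """Guess a label from the wording alone, without consulting any evidence."""
    return "REFUTES" if has_negation(claim) else "SUPPORTS"


def accuracy(examples) -> float:
    """Fraction of examples where the wording-only guess matches the label."""
    correct = sum(claim_only_predict(claim) == label for claim, label in examples)
    return correct / len(examples)


if __name__ == "__main__":
    print(f"Wording-only accuracy on toy data: {accuracy(TOY_CLAIMS):.2f}")
    for claim, label in TOY_CLAIMS:
        if claim_only_predict(claim) != label:
            print("Misjudged:", claim)
```

On this toy data the wording-only guess is right for every claim except the true negative statement, which is the kind of sentence the article says such systems mishandle.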
The bottom line: The second study showed that machine-learning systems are good at detecting stories that were machine-written, but not at separating the true ones from the false ones.
Yes, but: While automated text generation makes it more efficient to produce bogus news stories, not all stories created by automated processes are untrue.
- Text bots can be designed to adapt true stories for different audiences, or convert statistics into true news articles.