Oct 2, 2023 - Technology

Meta says its AI trains on your Instagram posts

Illustration: Annelise Capossela/Axios

Meta admitted late last week that it has used mountains of public Facebook posts to train its AI models, per Reuters.

Why it matters: As the AI boom continues, content creators are challenging tech companies' use of their material in the development of advanced AI tools — and in Facebook's case, "content creators" means a few billion people.

Details: After Meta unveiled its new AI assistants last week, its president of global affairs, Nick Clegg, told Reuters that the "vast majority" of the training data used to develop them came from publicly available posts, including on Facebook and Instagram.

  • "We've tried to exclude datasets that have a heavy preponderance of personal information," Clegg told Reuters — such as data from LinkedIn.

The big picture: A massive legal battle is brewing between owners of copyrighted content, like books and professional media products, and AI companies that may have intentionally or inadvertently used their works to train their programs.

  • Meta has always claimed a variety of rights in the content its users post, so legally it's in a different situation than companies that are using copyrighted texts.
  • The company tells users "you own all of the content and information" you post. But if you make a post public, as many do by default, it becomes available for all sorts of purposes that you can't control.
  • Clegg told Reuters that Meta, like many other tech firms, believes its use of posts to train AI is covered by the legal doctrine of fair use — but added, "I strongly suspect that's going to play out in litigation."

Of note: Medium, the decade-old platform for long-form articles, recently told its users that it would block OpenAI's web crawler and resist other efforts by AI companies to harvest its content to use for training.

Go deeper: How we all became AI's brain donors
