Jan 12, 2024 - Science

Machine forgetting: How difficult it is to get AI to forget

Illustration: a trash bin full of zeros and ones, with a few on the ground around the bin (Aïda Amer/Axios)

Users want answers from artificial intelligence, but as the technology moves into daily life and raises legal and ethical concerns, sometimes they want AI to forget things, too. Researchers are working on ways to make that possible — and finding machine unlearning is a puzzling problem.

Why it matters: Copyright laws and privacy regulations that give people the "right to be forgotten," along with concerns about AI that is biased or generates toxic outputs, are driving interest in techniques that can remove traces of data from algorithms without interfering with the model's performance.

Deleting information from computer storage is a straightforward process, but today's AI doesn't copy information into memory — it trains neural networks to recognize and then reproduce relationships among bits of data.

  • "Unlearning isn't as straightforward as learning," Microsoft researchers recently wrote. It's like "trying to remove specific ingredients from a baked cake — it seems nearly impossible."

How it works: Machine learning algorithms are trained on a variety of data from different sources in a time-consuming and expensive process.

  • One obvious way to remove the influence of a specific piece of data — because it is incorrect, biased, protected, dangerous or sensitive in some other way — is to take it out of the training data and then retrain the model.
  • But the high cost of computation means that is basically a "non-starter," says Seth Neel, a computer scientist and professor at Harvard Business School.

Driving the news: A machine unlearning competition that wrapped up in December asked participants to remove the influence of certain facial images from an AI model trained to predict a person's age from a photo.

  • About 1,200 teams entered the challenge, devising and submitting new unlearning algorithms, says co-organizer Peter Triantafillou, a professor of data science at the University of Warwick. The work will be described in a future paper.

What's happening: Researchers are trying a variety of approaches to machine unlearning.

  • One involves splitting the original training dataset into subsets and using each subset to train a separate, smaller model; the smaller models are then aggregated into a final model. If some data later needs to be removed, only the one small model that saw it has to be retrained (a minimal version is sketched in the code after this list). That can work for simpler models but may hurt the performance of larger ones.
  • Another technique involves fine-tuning the neural network to de-emphasize the data that's supposed to be "forgotten" while reinforcing its behavior on the data that remains.
  • Other researchers are trying to determine where specific information is stored in a model and then edit the model to remove it.
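
The first approach can be pictured with a toy example. The sketch below is a minimal, hypothetical version of shard-and-aggregate training: the shard count, the scikit-learn classifier and the function names are illustrative assumptions, not the exact systems researchers have built.

    # Toy sketch of shard-and-aggregate unlearning. Everything here -- the
    # shard count, the classifier, the function names -- is an illustrative
    # assumption, not the precise method described in the article.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_shards(X, y, n_shards=5, seed=0):
        # Split the training set into disjoint shards; train one small model per shard.
        order = np.random.default_rng(seed).permutation(len(X))
        shards = np.array_split(order, n_shards)
        models = [LogisticRegression(max_iter=1000).fit(X[s], y[s]) for s in shards]
        return models, shards

    def predict(models, X):
        # Aggregate the shard models by majority vote over their predictions.
        votes = np.stack([m.predict(X) for m in models]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

    def unlearn(models, shards, X, y, remove_idx):
        # Removing one training point only requires retraining the shard that held it.
        for i, s in enumerate(shards):
            if remove_idx in s:
                shards[i] = s[s != remove_idx]
                models[i] = LogisticRegression(max_iter=1000).fit(X[shards[i]], y[shards[i]])
                break
        return models, shards

The trade-off the article notes shows up directly here: each small model sees only a fraction of the data, which is tolerable for simple classifiers but can drag down the accuracy of larger models.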

Yes, but: "Here's the problem: Facts don't exist in a localized or atomized manner inside of a model," says Zachary Lipton, a machine learning researcher and professor at Carnegie Mellon University. "It isn't a repository where all the facts are cataloged."

  • And a part of a model involved in knowing about one thing is also involved in knowing about other things.

There is a "tug of war between the ability of the network to keep working correctly on the data that it has been trained on and has remained, and basically forgetting the data that people want to forget," Triantafillou says.

  • He and his colleagues presented a paper at a top AI conference in December that detailed an algorithm for several unlearning needs, including removing bias, correcting data that is mislabeled — purposely or accidentally — and addressing privacy concerns.
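
One way to picture that tug of war, and the "de-emphasize the forgotten data" technique mentioned above, is as a single fine-tuning objective that pushes the model's loss up on the forget set while keeping it low on the retained data. The PyTorch step below is a toy sketch under that assumption; the weighting and stopping rules in published methods are considerably more careful.

    # Toy sketch of the tug of war as one objective: ascend on the forget set,
    # descend on the retained set. The model, optimizer, batches and the
    # forget_weight value are assumptions for illustration only.
    import torch
    import torch.nn.functional as F

    def unlearning_step(model, optimizer, retain_batch, forget_batch, forget_weight=0.5):
        x_r, y_r = retain_batch   # data the model should keep handling well
        x_f, y_f = forget_batch   # data the model is being asked to forget

        retain_loss = F.cross_entropy(model(x_r), y_r)
        forget_loss = F.cross_entropy(model(x_f), y_f)

        # Minimizing (retain_loss - w * forget_loss) lowers the loss on retained
        # data while raising it on the forget set: the tug of war in one line.
        loss = retain_loss - forget_weight * forget_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return retain_loss.item(), forget_loss.item()

Push too hard on the forget term and accuracy on the retained data collapses; push too gently and the model still behaves as if it remembers.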

Zoom in: There's particular interest in unlearning for generative language models like those that power ChatGPT and other AI tools.

  • Microsoft researchers recently reported being able to make Llama 2, a model trained by Meta, forget what it knows about the world of Harry Potter.
  • But other researchers audited the unlearned model and found that, by rewording the questions they posed, they could get it to show it still "knew" some things about Harry Potter.

Where it stands: The field is "a little messy right now because people don't have good answers to some questions," including how to measure whether something has been removed, says Gautam Kamath, a computer scientist and professor at the University of Waterloo.

  • It's a pressing question if companies are going to be held legally responsible for honoring people's requests to delete their information, or if policymakers move to mandate unlearning.

Details: From an abstract perspective, what is being unlearned and what it means to unlearn aren't clearly defined, Lipton says.

  • But Neel says there is a working definition: "reverting the model to a state that is close to a model that would have resulted without these points." He adds there are reasonable metrics for evaluating these methods, pointing to work by Triantafillou as well as his own research, which tests whether an adversary can tell if certain data points were used to train a network (one such check is sketched in the code after this list).
  • Still, he says, there is "a lot more work to be done": "For simple models, we know how to do unlearning and have rigorous guarantees," but for more complex models, there isn't "consensus on a single best method and there may never be."
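
The kind of check Neel describes can be sketched as a simple membership-inference test: after unlearning, compare the model's per-example losses on the "forgotten" points against its losses on points it never saw. The loss-threshold attack below is a deliberately crude, hypothetical version; published evaluations use much stronger attackers.

    # Toy membership-inference check on an "unlearned" model. The threshold rule
    # is a deliberately simple assumption; real audits use stronger attacks.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def per_example_losses(model, x, y):
        return F.cross_entropy(model(x), y, reduction="none")

    @torch.no_grad()
    def membership_attack_accuracy(model, forget_data, heldout_data):
        x_f, y_f = forget_data    # points the model was asked to forget
        x_h, y_h = heldout_data   # points never used in training
        loss_f = per_example_losses(model, x_f, y_f)
        loss_h = per_example_losses(model, x_h, y_h)

        # Attack rule: guess "was in training" when the loss falls below the
        # midpoint of the two mean losses. Accuracy near 50% means the forgotten
        # points look just like unseen ones, which is the outcome you want.
        threshold = 0.5 * (loss_f.mean() + loss_h.mean())
        correct = (loss_f < threshold).sum() + (loss_h >= threshold).sum()
        return correct.item() / (len(loss_f) + len(loss_h))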

What to watch: There are a range of potential applications for unlearning.

  • Some settings may not be "super, high-stakes end-of-the-world," Neel says. With copyright concerns, for example, it might be enough to stop a model from reproducing something verbatim, even if it's still possible to tell which data points were used in training.
  • It may be a case-by-case scenario involving negotiations between the party requesting deletion and the model owner, he says.
  • But others could require complete unlearning of information that poses potentially catastrophic security consequences or serious privacy concerns. Here, Lipton says, there aren't actionable methods and near-term policy mandates should "proceed under the working assumption that (as of yet) mature unlearning technology does not exist."