Mar 14, 2024 - Technology

Generative AI's privacy problem


Illustration: Aïda Amer/Axios

Privacy is the next battleground for the AI debate, even as conflicts over copyright, accuracy and bias continue.

Why it matters: Critics say large language models are collecting, and sometimes disclosing, personal information gathered from around the web, often without the permission of those involved.

The big picture: Many businesses have grown wary of execs and employees using proprietary information to query ChatGPT and other AI bots — either banning such apps or opting for paid versions that keep business information private.

  • As more individuals use AI to seek relationship advice, medical information or psychological counseling, experts say the risks to individuals are growing.
  • Personal data leaks from AI can take a variety of forms, from accidental information disclosure to data gained via deliberate efforts to break through guardrails.

Driving the news: Several lawsuits seeking class action status have been filed in recent months alleging Google, OpenAI and others have violated federal and state privacy laws in training and operating their AI services.

  • The FTC issued a warning in January that tech companies have an obligation to uphold their privacy commitments as they develop generative AI models.

"With AI, it's this big feeding frenzy for data, and these companies are just gathering up any personal data they can find on the internet," George Washington University law professor Daniel J. Solove told Axios.

  • The risks go far beyond just the disclosure of discrete pieces of private information, argues Timothy K. Giordano, partner at Clarkson Law Firm, which has brought a number of privacy and copyright suits against generative AI companies.

Between the lines: While AI is creating new scenarios, Solove points out that many of these privacy issues aren't new.

  • "A lot of the AI problems are exacerbations of existing problems that law has not dealt with well," Solove told Axios, pointing to the lack of federal online privacy protections and the flaws in the state laws that do exist.
  • "If I had to grade them, they would be like D's and F's," Solove said. "They are very weak."

Zoom in: Generative AI's unique capabilities raise bigger concerns than the common aggregation of personal information sold and distributed by data brokers.

  • In addition to potentially sharing specific pieces of data, generative AI tools can draw connections, or inferences (accurate or not), Giordano told Axios.
  • This means tech companies now have, in Giordano's words, "a chillingly detailed understanding of our personhood — enough ultimately to create digital clones and deepfakes that would not only look like us, but that could also act and communicate like us."

Building AI so that it respects data privacy is complicated by how generative AI systems work.

  • Typically, models are trained on huge datasets that leave a kind of probabilistic imprint in their parameters, but they don't save or store the underlying data afterward. That means you can't simply erase information that's been woven in.
  • "You cannot untrain generative AI," said Grant Fergusson, a fellow at the Electronic Privacy Information Center. "Once the system has been trained on something, there's no way to take that back."

Reality check: Many online publishers and AI companies have added language noting that customer data may be used to train future models.

  • In some cases, people can opt out of having their data used for AI training, though such policies vary and the data-sharing settings can be confusing and hard to find.
  • Plus, even where users do consent, they may be sharing data that affects the online privacy of others.

The other side: An OpenAI representative told Axios the company doesn't seek out personal data to train its models and takes steps to prevent its models from disclosing private or sensitive information.

  • "We want our models to learn about the world, not private individuals," an OpenAI spokesperson told Axios. "We also train our models to refuse to provide private or sensitive info about people."
  • The company said its privacy policy outlines options for people to delete certain information as well as to opt out of model training.

What's next: Regulators will try to enforce existing privacy laws in the new AI realm, lawmakers will propose new bills, and courts will grapple with novel dilemmas.

  • AI companies could do more on their own, but Solove said that expecting companies to protect privacy without mandating it through law is probably unrealistic.
  • "You're kind of telling sharks, 'Please sit down and use utensils,'" Solove said.