Apr 30, 2024 - Technology

Researchers uncover servers filled with government secrets

Illustration of a US flag, but the stars are replaced with binary numbers.

Illustration: Maura Losch/Axios

Databases storing approximately 550 gigabytes of secret data from a government artificial intelligence contractor were exposed on the internet until the end of last month, according to a report released Tuesday.

Why it matters: Plenty of attention has been given to protecting confidential information from entering AI models, but the new research suggests more focus needs to be given to how AI models' training data itself is stored.

Zoom in: Researchers at cybersecurity company UpGuard discovered that a key government contractor, Veritone, had left two databases exposed on servers hosted on Microsoft's government cloud.

  • Between the two servers, roughly 1.6 billion documents were accessible.
  • The documents included Veritone employee data and credentials, internal system logs, AI training data, and client data from U.S. government customers, including the departments of Homeland Security and Veterans Affairs.
  • Researchers also noticed documents tied to public records requests and police body camera videos.
  • UpGuard found the exposed databases the week of March 23, and Veritone fixed the issue by March 30. It's unclear how long the documents were publicly accessible.

Between the lines: A cloud misconfiguration likely led to the exposures, Greg Pollock, UpGuard's vice president of cyber research, told Axios.

  • Elasticsearch, the database software that Veritone used to store the data, does not require authentication by default to access data. Not all users know how to set this feature up properly, Pollock said.
  • Microsoft's government cloud services also require a set of authentication configurations, but Veritone's data remained accessible despite those, UpGuard noted.
  • Veritone did not respond to a request for comment.
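
To illustrate the kind of check an administrator might run on their own server, here is a minimal Python sketch that sends one unauthenticated HTTP request and reports whether the endpoint answered without asking for credentials. It is not UpGuard's method or anything described in the report. The function names `classify_status` and `probe` are hypothetical, and the sketch assumes the database is reachable over plain HTTP and signals missing credentials with a 401 or 403 status.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status from an unauthenticated request to a rough verdict."""
    if status in (401, 403):
        # The server refused the request without credentials.
        return "auth-required"
    if 200 <= status < 300:
        # The server answered a request that carried no credentials.
        return "open"
    return "unknown"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to `url` and classify the response.

    Assumption: `url` points at a server you administer.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return "unreachable"
```

A result of "open" from a machine outside the network would suggest the data can be read without logging in. A result of "auth-required" only shows that this single request was refused, not that the server is configured securely.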

What they're saying: "Microsoft is providing the government cloud as a service; they're probably not involved in the administration of this database," Pollock said.

  • "It raises the question of, 'Well, should the government cloud do something more?'"

Flashback: In 2021, UpGuard also uncovered a set of misconfigured sites running on Microsoft Power Apps that leaked 38 million records, including COVID-19 tracing data and vaccination appointment details.

The big picture: The acceleration of AI adoption raises the stakes for misconfigured databases beyond just exposed login credentials, Pollock said.

  • Malicious actors can now go a step further and change the training data found in exposed databases like Veritone's, he added.