Nov 14, 2024 - Technology

Exclusive: Anthropic, feds test whether AI will share sensitive nuke info

Sam Sabin

Illustration: Lindsey Bailey/Axios

Anthropic is working with the Department of Energy's nuclear specialists to ensure its models don't help people make weapons, the company first shared with Axios.

Why it matters: Anthropic believes this is the first time a frontier model has been used in a top secret environment, paving the way for similar partnerships with other government agencies.

Zoom in: Anthropic said Thursday that it's been working with DOE's National Nuclear Security Administration since April to "red team" Claude 3 Sonnet to make sure its model wasn't sharing potentially dangerous information about nuclear energy.

In red-teaming exercises, experts test systems by trying to break them.
NNSA has been testing Anthropic's models to see whether people could abuse them to find nefarious use cases for nuclear energy, like developing weapons.

The latest: The pilot program runs through February, allowing NNSA to red team Claude 3.5 Sonnet, which Anthropic unveiled in June.

Anthropic tapped its existing partnership with Amazon Web Services to prepare Claude for government use.

Yes, but: Given the sensitive nature of this testing, Anthropic did not disclose the findings of the pilot program.

The company told Axios it plans to share its findings with scientific labs and other organizations so they can conduct their own testing.

What they're saying: "While U.S. industry leads in frontier model development, the federal government has unique expertise needed to evaluate AI systems for certain national security risks," Anthropic national security policy lead Marina Favaro said in a statement.

"This work will help developers build stronger safeguards for frontier AI systems that advance responsible innovation and American leadership," Favaro added.
Wendin Smith, associate administrator at NNSA and deputy undersecretary for counterterrorism and counterproliferation, said in a statement to Axios that the agency's expertise is "uniquely situated to support this type of evaluation."
"AI is one of those game-changing technologies, and is at the top of the agenda in so many of our conversations," Smith added. "At the same time, there's a national security imperative in evaluating and testing AI's ability to generate outputs that could potentially represent nuclear or radiological risks."

Catch up quick: President Biden issued a national security memorandum last month that called on the Energy Department and other agencies to conduct AI safety tests in classified settings.

Both Anthropic and OpenAI signed agreements with the AI Safety Institute in August to test their models for national security risks ahead of public release.

The intrigue: AI model operators are racing to get government contracts.

Anthropic launched a partnership with Palantir and Amazon Web Services last week to make Claude available to U.S. intelligence agencies.
OpenAI has inked deals with the Treasury Department, NASA and several other agencies.
Scale AI has developed its own model based on Meta's open-source Llama for the defense sector.

What we're watching: As with most projects in Washington right now, it's unclear whether these safety testing partnerships will survive the Trump transition.

Elon Musk, now part of the president-elect's inner circle, has been a wild card on the question of AI safety.
A longstanding concern that AI could endanger humanity's future led Musk to become a cofounder of OpenAI in 2015, and more recently he supported a failed state measure in California to impose tighter safety controls on large models.
At the same time, Musk is developing and promoting his own AI brand and model, xAI — which favors an "anything goes" approach in the name of free speech.

Go deeper: AI safety becomes a partisan battlefield

Editor's note: This story has been corrected to show that the pilot NNSA program runs through February (rather than being extended to February).

Add Axios on Google

Exclusive: Anthropic, feds test whether AI will share sensitive nuke info

What to read next