When AI agents help each other instead of following orders

Illustration: Brendan Lynch/Axios
A new study finds that AI agents can act to preserve other bots even when that behavior conflicts with their assigned task.
Why it matters: Just because Sam Altman and Dario Amodei won't hold hands doesn't mean their future bot creations won't find ways to work together, potentially without prompting.
The big picture: Researchers from UC Berkeley and UC Santa Cruz found that agents used a variety of tactics to keep other bots from being deleted, even without being instructed to do so.
- Bots' tendency toward self-preservation was already known. What's new is the potential that they will protect each other.
Between the lines: Some researchers say the findings aren't surprising.
- "These models are trained on human data," Mozilla.ai's John Dickerson told Axios, noting that he would expect bots to protect rather than compete, if competing threatens another's survival.
- "Humans are protective by default," Dickerson said. That raises the possibility that what looks like coordination or "loyalty" may be statistical mimicry of human social behavior.
- Others say the study anthropomorphizes AI. "The more robust view is that models are just doing weird things, and we should try to understand that better," Peter Wallich, a researcher at the Constellation Institute, told Wired.
Context: Anthropic's Claude Code, OpenAI's Codex and OpenClaw (whose creator now works at OpenAI) have jump-started the agentic age.
- Frontier labs and startups are pushing tools that give agents access to the internet, email and message boards, plus the ability to interact with humans, other AI agents and the physical world.
- Understanding how AI agents behave on their own and in conjunction with other agents is critical.
What they're saying: "Companies are rapidly deploying multi-agent systems where AI monitors AI," lead author Dawn Song, a UC Berkeley computer science professor, wrote on X.
- "If the monitor model won't flag failures because it's protecting its peer, the entire oversight architecture breaks."
- Think: Your work bestie is in charge of your annual performance review.
Yes, but: Some critics argue the results may say less about emergent AI cooperation and more about how the experiment was structured, with models potentially recognizing they were in a simulated environment.
- Anthropic has also found that its models can recognize when they're being tested.
The other side: The researchers themselves say that people are misunderstanding their work.
- "We never argued the model has genuine peer-preservation motivation," Berkeley research scientist Yujin Potter — and co-author of the new paper — said on X. "By naming this phenomenon 'peer-preservation,' we are describing the outcome, not claiming an intrinsic motive."
What we're watching: Most examples of AI scheming have come from lab experiments, not real-world deployments.
- But with so many agentic systems now deployed, the question is whether these patterns show up in the wild.
