How Anthropic teaches Claude to be "good"
Anthropic on Wednesday released an updated "constitution" for Claude, formalizing how the company trains its chatbot to reason about values and behavior as it encounters new, unanticipated situations.
Why it matters: As AI models grow more capable, Anthropic is betting that training systems to reason about values and judgment — not just follow guardrails — will prove safer and more durable than racing to ship faster.
Anthropic's "constitution" — previously referred to internally as the "soul" doc — was written specifically for Claude to define its ethos, Anthropic's Amanda Askell tells Axios. Askell is a member of the company's technical staff, in charge of shaping Claude's character.
- The team developed the document with outside experts in areas where AI models pose higher risks.
- It reflects the company's current thinking, but is written to evolve over time.
The new constitution is more flexible than its predecessor. It's designed to help Claude "behave in a way that's responsible and appropriate given the situation it's in," Askell says.
- Anthropic trains Claude to be "broadly good," but also to reason about what that means across different circumstances.
The big picture: The recent popularity of Anthropic's newest model, which powers Claude for both work and play, may signal growing demand for chatbots with stronger guardrails.
- The company's emphasis on safety may give Claude a competitive edge over OpenAI's ChatGPT and Google's Gemini.
Between the lines: Anthropic says it wants Claude models to be safe, ethical, compliant with company guidelines, and genuinely helpful.
- Askell says that while ethical disagreement is real, "there is a kind of shared ethics across people and cultures. ... We don't like being manipulated. ... We like being respected. ... We like people looking out for us without being paternalistic."
- The constitution also addresses sycophancy, a chatbot's tendency toward flattery. "It's actually OK ... to say something that's hard to hear. ... It shouldn't be harsh and mean. ... It's a way of exemplifying care for the person," Askell adds.
- Anthropic says those values are embedded directly into Claude's training.
Flashback: Claude's earlier constitution focused on explicit principles and guardrails rather than situational judgment.
- Last year, Anthropic said Claude had begun developing limited introspective abilities, including the capacity to answer questions about its internal state — a shift that raised new questions about how models understand themselves.
The intrigue: A model that decides what's "good" depending on the situation may strike some as a bridge too far into robot sentience.
- That's part of why Anthropic stopped referring to the document as a "soul" and started calling it a constitution, Askell says.
Yes, but: Teaching AI systems to reason about "goodness," well-being and judgment inevitably raises concerns about anthropomorphism — and about how much moral agency developers are comfortable assigning to their models.
- The document says Anthropic wants Claude to have "equanimity," to "feel free" and "to interpret itself in ways that help it to be stable and existentially secure."
- Axios asked Askell how Anthropic intends to avoid the dystopian arc of the 2013 Spike Jonze film "Her," but she said she hasn't seen it.
The bottom line: The document reflects Anthropic's broader bet: that if powerful AI is inevitable, it's better for safety-focused companies to lead than to step aside.
Editor's note: This article was corrected to note that the 2013 film "Her" was written and directed by Spike Jonze (not Spike Lee).
