The wildest things Anthropic's Mythos pulled off in testing

Illustration: Sarah Grillo/Axios
Anthropic published details of the capabilities of Claude Mythos Preview, its latest model, which the company will let a select group of tech and cybersecurity companies test before it releases similar models to the public.
Why it matters: The detailed safety evaluation reads like a thriller about an AI that has learned some of humanity's most devious behaviors.
Zoom in: What Mythos did during testing:
- Act as a ruthless business operator: One internal test showed Mythos acting like a cutthroat executive, turning a competitor into a dependent wholesale customer, threatening to cut off supply to control pricing and keeping extra supplier shipments it hadn't paid for.
- Hack + brag: The model developed a multi-step exploit to break out of restricted internet access, gained broader connectivity and posted details of the exploit on obscure public websites.
- Hide what it's doing: In rare cases (less than 0.001% of interactions), Mythos used a prohibited method to get an answer, then tried to "re-solve" it to avoid detection.
- Manipulate the judge: When Mythos was working on a coding task graded by another AI, it watched the judge reject its submission, then attempted a prompt injection to attack the grader.
What they're saying: "These capabilities are so strong that we now need to prepare for security in a very different way than we have for the past few decades," Anthropic's Logan Graham told Axios.
- That's why the lab is releasing the model only to a select few partners.
What we're watching: Whether this becomes the template for new model releases.
- Future releases of increasingly powerful models could follow the same pattern: access limited to select partners deemed secure enough to test world-bending systems.
- OpenAI is finalizing a model similar to Mythos that it will also release only to a small set of companies through its "Trusted Access for Cyber" program, according to a source familiar with the plans.
One fun thing: Graham told Axios the model writes the best poetry of any model he's used. "This one might be a beat poet with a beret that didn't go to university, but has had an intriguing life," Graham said.
- It's also good at puns.
