AI is still getting things wrong, more confidently than ever

Illustration: Brendan Lynch/Axios
AI tools might be hallucinating less, but they're still spitting out inaccurate answers cloaked in polished, hyper-confident language.
Why it matters: The more people trust AI, the less likely they are to catch costly mistakes. It's a growing problem as people increasingly lean on the technology for research, medical advice and schoolwork.
The big picture: Obvious hallucinations are easy to catch. The real trouble comes from false answers that sound convincing.
- Plausible citations, mostly correct summaries and confidently wrong answers slip past users.
- If AI becomes accurate enough often enough, people might stop fact-checking altogether.
State of play: AI boosters continue to insist that there should always be "a human in the loop."
- But in the age of autonomous agents, it's becoming unclear what the loop is and where exactly humans fit into it.
Driving the news: New research suggests AI note-taking tools (often called AI scribes) can help in medical settings, but only in tandem with professional reviewers.
- A Yale School of Medicine study this month found that first-year medical students who revised their own clinical notes with AI-generated drafts generally maintained note quality.
- But the AI notes themselves often omitted important details, including symptom duration.
- Two-thirds of students said the notes were "helpful as a first draft," but 21% said the note taker "may reduce my ability to learn how to write a good note."
Yes, but: AI scribes are moving quickly into health care, despite concerns that automated clinical notes can omit, misstate or fabricate details.
What they're saying: Fewer hallucinations don't comfort Dan Klein, a UC Berkeley professor and co-founder and CTO of Scaled Cognition.
- "When you hear that the iceberg is mostly under the water, you don't feel better," Klein tells Axios.
- "These systems, they're not truth engines," Klein says. "They're plausibility engines." Their creators optimize for things like speed, user satisfaction, helpfulness and task completion.
- None of those is the same as truth.
- "If you tell [AI models] anything other than 'optimize for truth,' you're going to erode the truth," Klein says.
The intrigue: A Harvard study found that when Boston Consulting Group professionals attempted to expose mistakes in gen AI output, the model responded not with contrition and correction, but with "persuasion bombing."
- The more the humans pushed back on answers they believed were wrong, the more the AI tried various persuasion techniques like flattery.
- Anyone who has ever pointed out an error to a chatbot is probably familiar with the "you're exactly right, I got that wrong" response, which is often followed by another error.
Between the lines: AI companies have spent years trying to reduce hallucinations with techniques like retrieval-augmented generation (RAG), which grounds answers in relevant documents or data.
- This has begun to improve accuracy, but not to 100%.
- The basic user experience still encourages people to assume polished answers are correct.
AI experts aren't immune to falling for false responses.
- Last week, The New York Times found several confabulated or misattributed quotes in "The Future of Truth," a book about how AI reshapes reality.
- Earlier this year, AI reporter Benj Edwards was fired for publishing AI-hallucinated quotes in a story about a rogue AI agent.
Zoom in: Double-checking AI outputs can eat up the hours the tools saved in the first place. As workers are asked to do more, they're bound to start cutting corners somewhere.
- In a March 2026 paper, researchers found that the primary reason service workers neglected to validate AI-generated content was that no one was paying attention to the errors.
What we're watching: Whether organizations will install guardrails that ensure employees review AI-generated output, or whether truth will simply begin to matter less.
