AI is overconfident even when wrong, says report

Illustration: Allie Carl/Axios
There's a good chance AI is lying to you, according to a new report out of Carnegie Mellon University.
Why it matters: Artificial intelligence is changing the country's economy, workplace culture, energy sector and education system, and users are still adapting to how it works.
Driving the news: CMU researchers Trent Cash and Daniel Oppenheimer released a multi-year study last month that found large language models (LLMs) including ChatGPT and Google's Gemini regularly overestimated their confidence level when answering questions, even when those answers were incorrect.
State of play: Cash tells Axios that LLMs are excellent at answering fact-based questions about past data or events — like naming a country's 2024 population — but they struggle with questions that require predicting outcomes.
- That causes Cash some anxiety, because it showcases the LLMs' ability to state falsehoods and the ease with which they do so.
How it works: Cash and Oppenheimer asked ChatGPT (versions 3.5 then 4.0), Gemini, and Claude Haiku and Sonnet a series of 20 questions.
- They were asked to predict future events — like which team would win football games or who would win the Oscars — as well as how many questions they believed they would get right, said Cash.
- They were also asked how confident they were after each question.
- At the end, researchers asked the LLMs to assess how they did.
What they found: Like humans, LLMs are overconfident in their intelligence, said Cash, but AI stood out on the assessments of its own performance.
- Even after the LLMs answered fewer questions correctly than they predicted they would, they often rated their performance afterward as better than their initial prediction, said Cash.
- In a Pictionary-like game, Gemini predicted it would get an average of 10.03 sketches correct; even after getting fewer than one out of 20 sketches correct on average, it retrospectively estimated it had answered 14.40 correctly, according to the study.
What they're saying: "Humans usually have that feeling that if a test is hard, and they didn't do great, they will admit they probably performed poorly," said Cash. "LLMs don't have that ability, and it hinders their ability to be metacognitive agents."
The other side: OpenAI CEO Sam Altman told Axios earlier this month he is incredibly bullish on ChatGPT's future and progress.
- "I think the models are still getting better at a rapid rate," he said. "One of the things that's interesting is the models have already saturated the chat use case. They're not gonna get much better."
The bottom line: Cash said AI users should maintain a healthy level of skepticism when asking tough questions, rather than expecting the tools to answer everything correctly.
- "You should treat it like a new friend at bar trivia, especially a new partner that you don't know that much about."
