Making things up is AI's Achilles heel

Illustration: Annelise Capossela/Axios
Generative AI makes things up. It can't distinguish between fact and fiction. It asserts its fabrications with confident authority.
State of play: All that was true in 2022 when ChatGPT debuted. It's still true today. But the tech industry keeps remodeling the entire digital universe around AI as if none of it were happening.
Why it matters: GenAI's unreliability may just be a nuisance when you're asking it for a recipe or a video recommendation.
- It's far more troubling when the technology moves into medicine, finance, law and other realms where "oops, sorry" doesn't cut it.
Driving the news: Perplexity, the popular AI "answer engine," readily spouts inaccuracies and garbled or uncredited rewrites of published material, as both Forbes and Wired recently reported.
- Wired called Perplexity a "bulls--t machine."
- Perplexity's rough patch follows broader disappointment among some users with Google's AI-generated search-result summaries.
Catch up quick: GenAI tools like ChatGPT and Dall-E began by wowing us with their ability to imitate the styles of famous authors and artists. Now they're being sold as a friendly and efficient interface to all of our work, knowledge and interactions.
- But they're still fundamentally better at whiteboarding rough ideas and mashing bits of information together haphazardly than at nailing down trustworthy answers.
- The more we ask them to do everything for us, the less reliable we're likely to find them.
Fun fact: Just days after ChatGPT's release, computer scientists Arvind Narayanan and Sayash Kapoor declared, "ChatGPT is a bulls--t generator." The same concept has now inspired a research paper titled "ChatGPT is bulls--t."
- "Because these programs cannot themselves be concerned with truth, and because they are designed to produce text that looks truth-apt without any actual concern for truth, it seems appropriate to call their outputs bulls--t," the paper's University of Glasgow authors write.
Between the lines: GenAI's "hallucinations," "confabulations" and errors aren't random bugs — they're a fundamental part of how this technology works.
- Every time they open their hypothetical mouths, large language models like Google's Gemini and OpenAI's GPT are literally guessing the next word.
- They don't "know" anything — they're just calculating what the best way to continue a sentence might be.
Model builders can adjust the level of randomness ("temperature") to make a chatbot wilder and more inventive or dull but more reliable.
- But they can never guarantee that the model will always address the same question with the same answer.
- Users expect AI to behave like any traditional computing tool — with consistency and logic — whereas GenAI always has an element of unpredictability and randomness.
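The next-word guessing and temperature knob described above can be sketched in a few lines of Python. This is a toy illustration, not real model code: the candidate words and their scores ("logits") are made up, and real models run this calculation over vocabularies of tens of thousands of tokens.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Pick the next word by temperature-scaled softmax sampling.

    logits: dict mapping candidate words to raw scores (hypothetical here).
    Lower temperature sharpens the distribution toward the top word;
    higher temperature flattens it, making unlikely words more common.
    """
    rng = random.Random(seed)
    scaled = [score / temperature for score in logits.values()]
    max_s = max(scaled)
    exps = [math.exp(s - max_s) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(list(logits.keys()), weights=probs, k=1)[0]

# Hypothetical scores for the word after "The capital of France is ..."
logits = {"Paris": 9.0, "Lyon": 5.0, "pizza": 2.0}

# At low temperature the top word wins almost every time; at high
# temperature the model regularly picks lower-scoring words instead.
cautious = [sample_next_token(logits, temperature=0.1, seed=i) for i in range(20)]
inventive = [sample_next_token(logits, temperature=5.0, seed=i) for i in range(200)]
```

Even at low temperature the process is a weighted coin flip, not a lookup — which is why, as noted above, builders can tune the wildness but can never guarantee the same question always gets the same answer.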
Zoom out: The wider discussion about AI's propensity for bulls--t keeps reigniting because the whole concept is still news to the general public.
- People outside the AI bubble keep hearing that AI is a superpowered force that will answer their questions, diagnose their illnesses and/or steal their jobs — so the technology's fallibility, and the inevitability of that fallibility, comes as a shock.
There are three ways the tech industry could tackle this problem.
1. More accuracy. AI makers could try to make their tools more trustworthy with better data and training. Models could be programmed to stop trying so hard to "fill in the blanks" with made-up answers.
- Yes, but: This is harder than it sounds, and there's no guarantee of success. Also, AI that's tuned to be less eager to help is going to feel less useful. Nonetheless, it's the strategy most companies are pursuing today.
2. Less confidence. AI makers could modify their chatbots to behave with less assurance and admit when they're just not sure of an answer.
- Yes, but: The AI would have to be able to tell when it didn't know an answer, and even that is proving an elusive goal.
3. Embrace BS. AI makers could declare that making things up is a feature rather than a bug — and reposition their tools as goads to creativity rather than oracles of universal knowledge.
- Yes, but: This would require accepting a much smaller AI market than widely expected now. That's unpalatable for investors who've already plowed in billions and expect monster returns.
The bottom line: It's hard to sell a new technology as world-changing and civilization-saving (or species-threatening) when you can't explain how it arrives at any particular output and can't promise that it's not going to keep going off the rails.
- Silicon Valley has made this uncomfortable bed for itself. AI makers will keep trying to lie in it.
