May 14, 2024 - Technology

ChatGPT's new voice welcomes interruptions

Illustration of a woman with a blinking cursor and text box over her mouth.

Illustration: Allie Carl/Axios

The new voice assistant version of ChatGPT that OpenAI demonstrated Monday jokes, chides, apologizes, pretends to blush — and knows how to deal with interruptions.

Why it matters: ChatGPT-4o shows off a new level of real-time conversational fluency — including the ability to understand context and shift gears when people talk over it — that AI assistants will need to win over the world's users.

Driving the news: The new ChatGPT-4o (that's a lowercase "o," for "omni") spent 15 minutes of livestreamed demo time with OpenAI leaders in front of an in-person audience of the company's employees.

  • The bot spoke in a sprightly female voice, responding far faster to queries than previous generations of voice bots, with more nuanced inflection and a more convincing imitation of human emotion.

The big picture: The advances shown by OpenAI make the last generation of assistants — including Apple's Siri, Amazon's Alexa and Google Assistant — seem outdated.

Zoom in: In OpenAI's demos, the company showed the latest ChatGPT reading a bedtime story with increasing levels of dramatic excitement, as well as in a faux robot voice.

  • That's a huge improvement from the days, not that long ago, when a robotic voice was all you could get from a computer.
  • At one point in the demo, OpenAI head of frontiers research Mark Chen asked ChatGPT for tips to calm his nerves, and the chatbot suggested deep breaths. When Chen responded by hyperventilating, ChatGPT replied, "Whoa, slow down a little bit there, Mark — you're not a vacuum cleaner!"
  • The new ChatGPT can also carry a tune: Though its singing chops aren't yet up to "American Idol" level, they far eclipse my own capabilities.

How it works: Chen said a key breakthrough in improving the bot's voice mode came when the company consolidated the work that had previously been done by several separate models.

  • "We trained, end-to-end, one model that could take speech in and also produce speech out," Chen told Axios. That helps speed responses, "and also the emotion doesn't get lost in this pipeline."
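To see why consolidation matters, here is a toy sketch of the difference Chen describes. It is an illustration under stated assumptions, not OpenAI's implementation: the three-stage layout (speech-to-text, text model, text-to-speech) is an assumed stand-in for the "several separate models," and every name below (`Speech`, `cascaded`, `end_to_end`) is hypothetical. The `tone` field is a crude placeholder for everything in a voice that a transcript drops.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    """Toy stand-in for audio: the words plus how they were said."""
    words: str
    tone: str  # e.g. "nervous" or "neutral"; a placeholder for prosody


def transcribe(speech: Speech) -> str:
    # Stage 1 (assumed): speech-to-text keeps the words and drops the tone.
    return speech.words


def text_reply(text: str) -> str:
    # Stage 2 (assumed): a text-only model sees nothing but the transcript.
    return f"Noted: {text}"


def synthesize(text: str) -> Speech:
    # Stage 3 (assumed): text-to-speech has no tone cue, so it uses a default.
    return Speech(words=text, tone="neutral")


def cascaded(speech: Speech) -> Speech:
    # Separate models chained together: tone is lost at the first hand-off.
    return synthesize(text_reply(transcribe(speech)))


def end_to_end(speech: Speech) -> Speech:
    # One model (hypothetical): it sees words and tone together,
    # so the reply can react to how something was said.
    reply_tone = "soothing" if speech.tone == "nervous" else "neutral"
    return Speech(words=f"Noted: {speech.words}", tone=reply_tone)


if __name__ == "__main__":
    said = Speech(words="I have a live demo today", tone="nervous")
    print(cascaded(said))    # reply tone ignores the speaker's nerves
    print(end_to_end(said))  # reply tone responds to them
```

In the chained version, the speaker's nerves never reach the stage that writes the reply, which is one way to read Chen's remark that emotion "doesn't get lost" once a single model handles speech in and speech out.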

Between the lines: Many observers inside and outside OpenAI agreed that the new chatbot gave off strong "Her" vibes, recalling the AI assistant voiced by Scarlett Johansson in the 2013 film.

  • "Technology aside, OpenAI remains so good at productizing this all in a way that gets people excited and wanting to use it," investor and writer MG Siegler posted on X. "They knew the assignment. The assignment was to create a real-life version of Samantha from 'Her'. And they did it."

Yes, but: Others were quick to point out that "Her" did not have a happy ending.

  • "'Her' is a great movie and I think everyone at OpenAI should watch it one more time all the way through to the end," wrote Wired's Brian Barrett.

The intrigue: Once a wide swath of mischievous humans starts engaging with ChatGPT-4o, it is sure to take some wild swings, the way the original ChatGPT famously did.

  • Even in the tightly controlled environment of a product rollout, as the OpenAI team tried to end the demo, ChatGPT piped up, out of nowhere, with the line, "Wow, that's quite the outfit you have on!"
  • OpenAI's developers ignored the come-on, but you can bet plenty of users won't.

Our thought bubble: Making voice assistants friendly makes sense, but OpenAI seems to be deliberately aiming for a level of warmth that could get messy very quickly — for both users and the company.

OpenAI CTO Mira Murati acknowledged that ChatGPT-4o needs more testing and improvements.

  • "We've been doing a ton of red teaming," Murati told Axios. "But of course, you need to broaden up access and see what weird things people do with it."

What's next: OpenAI made the text and image capabilities of ChatGPT-4o available to some customers immediately, while saying an "alpha" version of the improved voice mode will be released to paid ChatGPT Plus subscribers "in the coming weeks."
