May 13, 2024 - Technology

OpenAI debuts new model with enhanced real-time voice abilities

headshot
Mira Murati of OpenAI makes announcement

Screenshot: OpenAI

OpenAI Monday announced a new flagship model, dubbed GPT-4o, that brings more powerful capabilities to all its customers, including smarter, faster real-time voice interactions.

Why it matters: Google, Microsoft and Apple are all reorganizing their offerings around a generative-AI based future, and OpenAI, whose ChatGPT kicked off the race, is trying to hold its lead.

What they're saying: "The special thing about GPT-4o is it brings GPT-4 level intelligence to everyone, including our free users," CTO Mira Murati said during a livestream presentation.

  • She said the new GPT-4o — that's a letter "o", for "Omni" — is a "huge step forward when it comes to the ease of use" as well as speed.
  • A more major update to the underlying model — i.e. the successor to GPT-4 — is due to be unveiled later this year, Murati told Axios in an interview after the company's livestream announcement.

Zoom in: OpenAI showed off real-time interactions with the voice assistant in ChatGPT, including faster responses and the ability to interrupt the AI assistant.

  • In one demo, OpenAI showed one of its workers getting a real-time tutorial on taking deep breaths.
  • Another showed ChatGPT reading an AI-generated story in different voices, including super-dramatic recital, robotic tones and even singing.
  • In a third demo, a user asked ChatGPT to look at an algebra equation and help the person solve it rather than simply providing an answer.

In all the demos, GPT-4o showed considerably greater personality and conversational skills than it has previously had.

  • OpenAI showed the new chatbot working simultaneously across languages, in this case helping translate between English and Italian.
  • The demos highlighted ChatGPT's multimodal capabilities across visual, audio and text interactions, with the AI assistant able to use a phone's camera to read written notes and to attempt to detect the emotion of a person.

The big picture: The online event comes a day before Google holds its I/O developer conference, where it is expected to make its progress on generative AI a key focus.

  • OpenAI also said it was releasing a desktop version of ChatGPT, initially for Mac users, with paid users getting access starting today. OpenAI told Axios that a Windows version is also in the works, but it started with the Mac because that's where more of its users are.

What's next: OpenAI said the capabilities will become available over the coming weeks.

  • GPT-4o's text and image capabilities are starting to roll out now to paid ChatGPT Plus and Team users, with availability for Enterprise users coming soon. Free users will also start gaining access, with rate limits.
  • The voice version of GPT-4o will start becoming available "in the coming weeks."
  • Developers will be able to make use of GPT-4o's text and vision modes, with audio and video capabilities being offered to "a small group of trusted partners" in the coming weeks.

OpenAI also confirmed in a tweet that that the mysterious GPT2-chatbot that showed up on a benchmarking site was indeed GPT-4o.

Go deeper