Oct 10, 2023 - Technology

Startup's AI app dubs videos in different languages

Ina Fried

Screenshots of Lipdub, an AI tool for automatic language translation. Images courtesy of Captions

A new free app allows people to get their videos dubbed into any of 28 languages — in their own voice and with their lips synchronized to the translated audio.

The big picture: Lipdub, a free iOS app from New York-based AI startup Captions, is the latest demonstration of just how useful generative AI can be, but it also illustrates the technology's immense potential to create realistic-looking fakes.

How it works: Lipdub requires people to record a video on their phone with only their face in view. The app uploads the video and returns it to the user in a few minutes with the new language overdub applied.

For now, Lipdub can handle up to one minute of video from a single speaker.
In addition to foreign languages, Lipdub has added options like Texas slang, Gen Z, pirate and baby talk.

What they're saying: "Our hope is that the Lipdub technology will remove language barriers and ultimately allow more people to have their stories heard — stories that otherwise might be lost in translation," Captions CEO Gaurav Misra told Axios.

Misra said Captions is clearly marking videos made using Lipdub as having been generated with AI.
The 25-person startup first created a video studio app that offers automatic captioning using OpenAI's speech-to-text technology. The company says that product is now used by more than 100,000 people per day to produce more than 2 million videos each month.

Between the lines: Misra said the hardest part of creating Lipdub was training its algorithms to mirror natural lip movement while using what are known as "zero-shot" models, meaning they don't need to be trained on an individual speaker.

"As you'll notice, the facial expressions of a person remain unchanged pre- and post-translation — only their lip movements change — resulting in a more natural appearance," Misra added.

What's next: In the future, Misra says generative AI should allow for real-time translation for broadcasts or video conferencing. "Imagine having a Zoom call with someone who doesn't speak the same language, yet understands you perfectly," Misra said.

Go deeper: Watch the demo video. (Really, you should watch it.)
