Startup's AI app dubs videos in different languages

Images courtesy of Captions
A new free app allows people to get their videos dubbed into any of 28 languages — in their own voice and with their lips synchronized to the translated audio.
The big picture: Lipdub, a free iOS app from New York-based AI startup Captions, is the latest demonstration of just how useful generative AI can be, but it also illustrates the technology's immense potential to create realistic-looking fakes.
How it works: Lipdub requires people to record a video on their phone with only their face in view. The app uploads the video and, within a few minutes, returns it to the user with the translated overdub applied.
- For now, Lipdub can handle up to one minute of video from a single speaker.
- In addition to foreign languages, Lipdub has added options such as Texas slang, Gen Z speak, pirate talk and baby talk.
What they're saying: "Our hope is that the Lipdub technology will remove language barriers and ultimately allow more people to have their stories heard — stories that otherwise might be lost in translation," Captions CEO Gaurav Misra told Axios.
- Misra said Captions is clearly marking videos made using Lipdub as having been generated with AI.
- The 25-person startup first created a video studio app that offers automatic captioning using OpenAI's speech-to-text technology. The company says that product is now used by more than 100,000 people per day, who produce more than 2 million videos each month.
Between the lines: Misra said the hardest part of creating Lipdub was training its algorithms to mirror natural lip movement while using what are known as "zero-shot" models, meaning they don't need to be trained on an individual speaker.
- "As you'll notice, the facial expressions of a person remain unchanged pre- and post-translation — only their lip movements change — resulting in a more natural appearance," Misra added.
What's next: In the future, Misra says generative AI should allow for real-time translation for broadcasts or video conferencing. "Imagine having a Zoom call with someone who doesn't speak the same language, yet understands you perfectly," Misra said.
Go deeper: Watch the demo video. (Really, you should watch it.)
