Oct 10, 2023 - Technology

Startup's AI app dubs videos in different languages

Screenshots of Lipdub, an AI tool for automatic language translation

Images courtesy of Captions

A new free app allows people to get their videos dubbed into any of 28 languages — in their own voice and with their lips synchronized to the translated audio.

The big picture: Lipdub, a free iOS app from New York-based AI startup Captions, is the latest demonstration of just how useful generative AI can be, but it also illustrates the technology's immense potential to create realistic-looking fakes.

How it works: Lipdub requires people to record a video on their phone with only their face in view. The app uploads the video and returns it to the user in a few minutes with the new language overdub applied.

  • For now, Lipdub can handle up to one minute of video from a single speaker.
  • In addition to foreign languages, Lipdub has added options such as Texas slang, as well as Gen Z, pirate and baby talk.

What they're saying: "Our hope is that the Lipdub technology will remove language barriers and ultimately allow more people to have their stories heard — stories that otherwise might be lost in translation," Captions CEO Gaurav Misra told Axios.

  • Misra said Captions is clearly marking videos made using Lipdub as having been generated with AI.
  • The 25-person startup first created a video studio app that offers automatic captioning using OpenAI's speech-to-text technology. The company says that product is now used by more than 100,000 people per day to produce more than 2 million videos each month.

Between the lines: Misra said the hardest part of creating Lipdub was training its algorithms to mirror natural lip movement while using what are known as "zero-shot" models, meaning they don't need to be trained on an individual speaker.

  • "As you'll notice, the facial expressions of a person remain unchanged pre- and post-translation — only their lip movements change — resulting in a more natural appearance," Misra added.

What's next: In the future, Misra says generative AI should allow for real-time translation for broadcasts or video conferencing. "Imagine having a Zoom call with someone who doesn't speak the same language, yet understands you perfectly," Misra said.

Go deeper: Watch the demo video. (Really, you should watch it.)
