Oct 13, 2023 - Technology

Prompt: Using AI to change videos from English to other languages


Illustration: Aïda Amer/Axios

The latest generation of AI translation is good enough to do a pretty convincing job of turning a video shot in one language into another language, largely preserving the speaker's voice and modifying the lip movements to match the new dialogue.

Why it matters: In the short term, the new technology should allow for far more videos to be dubbed — and not just subtitled — into other languages. Like many other generative AI technologies, though, it raises a host of longer-term issues around misinformation and job displacement.

How it works: I used Lipdub, a free iOS app that debuted this week, to translate a variety of short video clips into Spanish, Dutch, and Japanese.

  • I played around with other options, such as turning a video into pirate speak or Gen Z slang. They were fun, but I found them neither convincing nor useful.
  • Lipdub is simple to use, requiring little extra time or effort.

Details: The voice sounded like me, but it wasn't identical. I speak enough Spanish and Dutch to know that those translations weren't perfect.

  • When we asked more fluent Spanish and French speakers at Axios to evaluate the translations, they found them impressive, but not perfect.
  • Also, I noticed the ends of my clips were getting cut off in translation, a problem often fixed by having Lipdub try a second time.

State of play: Most videos, whether training videos, YouTube creations or any of the countless others on the web, are made in only a single language.

  • But as the technology improves (and generative AI has been improving rapidly), it could offer an interesting alternative to traditional Hollywood dubbing.
  • Historically, voice actors have built careers out of being, for example, the French voice of Arnold Schwarzenegger. This AI technology could instead let an AI Arnold be the Schwarzenegger of France, speaking French in his own distinctive voice.

The big picture: Lipdub is not alone in this space. I also briefly used a web-based alternative from ElevenLabs that provides comparable capabilities and results. ElevenLabs can also handle multiple speakers and longer video clips.

  • These startups join CreatorGlobal, which offers AI dubbing and localization services.

My thought bubble: The existence of these language-dubbing tools is another reminder that the era of deepfakes is really here. In essence, Lipdub creates its own kind of deepfake: a video of someone saying something they never said, with realistic voice and lip movements.

  • Now, there's not much risk of fraud with Lipdub's or ElevenLabs' technology, as both are just translating the content from one language to another, and the resulting voice is similar to the original speaker's but not identical.
  • Another tool I reviewed, from HeyGen, creates an avatar of a speaker that, once trained, can say anything typed into a text field. In that case, a user has to submit a video indicating consent before the company will create the avatar.
  • My rule of thumb is that any powerful technology offered in a safe, constrained way by responsible actors will surely be offered by less scrupulous players for a wider range of more problematic tasks.