Dec 4, 2021 - Technology

AI could end foreign-language subtitles

Illustration: Shoshana Gordon/Axios

AI companies are developing methods to interpret and synthesize voices in ads, movies and TV.

Why it matters: The advances in voice synthesis could help fix bad movie dubbing — and they come as international content is becoming increasingly important to studios and streaming platforms as part of the globalization of entertainment.

  • But they raise concerns about the possibility of deepfaking audio, as well as how a celebrity's voice might be used after their death.

What's happening: Foreign-language hits like "Squid Game" and "La Casa de Papel" are drawing record audiences, but subtitles are still a stumbling block for studios trying to tap a growing international market.

  • More Netflix subscribers watched dubbed versions of "Squid Game" than subtitled versions.
  • With blockbusters sucking up a lot of bandwidth, smaller producers of foreign-language content are having a hard time finding enough interpreters and voice-over actors to meet demand.
  • "We're still stuck in the mindset of the one-to-many broadcasting model," says Ryan Steelberg, co-founder and president of AI company Veritone.

Between the lines: Veritone has developed a product called MARVEL.ai that allows content producers to generate and license what it calls "hyper-realistic" synthetic voices.

  • This means, for example, that podcast creators could have audio ad copy translated into another language, and MARVEL.ai would then generate a synthetic version of their voice reading the ad in that language.
  • "It gives you the ability to hyper-personalize audio on a much bigger scale and at less cost," says Steelberg.

How it works: Text-to-speech technology has existed for decades, but Veritone's product makes use of "speech-to-speech," what Steelberg calls "voice as a service."

  • Veritone has access to petabytes of data from media libraries and uses it to train its AI product, creating a synthetic version of the original voice that can be tuned for different kinds of sentiment or emotion or, with interpretation, made to speak a foreign language.
  • "It's no longer going to be another person's new voice speaking on behalf of, say, Tom Cruise," says Steelberg. "It's really going to be Tom Cruise's voice speaking another language."
  • Nvidia has been developing technology that would allow AI to alter video or animation so that an actor's lips and facial expressions match the new language, which would mean no more out-of-sync dubbing like in 1970s-era kung-fu movies.

What's next: This technology will likely first be used in advertisements, but as it migrates to higher-quality content, it will open up potential opportunities and pitfalls for celebrity talent.

  • "In terms of dubbing and post-production, synthetic voices will become mainstream, and you'll see that built into contracts for talent," says Steelberg.
  • That won't just be to ensure Hollywood stars (and their agents) get a cut for any use of their synthesized voice, but also to prevent those voices from being hijacked for malign purposes as the technology becomes more accessible.

What to watch: How the voices and other creative attributes of deceased celebrities might be harnessed by AI.

  • Holograms of dead musicians like Frank Zappa are already being used to front "live" shows that have brought in tens of millions in revenue, while Kenny G recently released a "duet" with the jazz great Stan Getz, who died 30 years ago.
  • Sample notes from Getz's existing library were used to generate a new, synthetic melody — albeit one that jazz writer Ted Gioia called a "Frankenstein record."

The bottom line: We should get used to hearing celebrities speak in almost any language soon — and those celebrities should get used to going through their wills with a fine-toothed comb.
