AI companies are developing methods to translate and synthesize voices in ads, movies and TV.
Why it matters: The advances in voice synthesis could help fix bad movie dubbing — and they come as international content is becoming increasingly important to studios and streaming platforms as part of the globalization of entertainment.
- But they raise concerns about the possibility of deepfaking audio, as well as how a celebrity's voice might be used after their death.
What's happening: Foreign-language hits like "Squid Game" and "La Casa de Papel" are drawing record audiences, but subtitles are still a stumbling block for studios trying to tap a growing international market.
- More Netflix subscribers watched dubbed versions of "Squid Game" than subtitled versions.
- With blockbusters sucking up a lot of bandwidth, smaller producers of foreign-language content are having a hard time finding enough translators and voice-over actors to meet demand.
- "We're still stuck in the mindset of the one-to-many broadcasting model," says Ryan Steelberg, co-founder and president of AI company Veritone.
Between the lines: Veritone has developed a product called MARVEL.ai that allows content producers to generate and license what it calls "hyper-realistic" synthetic voices.
- A podcast creator, for example, could have audio ad copy translated into another language, and MARVEL.ai would then generate a synthetic version of their voice reading the ad in the new language.
- "It gives you the ability to hyper-personalize audio on a much bigger scale and at less cost," says Steelberg.
How it works: Text-to-speech technology has existed for decades, but Veritone's product makes use of "speech-to-speech," which Steelberg calls "voice as a service."
- Veritone has access to petabytes of data from media libraries and uses it to train its AI product, creating a synthetic version of the original voice that can be tuned for different kinds of sentiment or emotion or, with translation, made to speak a foreign language.
- "It's no longer going to be another person's new voice speaking on behalf of, say, Tom Cruise," says Steelberg. "It's really going to be Tom Cruise's voice speaking another language."
- Nvidia has been developing technology that would allow AI to alter video or animation so that an actor's lips and facial expressions match the new language, so no more out-of-sync dubbing like in 1970s-era kung-fu movies.
What's next: This technology will likely first be used in advertisements, but as it migrates to higher-quality content, it will open up potential opportunities and pitfalls for celebrity talent.
- "In terms of dubbing and post-production, synthetic voices will become mainstream, and you'll see that built into contracts for talent," says Steelberg.
- That won't just be to ensure Hollywood stars (and their agents) get a cut for any use of their synthesized voice, but also to prevent those voices from being hijacked for malign purposes as the technology becomes more accessible.
What to watch: How the voices and other creative attributes of deceased celebrities might be harnessed by AI.
- Holograms of dead musicians like Frank Zappa are already being used to front "live" shows that have brought in tens of millions in revenue, while Kenny G recently released a "duet" with the jazz great Stan Getz, who died 30 years ago.
- Sample notes from Getz's existing library were used to generate a new, synthetic melody — albeit one that jazz writer Ted Gioia called a "Frankenstein record."
The bottom line: We should get used to hearing celebrities speak in almost any language soon — and those celebrities should get used to going through their wills with a fine-toothed comb.