1 big thing: AI voice impersonators
Big Tech, top university labs and the U.S. military are pouring effort and money into detecting deepfake videos: AI-edited clips that can make it look like someone is saying something they never uttered. But deepfake audio, video's forgotten step-sibling, has attracted considerably less attention, despite a comparable potential for harm.
Kaveh writes: With video deepfakes, defenders are playing the cat to a fast-scurrying mouse: AI-generated video is getting quite good. The technology to create audio fakes, by contrast, is not as advanced — but experts say that's soon to change.
- "In a couple years, having a voice [that mimics] an individual and can speak any words we want it to speak — this will probably be a reality," Siwei Lyu, director of SUNY Albany's machine learning lab, tells Axios.
- "But we have a rare opportunity before the problem is a reality when we can grow the forensic technology alongside the synthesis technology," says Lyu, who participates in DARPA's Media Forensics program.
Why it matters: Experts worry that convincing, easily made AI impersonations could turn society on its head: fake news running rampant, criminals empowered, and political opponents and foreign provocateurs armed with tools to sow electoral chaos.
- In the U.S., fake audio is most likely to supercharge political mayhem, spam calls and white-collar crime.
- But in places where fake news is already spreading disastrously on Telegram and WhatsApp (think India or Brazil), a persuasive tape of a leader saying something incendiary is especially perilous, says Sam Gregory of Witness, a human-rights nonprofit.
There are two main ways to use AI to forge audio:
- Modulation, which changes the quality of a voice to make it sound like someone else — from male to female, or British to American, for example. Boston-area startup Modulate.ai does this, as have researchers from China's Baidu.
- Synthesis, in which AI speaks any phrase typed into a box with a specific voice — like Trump's, for example. Montreal's Lyrebird can do this, as can Adobe's yet-unreleased VoCo, which can also rearrange, add or subtract words in an existing recording to make it sound completely different.
Detecting audio deepfakes requires training a computer to listen for inaudible hints that the voice couldn't have come from an actual person. Lyu and UC Berkeley's Hany Farid are researching automated ways to do this.
- Google recently made a vast dataset of its own synthetic speech available to researchers who are working on deepfake detection. This trove of training data can help AI systems find and recognize the hallmarks of fake voices.
- For an international competition, 49 teams submitted deepfake detectors trained with Google's contribution, plus voices from 19 other sources in various languages. The top entrants were highly accurate, said competition co-organizer Junichi Yamagishi, a researcher at Japan's National Institute of Informatics. The best system only made mistakes 0.22% of the time, he tells Axios.
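The detection idea above can be illustrated with a toy sketch. This is not Lyu's, Google's, or any competitor's actual method; it just shows the shape of the approach, using spectral flatness (how noise-like a signal's spectrum is) as one hypothetical "inaudible hint" that separates natural recordings from overly clean synthetic ones:

```python
import numpy as np

def spectral_flatness(signal: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum.
    Noise-like audio scores near 1; unnaturally 'clean' tonal audio near 0."""
    power = np.abs(np.fft.rfft(signal)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(power))) / np.mean(power)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000, endpoint=False)

# Toy "real" voices: a harmonic plus natural background noise.
real = [np.sin(2 * np.pi * 220 * t) + 0.05 * rng.normal(size=t.size)
        for _ in range(20)]
# Toy "synthetic" voices: the same harmonic with no room noise at all.
fake = [np.sin(2 * np.pi * 220 * t) for _ in range(20)]

log_f_real = [np.log(spectral_flatness(s)) for s in real]
log_f_fake = [np.log(spectral_flatness(s)) for s in fake]

# A threshold between the class means separates these toy classes.
threshold = (np.mean(log_f_real) + np.mean(log_f_fake)) / 2
acc = np.mean([f > threshold for f in log_f_real] +
              [f < threshold for f in log_f_fake])
print(f"toy detector accuracy: {acc:.2f}")
```

Real systems train classifiers on many such cues at once, over thousands of recordings like those in Google's dataset, rather than thresholding a single hand-picked feature.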
Pindrop, an Atlanta company that sells voice authentication to big banks and insurance companies, is also developing defenses, worried that the next wave of attacks on its clients will involve deepfake audio.
- One key to detecting fakes, according to the company: sounds that seem normal, but that people aren't physically capable of making.
- An example from Pindrop CEO Vijay Balasubramaniyan: If you say "Hello, Paul," your mouth can only shift from the "o" to "Paul" at a certain speed. Spoken too fast, "the only way to say this is with a 7-foot-tall neck," Balasubramaniyan says.
The bottom line: If deepfake detectors can get out ahead of the spread of fake audio, they could contain the potential fallout. And, unlike with video, it looks like the defenders could actually keep up with the forgers.
2. Audio deepfakes: a progress report
If you want to make a video deepfake, you can download free software and create it yourself. Someone with a bit of savvy and a chunk of time can churn out side-splitters like this one.
Kaveh writes: Not so for audio deepfakes — at least not yet. Good synthetic audio is still the domain of startups, Big Tech and academic research.
What's happening: Pindrop, the audio biometrics company, is developing synthetic voices in order to train its own defenses to detect them. Vijay Balasubramaniyan, Pindrop's CEO, shared several fake voices with Axios.
- Listen to one of the voices.
- The AI-generated voice is clearly mimicking Ellen DeGeneres — but it’s not quite right.
How it works: Pindrop's system listened to countless hours of DeGeneres talking in real life — mostly narrating her own audiobooks — and then used a cutting-edge AI technique to develop an impersonator, improving the synthetic voice until the system could no longer tell it apart from the real thing. Now, anyone can type a phrase into the system and have it read out in DeGeneres' voice.
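The article doesn't name the technique, but "improving the synthetic voice until the system could no longer tell it apart" describes adversarial training, GAN-style. A heavily simplified 1-D sketch of that loop, where a "voice" is reduced to a single feature drawn from a distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
real_mean, real_std = 3.0, 0.5   # stands in for the real speaker's voice
gen_mean = 0.0                   # generator starts far from the target

for step in range(200):
    real = rng.normal(real_mean, real_std, 256)
    fake = rng.normal(gen_mean, real_std, 256)
    # Discriminator: a threshold midway between the two sample means.
    thr = (real.mean() + fake.mean()) / 2
    acc = (np.mean(real > thr) + np.mean(fake <= thr)) / 2
    if acc <= 0.55:              # detector is fooled: fakes pass as real
        break
    # Generator update: move toward what the discriminator accepts.
    gen_mean += 0.1 * (real.mean() - fake.mean())

print(f"final generator mean: {gen_mean:.2f} (target {real_mean})")
```

In a real voice-cloning system both sides are neural networks operating on audio, but the dynamic is the same: the generator improves exactly until the detector can no longer tell fake from real.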
Kaveh listened to this and several other Pindrop-generated voices. Each captured the real speakers' idiosyncrasies, but they were exposed by their robotic-sounding pace and cadence. To this, Balasubramaniyan replied:
"You are actually identifying all the things it takes to start mimicking a million years of human evolution in voice. Our synthesis systems do a good job at synthesizing a voice but not yet things like cadence, emotion and flair, which are all active areas of research."
But that doesn't mean these imperfect fakes couldn't cause some mischief now. Imagine you were already expecting a phone call from someone. You probably wouldn't be too suspicious of a slightly robotic or stilted voice if the caller explained he was sick or driving through a tunnel.
- "We're communicating through this phone system that has a lot of security issues," says Aviv Ovadya, a misinformation researcher and founder of the Thoughtful Technology Project.
- That's how Charlie Warzel, formerly of BuzzFeed News, tricked his own mother into falling for an AI mimicry of his voice.
3. Living alone
Staggering stat: The number of single-person households in the world will jump 128% between 2000 and 2030, according to a new report from Euromonitor International.
- Erica writes: The report also projects a steep rise in the number of single-parent households, driven by a 79% rise in the number of divorces around the world between 2000 and 2030.
What to watch: Rising childlessness and shrinking household sizes are teeing up an apartment boom in big cities like Tokyo, London, New York and Shanghai, the report says. Expect to see more micro-apartments, which are already taking off in Japan.
4. Worthy of your time
Japan's new immigration push (Ryosuke Eguchi — Nikkei Asian Review)
The privatization of identity (Felix Salmon — Axios)
Junk and the future of space (Mark Harris — MIT Tech Review)
WeWork: The high school cafeteria all over again (Ellen Gamerman — WSJ)
Burger King's beefless whopper (Nathaniel Popper — NYT)
5. 1 blast from the past: The last Blockbuster on Earth
If you want to run the last Blockbuster left on Earth, you've got to know how to use floppy disks. That's how they reboot the computer system at the final location of the once-ubiquitous chain.
Erica writes: Blockbuster filed for bankruptcy in 2010, and the video rental chain, which once boasted 9,000 stores worldwide, has shrunk to a single location in Bend, Oregon, AP reports.
- “It’s pure stubbornness, for one. We didn’t want to give in. We did everything we could to cut costs and keep ourselves relevant," the store's general manager, Sandi Harding, told AP.
- The store has leaned into its status as the final holdout and sells stickers, mugs and T-shirts emblazoned with "The Last Blockbuster on the Planet."