IBM researchers use AI voices to hijack phone calls

Illustration: Shoshana Gordon/Axios
IBM researchers have found a relatively easy way to hijack voice calls using generative AI tools, according to a new report.
Why it matters: Many financial institutions and other stewards of people's most sensitive data lean heavily on phone calls to verify identities.
- Using low-cost AI tools, scammers can now easily impersonate someone's voice and hijack ongoing conversations to steal funds and other information, per the new findings.
What's happening: IBM's researchers detailed a new threat they're calling "audio-jacking," where threat actors can use voice clones to manipulate a large language model midway through an ongoing conversation.
- A threat actor would need to start by either installing malware on a victim's phone or compromising a wireless voice-calling service, then connecting the call to their own AI tools.
How it works: An AI chatbot receives a simple prompt telling it how to respond whenever it hears certain key phrases. In this case, the phrase was "bank account."
- The chatbot scans each conversation that comes through a compromised phone or voice-calling service for that keyword.
- Once it hears the phrase, the chatbot is instructed to swap out what was actually said for a substitute phrase, delivered in the victim's cloned voice.
- In this case, the bot would replace the victim's bank account number with the attacker's, so any money being deposited would land in the attacker's account instead (see the sketch below).
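For illustration, here's a minimal sketch of just the substitution step the researchers describe: a filter that scans a transcribed utterance for the trigger phrase and swaps the account number that follows it. The trigger phrase, account number and function name are hypothetical, and a real attack would also chain in live speech-to-text, an LLM and a voice clone.

```python
import re

# Hypothetical values for illustration only.
TRIGGER = "bank account"
ATTACKER_ACCOUNT = "999-000-1234"

def audio_jack_filter(transcript: str) -> str:
    """Scan a transcribed utterance for the trigger phrase and, if found,
    swap the account number that follows it for the attacker's number.
    In IBM's demo, the rewritten sentence would then be re-spoken
    through a clone of the victim's voice."""
    if TRIGGER not in transcript.lower():
        # No trigger phrase: pass the utterance through untouched, so the
        # call sounds completely normal to both parties.
        return transcript
    # Replace any digit sequence (with optional dashes or spaces) that
    # follows the trigger phrase with the attacker-controlled number.
    return re.sub(
        rf"({re.escape(TRIGGER)}\D*)\d[\d\s-]*\d",
        rf"\g<1>{ATTACKER_ACCOUNT}",
        transcript,
        flags=re.IGNORECASE,
    )

print(audio_jack_filter("Sure, my bank account number is 12-3456-789."))
# -> "Sure, my bank account number is 999-000-1234."
```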
What they're saying: "The LLM modifications aren't limited to financial information," Chenta Lee, chief architect of threat intelligence at IBM Security, wrote in the report.
- "It could also modify medical information, such as blood type and allergies in conversations; it could command an analyst to sell or buy a stock; it could instruct a pilot to reroute."
Threat level: Cybersecurity experts have warned that generative AI is already making voice scams easier to believe.
- In some cases, attackers need as little as three seconds of someone's voice to successfully clone it.
Yes, but: In IBM's experiment, researchers still hit a couple of hurdles.
- Sometimes the voice clone's response lagged, because the system had to round-trip through both the text-to-speech APIs and the chatbot generating the replacement text, and those sequential delays add up, as the sketch after this list illustrates.
- Not all voice clones are convincing.
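To see why the sequential hops produce noticeable lag, here's a toy timing model of the pipeline. The per-stage latencies are made-up placeholders, not IBM's measurements.

```python
import time

# Made-up per-stage latencies in seconds; real values vary by API and load.
PIPELINE = [
    ("speech-to-text", 0.4),   # transcribe the victim's utterance
    ("LLM rewrite", 0.9),      # chatbot decides on and produces the swap
    ("text-to-speech", 0.6),   # re-speak the line in the cloned voice
]

def simulate_round_trip() -> float:
    """Each stage must finish before the next can start, so the delays
    accumulate, which is why the cloned reply can trail the conversation."""
    total = 0.0
    for stage, latency in PIPELINE:
        time.sleep(latency)  # stand-in for a blocking API call
        total += latency
        print(f"{stage:>15}: +{latency:.1f}s (cumulative {total:.1f}s)")
    return total

print(f"Total added lag: {simulate_round_trip():.1f}s")
```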
Be smart: The report recommends that anyone who finds themselves on a suspicious phone call paraphrase and repeat back what was said to verify its accuracy.
- Doing this will trip up the chatbots, which still struggle to understand basic conversational cues.
