IBM researchers use AI voices to hijack phone calls

Illustration: Shoshana Gordon/Axios
IBM researchers have found a relatively easy way to hijack voice calls using generative AI tools, according to a new report.
Why it matters: Many financial institutions and other stewards of people's most sensitive data lean heavily on phone calls to verify identities.
- Using low-cost AI tools, scammers can now easily impersonate someone's voice and hijack ongoing conversations to steal funds and other information, per the new findings.
What's happening: IBM's researchers detailed a new threat they're calling "audio-jacking," where threat actors can use voice clones to manipulate a large language model midway through an ongoing conversation.
- A threat actor would need to start by either installing malware on a victim's phone or compromising a wireless voice-calling service, then connecting the call to their own AI tools.
How it works: An AI chatbot receives a simple prompt telling it how to respond whenever it hears certain key phrases. In this case, the phrase was "bank account."
- The chatbot scans each conversation that comes through a compromised phone or voice-calling service for that keyword.
- Once it hears the phrase, the chatbot is instructed to swap out what was actually said for a substitute phrase, delivered in the victim's cloned voice.
- In this case, the bot would replace the victim's bank account number with the attacker's, so any money being deposited would land in the attacker's account instead (see the sketch below).
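For illustration, here's a minimal sketch of just the substitution step the researchers describe: a filter that scans a transcribed utterance for the trigger phrase and swaps the account number that follows it. The trigger phrase, account number and function name are hypothetical, and a real attack would also chain in live speech-to-text, an LLM and a voice clone.

```python
import re

# Hypothetical values for illustration only.
TRIGGER = "bank account"
ATTACKER_ACCOUNT = "999-000-1234"

def audio_jack_filter(transcript: str) -> str:
    """Scan a transcribed utterance for the trigger phrase and, if found,
    swap the account number that follows it for the attacker's number.
    In IBM's demo, the rewritten sentence would then be re-spoken
    through a clone of the victim's voice."""
    if TRIGGER not in transcript.lower():
        # No trigger phrase: pass the utterance through untouched, so the
        # call sounds completely normal to both parties.
        return transcript
    # Replace any digit sequence (with optional dashes or spaces) that
    # follows the trigger phrase with the attacker-controlled number.
    return re.sub(
        rf"({re.escape(TRIGGER)}\D*)\d[\d\s-]*\d",
        rf"\g<1>{ATTACKER_ACCOUNT}",
        transcript,
        flags=re.IGNORECASE,
    )

print(audio_jack_filter("Sure, my bank account number is 12-3456-789."))
# -> "Sure, my bank account number is 999-000-1234."
```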
What they're saying: "The LLM modifications aren't limited to financial information," Chenta Lee, chief architect of threat intelligence at IBM Security, wrote in the report.
- "It could also modify medical information, such as blood type and allergies in conversations; it could command an analyst to sell or buy a stock; it could instruct a pilot to reroute."
Threat level: Cybersecurity experts have warned that generative AI is already making voice scams easier to believe.
- In some cases, attackers need as little as three seconds of someone's voice to successfully clone it.
Yes, but: In IBM's experiment, researchers still hit a couple of hurdles.
- Sometimes the voice clone's response lagged, because the system had to round-trip through both the text-to-speech APIs and the chatbot generating the replacement text, and those sequential delays add up, as the sketch after this list illustrates.
- Not all voice clones are convincing.
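To see why the sequential hops produce noticeable lag, here's a toy timing model of the pipeline. The per-stage latencies are made-up placeholders, not IBM's measurements.

```python
import time

# Made-up per-stage latencies in seconds; real values vary by API and load.
PIPELINE = [
    ("speech-to-text", 0.4),   # transcribe the victim's utterance
    ("LLM rewrite", 0.9),      # chatbot decides on and produces the swap
    ("text-to-speech", 0.6),   # re-speak the line in the cloned voice
]

def simulate_round_trip() -> float:
    """Each stage must finish before the next can start, so the delays
    accumulate, which is why the cloned reply can trail the conversation."""
    total = 0.0
    for stage, latency in PIPELINE:
        time.sleep(latency)  # stand-in for a blocking API call
        total += latency
        print(f"{stage:>15}: +{latency:.1f}s (cumulative {total:.1f}s)")
    return total

print(f"Total added lag: {simulate_round_trip():.1f}s")
```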
Be smart: The report recommends that anyone who finds themselves on a suspicious phone call paraphrase and repeat back what was said to verify its accuracy.
- Doing this will trip up the chatbots, which still struggle to understand basic conversational cues.
