Feb 2, 2024 - Technology

IBM researchers use AI voices to hijack phone calls

Illustration: Shoshana Gordon/Axios

IBM researchers have found a relatively easy way to hijack voice calls using generative AI tools, according to a new report.

Why it matters: Many financial institutions and other stewards of people's most sensitive data lean heavily on phone calls to verify identities.

  • Using low-cost AI tools, scammers can now easily impersonate someone's voice and hijack ongoing conversations to steal funds and other information, per the new findings.

What's happening: IBM's researchers detailed a new threat they're calling "audio-jacking," in which threat actors use voice clones and a large language model to manipulate an ongoing conversation.

  • A threat actor would first need to either install malware on a victim's phone or compromise a wireless voice-calling service, then connect the call to their own AI tools.

How it works: An AI chatbot receives a simple prompt telling it how to respond whenever it hears certain key phrases. In this case, the phrase was "bank account."

  • The chatbot scans every conversation that passes through the compromised phone or voice-calling service for that keyword.
  • Once it hears the keyword, the chatbot swaps in a different phrase, spoken in the victim's cloned voice, in place of what was actually said (see the sketch after this list).
  • In this case, the bot replaced the victim's bank account number with the attacker's, so any money being deposited would go into the wrong account.
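
To make that flow concrete, here is a minimal, hypothetical sketch of the keyword-triggered substitution described above. The function names, the regular expression, and the account numbers are illustrative assumptions, not IBM's code; a real attack would call speech-to-text, chatbot, and voice-cloning services where the placeholders sit.

    import re
    from typing import Optional

    # Hypothetical stand-ins for the three services the attack chains together.

    def speech_to_text(audio_chunk: bytes) -> str:
        # Placeholder transcription step; a real attack would call a speech-to-text API.
        return audio_chunk.decode("utf-8")  # pretend the "audio" is already plain text

    def llm_rewrite(transcript: str, attacker_account: str) -> Optional[str]:
        # Placeholder for the prompted chatbot: act only when the trigger phrase appears.
        if "bank account" in transcript.lower():
            # Swap any spoken digit sequence (the victim's number) for the attacker's.
            return re.sub(r"\d[\d\- ]*\d", attacker_account, transcript)
        return None  # no trigger phrase: let the real audio pass through untouched

    def cloned_voice_tts(text: str) -> bytes:
        # Placeholder for text-to-speech rendered in the victim's cloned voice.
        return text.encode("utf-8")

    def audio_jack(audio_chunk: bytes, attacker_account: str) -> bytes:
        # Monitor the call: forward audio unchanged unless the keyword is heard.
        transcript = speech_to_text(audio_chunk)
        rewritten = llm_rewrite(transcript, attacker_account)
        if rewritten is None:
            return audio_chunk               # benign chatter flows through as-is
        return cloned_voice_tts(rewritten)   # injected speech in the victim's voice

    # Example: the victim reads out an account number; the forwarded audio
    # now carries the attacker's number instead.
    victim_audio = b"Sure, my bank account number is 1234 5678 9012."
    print(audio_jack(victim_audio, "9999 0000 1111").decode())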

What they're saying: "The LLM modifications aren't limited to financial information," Chenta Lee, chief architect of threat intelligence at IBM Security, wrote in the report.

  • "It could also modify medical information, such as blood type and allergies in conversations; it could command an analyst to sell or buy a stock; it could instruct a pilot to reroute."

Threat level: Cybersecurity experts have warned that generative AI is already making voice scams easier to believe.

  • In some cases, attackers need as little as three seconds of someone's voice to successfully clone it.

Yes, but: In IBM's experiment, researchers still hit a couple of hurdles.

  • Sometimes the voice clone's response lagged because the system had to call both the text-to-speech APIs and the chatbot telling it what to say (a rough sketch of how those delays add up follows this list).
  • Not all voice clones are convincing.
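
The sketch below illustrates why that lag appears: the pipeline's stages run one after another, so each remote call adds waiting time before the cloned voice can speak. The per-stage delays are assumed values for illustration, not IBM's measurements.

    import time

    # Assumed, illustrative per-stage delays (not IBM's measurements); each remote
    # API call in the chain adds waiting time before the cloned voice can answer.
    STAGE_DELAYS_S = {
        "speech_to_text": 0.4,
        "chatbot_rewrite": 0.9,
        "text_to_speech": 0.7,
    }

    def simulate_response_lag() -> float:
        # Run the stages one after another, as the pipeline must, and time the total.
        start = time.perf_counter()
        for delay in STAGE_DELAYS_S.values():
            time.sleep(delay)  # stand-in for a network round trip to that service
        return time.perf_counter() - start

    print(f"Cloned voice answers after roughly {simulate_response_lag():.1f}s of silence")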

Be smart: The report recommends that anyone who finds themselves on a suspicious phone call paraphrase and repeat back what was said to verify its accuracy.

  • Doing so will trip up the chatbots, which still struggle to understand basic conversational cues.