Oct 31, 2023 - Technology

What Biden's plan to rely on ethical hacking for AI safety leaves out

Illustration: Annelise Capossela/Axios

Policymakers are increasingly turning to ethical hackers to find flaws in artificial intelligence tools, but some security experts fear they're leaning too hard on these red-team hackers to address all AI safety and security problems.

Why it matters: Red teaming — where ethical hackers try breaking into a company or organization — is a major touchstone in the AI executive order President Joe Biden signed yesterday.

  • Among other things, the order calls on the National Institute of Standards and Technology to develop rules for red-team testing and calls on certain AI developers to submit all red-team safety results for government review before releasing a new model.
  • The executive order is likely to become a policy roadmap for regulators and lawmakers looking for ways to properly safety test new AI tools.

The big picture: Employing red teams at companies is a tried-and-true security method.

  • Many companies have internal teams that try to hack their employer to find and remediate any security vulnerabilities.
  • But in the context of AI, red teaming has taken on an expanded definition that includes any program that lets users test tools to see what prompts could inspire bad outputs.

Catch up quick: Policymakers and industry leaders have been rallying around red teaming as the go-to practice for securing and finding flaws in AI systems.

  • The White House and several tech companies backed a DEF CON red-teaming challenge in August where thousands of hackers tried to get AI chatbots to spit out things like credit card information and instructions on how to stalk someone.
  • The Frontier Model Forum — a relatively new industry group backed by Microsoft, OpenAI, Google and Anthropic — also released a paper this month laying out its own standard approach to AI red teaming.
  • Last week, OpenAI established a preparedness team that will use internal red teaming and capability assessments to mitigate so-called catastrophic risks.

What they're saying: "The emergence of AI red teaming, as a policy solution, has also had the effect of side-lining other necessary AI accountability mechanisms," Brian J. Chen, director of policy at Data & Society, told Axios.

  • "Those other mechanisms — things like algorithmic impact assessment, participatory governance — are critical to addressing the more complex, sociotechnical harms of AI," he added.

Between the lines: Red teams are helpful when a problem is clearly defined — such as hunting for known vulnerabilities or trying to trick AI chatbots into writing phishing emails.

  • But policymakers and tech companies have yet to determine what these models should be allowed to share and how they can be fairly used, which complicates the effectiveness of red teams, Data & Society senior researcher Ranjit Singh told Axios.
  • Red teams also aren't equipped to tackle the larger questions around algorithmic bias in AI tools or fair uses for AI models, he added.
  • Instead, companies and policymakers need to shift their attention to the algorithms and data sources at the heart of the models, rather than the outputs, David Brumley, a professor at Carnegie Mellon University and CEO of security testing company ForAllSecure, told Axios.

The intrigue: More funding is needed to expand AI security efforts beyond just internal and public red-teaming assessments, Singh added.

  • "We are spending too much time building these technologies and a lot less money on ensuring that they're accountable and safe," he said.
  • This scarcity has translated into "competing challenges" to figure out how best to assess security and safety, Singh said, and red teaming has the upper hand because it's well established and easy for companies to focus on.

Yes, but: Biden's executive order does touch on broader safety and security problems — although questions remain about how agencies will implement those provisions.
