Aug 11, 2023 - Technology

Thousands of hackers are throwing the world's cyber arsenal at AI models at DEF CON

Sam Sabin


Thousands of hackers are spending the weekend trying to break generative AI models like ChatGPT — all with the blessing of the White House and the tech companies behind these models.

Driving the news: The DEF CON hacking conference in Las Vegas is hosting a highly anticipated Generative Red Team Challenge throughout the weekend.

DEF CON is organized into a variety of "villages" that focus on different cybersecurity topics, such as aerospace, cloud security and critical infrastructure security.
The AI Village will host what's likely the largest public security test of large language models to date.

Why it matters: AI operators hope to avoid the mistakes of past innovators who moved quickly to roll out their technologies without fully considering the consequences or preparing them for adversarial users.

The big picture: The White House announced its support for the AI Village's test back in May and has been helping to design the challenge.

A who's who of major generative AI developers, including Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI and Stability AI, is also participating in the DEF CON challenge.
The AI Village organizers have previously piloted this challenge at South by Southwest and at a recent Howard University event.

How it works: Organizers are expecting roughly 3,500 participants in the challenge, and each one will get 50 minutes on one of the event's 156 closed-network computer terminals.

The challenge categories fall into five buckets: prompt hacking, security, information integrity, internal consistency and societal harms.
Participants will receive a list of challenges to try and will be randomly assigned a large language model to test out.
Organizers will also provide participants with a sheet of known hacking prompts and a locally hosted copy of Wikipedia so they can fact-check any misinformation the models spit out.
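
By the numbers: The figures above allow a back-of-the-envelope estimate of how busy each terminal would be. The Python sketch below is only an illustration, not the organizers' scheduling method. It assumes every participant uses a full 50-minute slot, sessions run back to back, and participants are spread evenly across terminals. The function name `terminal_load` is hypothetical.

```python
import math


def terminal_load(participants: int, session_minutes: int, terminals: int):
    """Estimate sessions and busy hours per terminal.

    Assumes an even spread of participants and full-length,
    back-to-back sessions (illustrative assumptions only).
    """
    # The busiest terminal hosts the rounded-up share of participants.
    sessions_per_terminal = math.ceil(participants / terminals)
    busy_hours = sessions_per_terminal * session_minutes / 60
    return sessions_per_terminal, busy_hours


# Figures reported for the challenge: 3,500 people, 50 minutes, 156 terminals.
sessions, hours = terminal_load(3500, 50, 156)
print(f"{sessions} sessions per terminal, about {hours:.1f} hours")
```

Under those assumptions, each terminal would host about 23 sessions, or roughly 19 hours of use over the weekend.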

Between the lines: Much of the competition focuses on what the organizers call "embedded harms," meaning flaws that occur naturally in the models, rather than on tricking the models into doing bad things.

What they're saying: "One of the challenges with this technology being so expensive to produce at the frontier is that it means that unfortunately, a lot of the knowledge and experience with these models is locked up within a small number of well-funded private companies," Michael Sellitto, head of geopolitics and security at Anthropic, told Axios ahead of the challenge.

"The organizers for the challenge are bringing in a really diverse group of people that are not the kind of normal people who work on the technology," he added.

Catch up quick: 2023 is far from the first year that AI has been a focal point of DEF CON — the AI Village started back in 2018.

But this year's interest in the village and the red team challenge has "just exploded," Sven Cattell, founder of the challenge, told reporters earlier this week.

Yes, but: The challenge organizers don't plan to release the weekend's results right away, so that no private data or unpatched vulnerabilities leak into the wild.

However, approved researchers are expected to be able to access the data for their own projects once the organizers security-proof the results.
