Government-backed program recruiting a team to stress-test AI

Illustration: Natalie Peeples/Axios
A new government-supported program is actively recruiting people who want to help evaluate AI-enabled office productivity software.
Why it matters: Stress-testing generative AI tools remains one of the best ways to root out biased or false responses, as well as privacy problems such as outputs that leak sensitive information.
Zoom in: Humane Intelligence, an AI ethics nonprofit, is teaming up with the National Institute of Standards and Technology to launch a new competition to find flaws in AI office productivity tools.
- The first round of the competition is virtual, and anyone in the U.S. who is interested in red-teaming AI models is welcome to participate.
- In this round, participants will run through a set of test scenarios and try to identify as many "violative outcomes" as possible.
- Those who pass this round will be invited to participate in an in-person red-teaming exercise in late October at the Conference on Applied Machine Learning in Information Security in Arlington, Virginia.
The intrigue: Humane Intelligence is also inviting genAI model operators to donate access to their models for testing.
- Qualifying products must be workplace productivity tools — used for coding, process automation and similar activities — and their operators must be willing to have the models tested for a variety of issues.
Between the lines: This new competition is just one of the challenges that Humane Intelligence is launching with government partners and NGOs in the coming weeks, the nonprofit told Wired.
- Humane Intelligence sees these competitions as a way of engaging people with a variety of backgrounds, not just coders, in AI red-teaming.
What's next: Those interested in participating in the first round have until Sept. 9 to sign up.
