Dec 7, 2023 - Technology

Meta releases more security guidelines for AI models

Ryan Heath

Animated illustration of a llama eating ones and zeros. — Illustration: Lindsey Bailey/Axios

Meta released benchmark cybersecurity practices for large language models, which it says is an effort to "level the playing field for developers to responsibly deploy generative AI models."

Why it matters: The White House has urged AI companies to ramp up their safety efforts, and codified some safety requirements in its AI Executive Order, out of concern that AI chatbots and open source LLMs like Meta's Llama 2 will lead to dangerous misuse.

LLMs can serve as attack vectors, be hacked to access proprietary information, or be manipulated to produce harmful content, even when they've been designed not to.

The big picture: Cybersecurity risks around LLMs are "a pervasive problem that we need to mitigate," Joseph Spisak, Meta's director of product management for generative AI, tells Axios.

"There's no real ground truth: We're still trying to find our way into how to evaluate these models" and need to "build a community to help standardize these things," he says.

What's happening: Meta's two key releases in its "Purple Llama" initiative are CyberSec Eval, a set of cybersecurity safety evaluation benchmarks for LLMs; and Llama Guard, which "provides developers with a pre-trained model to help defend against generating potentially risky outputs."

The tools are intended to help developers make it harder for bad actors to manipulate LLMs into generating malicious code, and to evaluate how often LLMs suggest insecure code.

Spisak told Axios that Purple Llama will partner with members of a newly formed AI Alliance that Meta is helping lead, and with others such as Microsoft, AWS, Nvidia and Google Cloud.

The intrigue: The name Purple Llama refers to what you get when red teams (attackers) and blue teams (defenders) are combined, paired with the name of Meta's open source foundation model.

Meta named Papers With Code, HELM, Together.AI and Anyscale as additional project partners.

Flashback: Meta previously released a Llama 2 Responsible Use Guide, an approach that critics say is insufficient for managing how an open source model can be misused in the wild.
