Meta's move on AI bias raises risk, eyebrows
Illustration: Sarah Grillo/Axios
Meta says it wants to remove bias from its models, but that's harder — and more dangerous — than it looks.
Why it matters: Meta's anti-bias push looks less like a drive for model neutrality and more like an effort to cater to the right's war on "woke" AI, according to experts.
Driving the news: When it released Llama 4 earlier this month, Meta included a note claiming that "it's well-known that all leading LLMs have had issues with bias — specifically, they historically have leaned left when it comes to debated political and social topics."
- "Our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue," Meta said.
- As we predicted a year ago, there is indeed a fight brewing — both in the U.S. and globally — over just whose values AI systems will reflect.
Yes, but: Neither the problem nor the solutions are as clear as Meta's blog suggests.
- Llama already gave the most right-wing authoritarian answers to prompts (ChatGPT gave the most left-leaning answers), according to research from the University of Washington, Carnegie Mellon University, and Xi'an Jiaotong University in 2023.
- And there are all kinds of biases in large language models, far beyond an issue of right vs. left.
How it works: Getting a model with billions of parameters to answer in a particular way isn't easy, but there are a few ways to put a thumb on the scale, Hugging Face head of community and collaborations Vaibhav Srivastav tells Axios.
- Before a model is trained, model builders can decide which data sources are included or excluded and how heavily each one is weighted (see the data-mixture sketch below).
- In the post-training phase, also known as fine-tuning, model creators can use different techniques to guide a model. One method, known as reinforcement learning from human feedback, works by telling a model which kinds of answers are preferred (see the preference-tuning sketch below).
- Another method is to provide additional system-level prompts that change the way an answer appears (see the system-prompt sketch below). This is a blunt tool that risks unintended consequences, as when Meta and Google tried to counteract bias and ended up generating implausible and historically inaccurate images, such as Black founding fathers and racially diverse Nazi soldiers.
- "Besides anecdotal evidence, little public knowledge exists about what goes into post-training these models," Srivastav said. "However, when it comes to system prompts, there are ways to jailbreak the prompt and look at what the model creator/API provider wants to have."
Between the lines: Meta's move set off alarm bells for researchers and human rights groups, who worry that Meta appears to be steering Llama to the right.
- "It's a pretty blatant ideological play to effectively make overtures to the Trump administration," said Alex Hanna, director of research at the Distributed AI Research Institute.
Further, Meta's Llama and xAI's Grok have been positioned as models that will answer questions others refuse, a stance that worries some AI experts.
- Meta has not only stated this as a goal, but also indicated that it has made progress with Llama 4, which it says "refuses less on debated political and social topics overall (from 7% in Llama 3.3 to below 2%)."
- However, Allen Institute for AI senior researcher Jesse Dodge says this isn't such a good thing.
- "Refusals are an important part of building a model and having a model that's usable to lots of people," he said. "I don't know why they would advertise that it refuses a lot less."
The intrigue: While Meta and xAI accuse other AI models of having a left-leaning bias, experts say that the reality is more complex.
- Much of the bias stems from the training data. While most of the major models don't disclose their specific data sets, it's generally understood at this point that they have scraped most of the internet that isn't behind a paywall.
- Because much of that training data is in English, particularly American English, AI models tend to be biased toward the perspectives captured in that language.
Zoom in: GLAAD, the LGBTQ+ rights organization, told Axios that it has already noticed Llama 4 making reference to discredited conversion therapy practices in some queries.
- "Both-sidesism that equates anti-LGBTQ junk-science with well-established facts and research is not only misleading — it legitimizes harmful falsehoods," a GLAAD spokesperson told Axios.
- "All major medical, psychiatric, and psychological organizations have condemned so-called 'conversion therapy,' and the United Nations has compared it to 'torture.'"
Editor's note: This story has been corrected to reflect that Jesse Dodge is a senior researcher at the Allen Institute for AI (not the Allen Institute, which is a separate organization).
