
Illustration: Aïda Amer/Axios
Open-source AI models, which let anyone view and manipulate the code, are growing in popularity as startups and giants alike race to compete with market leader ChatGPT.
Why it matters: The White House and some experts fear open-source models could aid risky uses — especially in synthetic biology, where misuse could help create the next pandemic. But those sounding the alarms may be too late.
What's happening: The wide-open code helps little guys take on tech giants. But it also could help dictators and terrorists.
- The startups are joined by Meta, which hopes to undercut Microsoft (OpenAI’s key backer), and by foreign governments looking to out-innovate the U.S.
- Top government officials are freaked out by the national security implications of having large open-source AI models in the hands of anyone who can code.
In the closed corner are generative AI's first movers — including OpenAI and Google — which are seeking to protect their early-mover advantage.
- OpenAI, despite its name, uses a closed model for ChatGPT — meaning the company has kept full control and ownership of the system.
The big picture: Building high-quality AI models has become much cheaper since the open release of Meta's 65 billion-parameter LLaMA foundation model.
- MosaicML released a new open source model Thursday — MPT-30B — which it says outperforms the original GPT-3.
- Hugging Face founder Clement Delangue testified before Congress last week that open source models “prevent black-box systems” and “make companies more accountable” while fostering innovation across the economy.
State of play: There are now at least 37 open source LLMs, including smaller models that perform nearly as well as the largest ones.
- Falcon, the top-ranked open-source model, was released on May 31 by the United Arab Emirates’ Technology Innovation Institute and now outperforms LLaMA on public leaderboards.
- The Beijing Academy of AI released the multilingual Aquila on June 9.
- Open source software products have migrated from the academic research world to corporate centers over the past 25 years. Red Hat sold to IBM for $34 billion in 2019, and Silicon Valley’s biggest companies each support hundreds of open source projects.
- Together, a startup developing open source generative AI, raised $20 million in mid-May.
Be smart: Open source code is, by definition, global, and once code is out "in the wild" it's almost impossible to corral or lock up.
- While open source AI can be misused in myriad ways, any effort to squash it would likely fail: adversarial governments would not cooperate, and small U.S. companies would pay a price in hobbled innovation.
Between the lines: Advocates of both open and closed systems claim to be democratizing access to AI, and most models blend elements of each approach.
- OpenAI admits it could not have built the closed ChatGPT system without access to open source products.
- Both open-source and proprietary AI models face complex legal questions over the presence of copyrighted material in the data pools used to train them.
What they’re saying: Sens. Richard Blumenthal (D-Conn.) and Josh Hawley (R-Mo.) wrote to Meta on June 6 suggesting the company did not conduct any meaningful assessment of how its LLaMA could be misused once released to the public, and asked for proof of efforts to mitigate the risks.
- Nick Clegg, Meta's global affairs president, said the company had released over 1,000 AI models, libraries and datasets for researchers — but the company also released Voicebox, an AI audio generation service, as a closed model on June 16 “because of the potential risks of misuse,” per a company statement.
- Chiraag Deora, an early-stage venture capitalist at CRV, told Axios he's bullish on open source AI, but warned of "trade-offs."
- Running an open source model "solves accessibility, fairness and security problems" because there's clear data oversight, Deora said. But he warned that managing open source code is more resource-intensive than using someone else's closed model.
- Established companies can be wary of closed models, Deora said: "A CTO of a large legacy organization said to me, 'We will absolutely not be using [Microsoft's] Copilot internally until we understand where the data comes from'," to avoid being "screwed" by future U.S. and current European privacy regulations.
The intrigue: The leaking of Meta’s LLaMA model has allowed the company to play a spoiler role against Microsoft-aligned OpenAI and Google.
- Google did the same to Apple after the iPhone was released in 2007 when it made its Android mobile operating system open source. Android now dominates global smartphone market share (though Apple controls much of the market's most profitable high end).
Flashback: The open source movement is as old as software itself and sits at the heart of debates about intellectual property law and efforts to produce software more efficiently and effectively.
The bottom line: Open source models can provide important counterweights to closed systems, but need strong management and are best suited to low-risk use cases.