Feb 15, 2024 - Technology

"Open" software needs an AI rethink

Illustration: Sarah Grillo/Axios

While nearly everyone in the AI world claims to be "open" in some way, the software industry's current method of making open products doesn't fit the way AI is actually built.

Why it matters: Open approaches could speed up innovation, as advocates believe, or magnify some risks, as critics fear — but the people and companies creating today's most advanced AI models don't even agree on what "open" AI means.

The big picture: Open source software generally means that anyone can read, copy and modify the code. It made today's internet possible, but its rules and licenses don't easily map to the AI world.

  • The licenses that govern open source applications and operating systems require free access to the program code. That can save money, speed research and future-proof systems in cases where vendors go out of business.

But today's AI is more than programming instructions — it takes the form of mathematical models trained on data. To understand, reproduce and change how a model works, you need access to all of that: the code, the model weights and the training data. That's where things get messy today.

  • Despite its name, OpenAI's ChatGPT is not currently an open-source model, although OpenAI has published some open-source models — including GPT-2, Point-E, Whisper, Jukebox, and CLIP.
  • Meta calls both of its Llama releases open source because they're freely available for commercial and non-commercial use. With Llama 2, the company released model weights, but did not release the training data.
  • Earlier this month the Allen Institute announced an open source model, but went a step further, releasing not only the model and its weights, but also its full training data and pre-training code.

Flashback: The open source movement that began in the '90s, and the free software movement before it, sought to create legal alternatives to copyrighting program code and offer protections for those who wanted to share their work.

  • But today's AI industry lacks any sort of consensus on what "open source" means in the new field.
  • "There is definitely not a universally accepted definition," Nikolaos Vasiloglou, VP of research ML at RelationalAI, tells Axios.

What they're saying: Sharing an AI model or product is different from sharing a program.

  • Moez Draief, managing director of Mozilla.ai, says that issues around privacy, bias and liability — among other factors — make opening an AI model incredibly complex.
  • When it comes to AI, there's a spectrum of openness — it's not a binary choice between open or closed, Draief tells Axios.
  • Hugging Face, a provider of tools for developing open source AI models, says it takes "a community-based approach to understanding how concepts like 'open source' and 'open' are defined and understood."
  • "We tend to use terminology like 'open science' and 'open access,'" Yacine Jernite, machine learning and society lead for Hugging Face, tells Axios.

Between the lines: Some LLM creators start with a closed model and a plan to release the source code later.

  • Jon Carvill, senior director of AI communications at Meta, told Axios that the company's FAIR research team starts every AI research project with the intention to openly release it. "We assess our strategy on openness on a case-by-case basis for our AI research at FAIR," Carvill says.
  • Zack Kass, who oversaw the launch of ChatGPT at OpenAI and has since left the company, says this "closed first, open later" approach can help with "alignment" — meaning how well the AI model reflects the values its creators sought to embed in it.
  • Like children, Kass argues, AI models need to mature before they're open sourced. "There's a reason we don't tell kids to go out into the world until they're 18," Kass tells Axios.

Yes, but: Internet activist Cory Doctorow charges the industry with "openwashing" — embracing the language of openness without delivering the substance.

  • Doctorow argues that the large AI companies call themselves "open" in order to evade both regulation and criticism, "by casting themselves as forces of ethical capitalism, committed to the virtue of openness."

What's next: President Biden's AI executive order tasked the National Telecommunications and Information Administration with assessing the opportunities and risks of open source AI.
