One AI isn't enough anymore
Add Axios as your preferred source to
see more of our stories on Google.

Illustration: Allie Carl/Axios
Microsoft said Monday it has revamped one of its AI research tools to use models from both OpenAI and Anthropic, the clearest sign yet that the future of AI may be multi-model.
Why it matters: AI companies are increasingly pairing models together — having them cross-check, evaluate or specialize — in a bid to boost accuracy and reduce errors that any one model might miss.
Driving the news: The software giant is taking advantage of multiple models within its Microsoft 365 Copilot Researcher.
- A new "Critique" layer uses Anthropic's Claude to review answers generated by OpenAI's model to improve accuracy before a user sees the response.
- The company says that approach enabled the research agent to score 13.8% higher on the DRACO benchmark, an industry standard for deep research quality.
- Another new option, called Model Council, allows users to see a side-by-side comparison of responses from different models.
What they're saying: "It's becoming very clear to us that there will be many models," Microsoft executive VP Charles Lamanna told Axios. "Come summertime there will be many more models than just these two inside of Copilot."
The big picture: AI companies are experimenting with several different ways to use multiple models to complete tasks.
- When you prompt ChatGPT, Copilot or other models, they will often use a smaller classifier model to route you to the model most appropriate for the task.
- Perplexity has long allowed its users to choose from multiple models and see responses side-by-side.
- Anthropic uses a self-critique step mid-generation to catch errors before surfacing a final response from Claude.
Between the lines: The multi-model system has an added benefit for Microsoft, which is looking to show it isn't overly reliant on OpenAI.
- With the leading frontier labs frequently leapfrogging one another, Lamanna said businesses are interested in AI tools that can easily change which models are running under the hood.
Yes, but: Using multiple models on a single query can lead to increased costs and slower response times.
- Microsoft's Model Council, for example, costs roughly 2.5 times as much as using a single model, while the Critique approach costs about 20% more.
- That cost isn't directly passed on given Copilot is a subscription service, but it does inform where Microsoft decides to use multiple models versus relying on a single algorithm.
What we're watching: Microsoft is also building more homegrown models, and Lamanna said those models might show up first working in conjunction with outside models rather than as a full replacement.
- "It'll be in one of these ensemble experiences," he said.
