Getting AI to pick your March Madness bracket is harder than it seems

Illustration: Lindsey Bailey/Axios
AI chatbots are not quite ready to pick your March Madness brackets.
Why it matters: Tech companies tout AI models as semi-autonomous agents, but my testing shows that plenty of tasks are still beyond their capabilities.
- Human oversight remains critical. Many of the brackets returned during two days of testing looked plausible, until I tried entering them into ESPN's challenge and found impossible matchups.
The big picture: March Madness has long been a tech proving ground, from video streaming to picture-in-picture, and companies are always competing to be your second screen while you watch the games.
- This year, Perplexity is incorporating odds from prediction market Kalshi, as Axios first reported.
Between the lines: OpenAI says ChatGPT is better at analyzing matchups or strategy than building full brackets. That holds true for others, too.
My thought bubble: I thought AI might struggle as much as humans to pick the correct winners, but I didn't anticipate how much trouble the chatbots had with the brackets themselves.
- Chatbots confidently offered picks that made no sense, reinforcing my belief that generative AI isn't ready to handle critical tasks on its own.
Zoom in: Here's how several chatbots approached the tasks and where and how things went awry.
OpenAI ChatGPT
- The first men's bracket returned by ChatGPT predicted the favorites winning each game all the way through the finals.
- I asked it to suggest some possible upsets. It flagged men's teams like Drake and UC San Diego and offered general tips like betting on teams that have been hot of late.
- When I asked it to produce a full bracket, though, it made some mistakes. For example, ChatGPT picked Marquette to beat New Mexico (perfectly reasonable) but also had UC San Diego upsetting Texas A&M. I'm all for picking upsets, but Texas A&M is playing Yale in the first round.
- After a bit of trying, I got ChatGPT to predict a full women's bracket. It included some bold predictions, including No. 11 seed Florida Gulf Coast University upsetting No. 3 seed Oklahoma, and it chose South Carolina to beat Southern California in a battle of the USCs in the national final.
Google Gemini
- My first try was with Gemini's deep research, but that returned various errors. Google suggested not using deep research, since that relies on an experimental version of its model.
- Next I used standard Gemini 2.0 Flash. That produced a plausible set of first-round matchups, but misunderstood the bracket and created incorrect second-round games.
- On the women's side, meanwhile, Gemini couldn't seem to get the matchups right despite several attempts.
Anthropic Claude
- Anthropic had to run the queries for Axios because it relies on a web search feature that is close to, but not yet, publicly available.
- Claude did the best of the bunch, returning a men's bracket that was plausible up until the Final Four, where it got the semifinal matchups wrong.
- I pointed out the mistake to Anthropic, and when they relayed that information to Claude, the chatbot apologized. Its revised picks projected that Houston would defeat Duke and Michigan State would beat Florida, with Houston being Claude's pick to win it all.
- Claude's first women's bracket was complete but ultra-conservative, favoring top seeds across the board. But when asked to suggest more long shots, it happily offered up a bracket full of upsets, including picking the Ivy League's Columbia to win the whole thing.
Manus
- I decided to give the buzzy new Chinese AI agent an opportunity to make the tournament.
- Manus itself was struggling to get into playing shape and frequently returned over-capacity errors. It's still in an early invite-only beta, so it gets a pass on that.
- However, despite several attempts over two days, I was unable to get a workable bracket with a proper set of matchups.
- Even so, it was fascinating to watch Manus in action. Though it didn't complete the task, it sure did its homework. Because you can watch the Manus computer in action, I saw it devour statistics, read all the experts and even scour for any last-minute injuries that might influence its predictions.
- Since Manus runs on a remote server, I could watch the Warriors and play sudoku at the same time.
The bottom line: AI agents excel at some tasks but frustrate in others. And they always need fact-checking.
