Two new AI systems beat humans at complex games

- Alison Snyder, author of Axios Science

Illustration: Shoshana Gordon/Axios
Two new papers from AI powerhouses DeepMind and Meta describe how AI systems are notching wins against human players in complex games involving deception, negotiation and cooperation.
Why it matters: Machine contenders have struggled with games where information is incomplete or hidden from players, a challenge akin to reading the hidden intentions of people in daily life and interactions.
Driving the news: Researchers from DeepMind outline a new autonomous agent called "DeepNash" that learned to play the game Stratego in a paper published today in Science.
- Stratego is played between two people who each control 40 pieces with different ranks, hidden from their opponent, with the goal of capturing the other player's flag.
- DeepNash couldn't play Stratego by searching all possible scenarios because there is an "astronomical" number of them, the DeepMind team writes. That's far more than in chess, Go and poker, games AI systems have already mastered.
How they did it: The DeepMind team combined an algorithm for learning the game through self-play with another that steers that learning toward an optimal strategy (see the sketch after this list).
- DeepNash learned the game from scratch by playing about 5.5 billion games against itself over four months.
- The AI agent beat other existing Stratego bots, which play at an amateur level, more than 97% of the time, they report. It won 84% of the time against human expert players on an online gaming platform, sometimes by bluffing and deceiving.
- DeepNash can "handle huge amounts of uncertainty in the form of imperfect information, more than has previously been possible," DeepMind research scientists Julien Perolat and Karl Tuyls, who are co-lead authors of the paper, said in an email.
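For intuition, here is a toy sketch of that idea in Python: two policies improve through self-play on a small zero-sum game (rock-paper-scissors) while a regularization term pulls them toward a reference policy that is periodically reset, steering play toward an equilibrium. This is an illustrative approximation of the approach described in the paper, not DeepMind's code; every name, update rule and hyperparameter below is a placeholder.

```python
# Toy sketch: self-play on a zero-sum matrix game (rock-paper-scissors) with a
# regularization term that steers the learned policies toward an equilibrium.
# Loosely mirrors the "self-play + steer toward an optimal strategy" idea in the
# DeepNash paper; names and hyperparameters here are illustrative only.
import numpy as np

# Payoff matrix for player 1 (rows); player 2's payoff is the negative (zero-sum).
PAYOFF = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

def step(pi1, pi2, ref1, ref2, lr=0.05, eta=0.5):
    """One regularized multiplicative-weights update per player.

    Each player ascends its expected payoff while being pulled (strength eta)
    toward a fixed reference ("regularization") policy ref1 / ref2.
    """
    g1 = PAYOFF @ pi2              # player 1's expected payoff per action
    g2 = -(PAYOFF.T @ pi1)         # player 2's expected payoff per action
    new1 = pi1 * np.exp(lr * (g1 + eta * np.log(ref1 / pi1)))
    new2 = pi2 * np.exp(lr * (g2 + eta * np.log(ref2 / pi2)))
    return new1 / new1.sum(), new2 / new2.sum()

pi1 = np.array([0.8, 0.1, 0.1])    # deliberately start far from equilibrium
pi2 = np.array([0.1, 0.1, 0.8])
ref1, ref2 = pi1.copy(), pi2.copy()

for t in range(5000):
    pi1, pi2 = step(pi1, pi2, ref1, ref2)
    if (t + 1) % 500 == 0:         # periodically reset the reference policy
        ref1, ref2 = pi1.copy(), pi2.copy()

# Both mixes should drift toward the uniform Nash equilibrium (1/3, 1/3, 1/3).
print(pi1.round(3), pi2.round(3))
```

DeepNash applies this kind of equilibrium-seeking update at vastly larger scale, with deep networks standing in for the simple probability vectors above.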
Meta researchers last week described an AI system called "Cicero" that they report can play the game Diplomacy at the level of humans.
- In Diplomacy, up to seven players negotiate, deceive and build alliances to try to win control of territories on a map.
- "Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans," they wrote in Science.
- In 40 games of a blitz version of Diplomacy where the time for each move is limited to five minutes, Cicero scored more than double the average score of human players it went up against on a gaming platform.
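The control flow the researchers describe (infer intentions from dialogue, plan a move against them, then generate messages consistent with the plan) can be pictured with a skeleton like the one below. This is a plain illustrative Python sketch, not Meta's code; every class, function and stand-in "model" is a placeholder for the paper's far more capable learned components.

```python
# Illustrative skeleton of the loop described for Cicero: infer other players'
# likely moves from their messages, plan a move given those predictions, then
# generate dialogue grounded in the plan. All names and logic are placeholders.
from dataclasses import dataclass, field

@dataclass
class GameState:
    turn: str
    messages: dict = field(default_factory=dict)   # player -> latest message text

def infer_intents(state: GameState) -> dict:
    """Stand-in for a dialogue-conditional intent model: map each player's
    messages to the move they seem to be proposing."""
    return {player: ("support" if "support" in msg.lower() else "hold")
            for player, msg in state.messages.items()}

def plan_action(state: GameState, intents: dict) -> str:
    """Stand-in for the planning / reinforcement learning component: choose a
    move given the predicted moves of the other players."""
    if any(i == "support" for i in intents.values()):
        return "attack with support"
    return "hold and consolidate"

def generate_dialogue(plan: str, intents: dict) -> dict:
    """Stand-in for the language model: produce messages grounded in the plan."""
    return {player: f"My plan this turn: {plan}. I'm counting on you."
            for player in intents}

state = GameState(turn="Spring 1901",
                  messages={"France": "I can support you into Burgundy.",
                            "Italy": "I'm staying put this year."})
intents = infer_intents(state)
plan = plan_action(state, intents)
print(plan)
print(generate_dialogue(plan, intents))
```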
Of note: Some experts say AI systems that can play these games raise concerns about machines having the ability to deceive.
- Cicero “passed as a human player in 40 games” with 82 unique players, the Meta researchers reported.
- They also said they were able to control the AI's dialogue to be "largely honest and helpful."
The big picture: Experts debate how much mastering games will help to develop intelligent machines that can navigate the world of humans.
- Some argue their rules are specific and winning on the board doesn't easily extend to a range of real-world problems.
- But others say some of the skills required to win games of strategy could lead to real-world applications.
The debate hinges on "a misguided question," says Tuomas Sandholm, a professor at Carnegie Mellon University who has studied game theory for three decades.
- "When it comes to planning against adversaries, games are not only important, they are necessary," he says, adding that games can provide insights about negotiating and reasoning for the business, finance and defense sectors, the focus of Strategy Robot and Strategic Machines, the two companies he founded and runs.
Yes, but: A board game is a "highly controlled and limited setting," says Luca Weihs, a research scientist at the Allen Institute for AI who works on how AI systems can be physically embodied to control robots or vehicles.
- But in the real world, an AI system might see a human do something it has never seen before and have to infer the person's goal using common sense.
- Hidden information and intentions abound in daily life, even in seemingly simple tasks like helping a partner load the dishwasher or driving a car through a street alongside other drivers.
- "We’re constantly working with missing information about how humans are operating and we very intuitively fill in the gaps."
People can also adapt to changes in the rules of the game or structure of the board, says Brenden Lake, who co-directs the Minds, Brains, and Machines Initiative at New York University.
- "Many top systems would be completely thrown by a change to the rules, or to the size and shape of the game board, if they are not given an opportunity to retrain."
What to watch: Cicero relies on a more classical AI approach that involves training it on a corpus of human games and other bespoke data, which gives it some innate knowledge, researcher Gary Marcus writes.
- That's in contrast to systems like DeepNash, which is trained entirely from scratch through self-play, though Cicero also uses some self-play data.
- Marcus has long argued that the deep learning approach, which powers chatbots, virtual assistants and autonomous driving features and has been an intense focus of researchers, is limited.
- It may "ultimately prove to be even more valuable if it is embedded in highly structured systems," he writes.