DeepMind's latest iteration of AlphaGo, the artificial intelligence that beat world champion Go player Lee Sedol in 2016, can learn to play the ancient game without feedback from humans or data from their past games, researchers report today in Nature. Instead, the new AlphaGo Zero started with only knowledge of the rules and learned from the outcomes of millions of games it played against itself, beginning with essentially random moves.
The score: After three days of training, the AI beat the original AlphaGo 100 to 0, and it also came up with new moves along the way. The result showcases reinforcement learning, a decades-old idea, and suggests that "AIs based on reinforcement learning can perform much better than those that rely on human expertise," writes computer scientist Satinder Singh in an accompanying article.
What it means: Reinforcement learning could be valuable in domains where large amounts of human expertise aren't available. But it isn't clear how well this strategy will generalize to other applications and problems, says the University of Washington's Pedro Domingos. Go, though more complex than chess, is a problem with well-defined rules, unlike the busy street, with its unpredictable pedestrians and ambiguous shadows, that a self-driving car might have to navigate.