Multipolar World

From Deep Blue to AlphaZero

Jorge Costa Oliveira

“Go” (in Chinese, wéiqí) is a two-player board game invented in China more than 2,500 years ago. The objective is to surround more territory than one’s opponent. There is extensive technical literature and study material on this strategy game, which has an estimated 20 to 45 million players, most of them living in China, Japan, and Korea.

In 1997, IBM’s chess “supercomputer” Deep Blue, an early landmark of artificial intelligence (AI), defeated world chess champion Garry Kasparov. But it was only when DeepMind developed the AI system AlphaGo that victory in Go, the last bastion of human superiority in strategy games, became possible. The original AlphaGo used a combination of supervised learning and reinforcement learning techniques: it was first trained on millions of positions from human expert games, then refined its skills by playing against itself. This approach allowed AlphaGo to develop innovative strategies that surprised even the greatest human experts.
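To make that two-stage recipe concrete, what follows is a minimal, purely illustrative sketch of its first stage, supervised imitation of human play. The real AlphaGo trained a deep policy network on millions of positions from expert games; here a tabular policy simply counts which move was chosen in each position. The position names, moves, and records below are hypothetical placeholders, not DeepMind’s code or data.

from collections import defaultdict, Counter

# Hypothetical (position, expert_move) pairs standing in for a real database
# of human games. In the actual system, positions are full board states and
# the policy is a deep neural network, not a lookup table.
human_records = [
    ("empty corner", "4-4 point"),
    ("empty corner", "4-4 point"),
    ("empty corner", "3-3 point"),
    ("approached corner", "pincer"),
    ("approached corner", "knight's-move response"),
]

# Supervised stage: count how often the records chose each move per position.
move_counts = defaultdict(Counter)
for position, move in human_records:
    move_counts[position][move] += 1

def imitation_policy(position):
    """Return move probabilities learned purely by imitating human choices."""
    counts = move_counts[position]
    total = sum(counts.values())
    return {move: n / total for move, n in counts.items()}

print(imitation_policy("empty corner"))  # roughly {'4-4 point': 0.67, '3-3 point': 0.33}

# Stage two (not shown here) refines this human-derived prior through
# self-play reinforcement learning, which is where the surprising strategies emerged.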

In 2015, AlphaGo became the first AI to defeat a professional Go player on even terms, beating (5–0) the European Go champion Fan Hui. The following year, a stronger AlphaGo version beat (4–1) Lee Sedol, one of the world’s best players (18-time world champion and professional 9-dan, the highest rank). At the 2017 Future of Go Summit, the AlphaGo Master version defeated (3–0) Ke Jie, then the world’s number one player, in a three-game match.

In 2017, DeepMind launched AlphaGo Zero. Unlike AlphaGo, AlphaGo Zero was not trained on previous human Go games. Instead, it learned to play Go from scratch, using only the rules of the game and playing millions of matches against itself, guided by a single deep neural network that it trained through reinforcement learning. This approach allowed AlphaGo Zero to develop entirely new strategies: after just three days of training in DeepMind’s London lab, it overwhelmingly defeated (100–0) the AlphaGo version that had beaten Lee Sedol, and after 40 days of training it surpassed AlphaGo Master as well.
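As a rough illustration of what “learning from scratch through self-play” means, here is a hedged, radically simplified sketch in the spirit of that approach: a tabular agent, the tiny game of Nim (take one to three stones; whoever takes the last stone wins) standing in for Go, and win/loss outcomes as the only learning signal. It is not DeepMind’s algorithm, which pairs a deep neural network with Monte Carlo tree search; every name below is illustrative.

import random
from collections import defaultdict

PILE, MOVES = 10, (1, 2, 3)          # the game rules: the only knowledge given
values = defaultdict(float)          # learned value of each (stones_left, move) pair

def choose(stones, explore=0.2):
    """Pick a legal move: mostly the best-valued one, sometimes a random one."""
    legal = [m for m in MOVES if m <= stones]
    if random.random() < explore:
        return random.choice(legal)
    return max(legal, key=lambda m: values[(stones, m)])

def self_play_game():
    """Play one game of the agent against itself; return the winner and each side's moves."""
    stones, history, player = PILE, {0: [], 1: []}, 0
    while True:
        move = choose(stones)
        history[player].append((stones, move))
        stones -= move
        if stones == 0:
            return player, history   # the current player took the last stone and wins
        player = 1 - player

def train(games=20000, lr=0.05):
    """Update move values from nothing but self-play outcomes."""
    for _ in range(games):
        winner, history = self_play_game()
        for player, moves in history.items():
            reward = 1.0 if player == winner else -1.0
            for state_move in moves:
                values[state_move] += lr * (reward - values[state_move])

if __name__ == "__main__":
    train()
    # With enough self-play the agent typically rediscovers the known strategy
    # for this Nim variant: leave the opponent a multiple of four stones.
    for stones in range(1, PILE + 1):
        print(stones, "->", choose(stones, explore=0.0))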

Also in 2017, AlphaZero, AlphaGo Zero’s successor, used a similar approach to become the world’s strongest player of both Go and chess. After 34 hours of self-learning Go, AlphaZero played against AlphaGo Zero, winning 60 games and losing 40. In chess the results were equally remarkable: after only a few hours of self-play training, AlphaZero won a 100-game match against Stockfish, the leading chess engine, without losing a single game (28 wins, 72 draws). Danish grandmaster Peter Heine Nielsen likened AlphaZero’s play to that of a superior alien species. Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero’s play as “insane attacking chess” with profound positional understanding. Former world champion Garry Kasparov said: “It’s a remarkable achievement, even if we should have expected it after AlphaGo”. In fact, the strategies, superhuman performance, and certain decisive moves of these systems (most famously AlphaGo’s move 37 in the second game against Lee Sedol) were described by human Go experts as “alien”.

The evolution from the original AlphaGo to AlphaZero shows that AI systems can learn and improve autonomously, without requiring human data to guide them. It also demonstrates that, in well-defined domains like these, machine-learning systems are more powerful and effective when fully autonomous and free from the constraints of human data. And this was back in 2017…

linkedin.com/in/jorgecostaoliveira
