DeepMind's latest AI breakthrough is its most significant yet

Google-owned DeepMind's Go-playing artificial intelligence can now learn without human help... or data

DeepMind's human-conquering AlphaGo AI just got even smarter. The firm's latest Go-playing system not only defeated all previous versions of the software, it did it all by itself.

"The most striking thing for me is we don't need any human data anymore," says Demis Hassabis, the CEO and co-founder of DeepMind. While the first version of AlphaGo needed to be trained on data from more than 100,000 human games, AlphaGo Zero can learn to play from a blank slate. Not only has DeepMind removed the need for the initial human data input, Zero is also able to learn faster than its predecessor.

David Silver, the lead researcher on DeepMind's Go project, says the original AlphaGo that defeated 18-time world champion Lee Sedol 4-1 required several months of training.

"We reached a superior level of performance after training for just 72 hours with AlphaGo Zero," he says. Only 4.9 million simulated games were needed to train Zero, compared to the original AlphaGo's 30 million. After the three days of learning Zero was able to defeat the Lee Sedol-conquering version 100-0. After it had been playing the game for 40 days, Zero defeated DeepMind's previous strongest version of AlphaGo, called Master, which defeated Chinese master Ke Jie in May.

The new DeepMind research has been published in the journal Nature and is another significant step towards the company's goal of creating general artificial intelligence.

Responding to the announcement in a separate editorial for Nature, Satinder Singh, the director of the University of Michigan's AI lab, said Zero "massively outperforms the already superhuman AlphaGo" and could be one of the biggest AI advances so far.

Besting AlphaGo

When AlphaGo Zero started playing Go against itself, it was only presented with a set of rules, a board and the white and black counters. It didn't have knowledge of what strategies, moves, or tactics would be required to win. "The only inputs it takes are the black and white stones of the board," Silver says, adding that he believes the company could make a system that's able to learn the rules of the game as well.

From that starting point of just the rules, Zero plays games against itself, learning over time which moves lead to victory. To improve on its already successful system and achieve this, DeepMind had to redesign the algorithms used within the AI.

The overall process uses a reinforcement learning algorithm that's combined with a search system. In its simplest form, this means that Zero learns from trial and error and can use its search system to scope out each potential move.
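To make the idea of search "scoping out" each potential move concrete, here is a deliberately tiny, hypothetical Python sketch. It uses a simple Nim-style game (take one to three stones; taking the last stone wins) rather than Go, and a plain dictionary in place of the learned evaluation that trial-and-error self-play would gradually improve. AlphaGo Zero's actual search is Monte Carlo tree search guided by a deep neural network, not the one-step lookahead shown here.

# Toy illustration only (not DeepMind's code). The "values" table stands in for a
# learned evaluation of positions; self-play would gradually make it accurate.
values = {n: 0.0 for n in range(0, 11)}   # estimated value of facing n stones

def legal_moves(stones):
    return [take for take in (1, 2, 3) if take <= stones]

def choose_move(stones):
    """Scope out each candidate move by evaluating the position it hands the opponent."""
    scored = []
    for take in legal_moves(stones):
        remaining = stones - take
        # Taking the last stone wins outright; otherwise a position that is good
        # for the opponent (a high value for them) is bad for the player moving now.
        score = 1.0 if remaining == 0 else -values[remaining]
        scored.append((score, take))
    return max(scored)[1]

print(choose_move(10))   # with an untrained value table, every move scores the same

In the real system the evaluation comes from the neural network and the lookahead runs many moves deep, but the division of labour is the same: the network judges how promising a position looks, and the search tests candidate moves against that judgment.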


When Zero plays a game against itself, it is given feedback from the system: a +1 if it wins and a -1 if it loses. After each game, the neural network behind Zero automatically reconfigures into a new, theoretically better, version. On average the system took 0.4 seconds of thinking time before making each move.
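As a rough sketch of how that +1/-1 feedback becomes training data, the hypothetical helper below takes the positions recorded during one self-play game and labels each with the final outcome from the point of view of the player who faced it; pairs like these are what the next version of the network learns from. The function name and toy data are illustrative, not DeepMind's code.

def label_game(recorded_positions, winner):
    """recorded_positions: (position, player_to_move) pairs logged during one game.
    Returns (position, outcome) training pairs: +1 for the winner, -1 for the loser."""
    return [
        (position, 1.0 if player == winner else -1.0)
        for position, player in recorded_positions
    ]

# Example: a three-move game that player 0 went on to win.
training_pairs = label_game([("pos_a", 0), ("pos_b", 1), ("pos_c", 0)], winner=0)
print(training_pairs)   # [('pos_a', 1.0), ('pos_b', -1.0), ('pos_c', 1.0)]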

"In the original version, we tried this a couple of years ago and it would collapse," Hassabis says. He cites DeepMind's "novel" reinforcement algorithms for Zero's new ability to learn without prior knowledge. Additionally the new system only uses one neural network instead of two and four of Google's AI processors compared to the 48 needed to beat Lee. During the development of Zero, Hassabis says the system was trained on hardware that cost the company as much as $35 million. The hardware is also used for other DeepMind projects.

In the development of Zero, DeepMind has been able to do more with less. In its internal testing, detailed in the Nature paper, the firm says Zero was able to beat all previous versions of AlphaGo as well as other leading Go programs: AlphaGo Master, AlphaGo Lee, AlphaGo Fan, Crazy Stone, Pachi and GnuGo. Silver adds that Zero didn't reach the maximum level of knowledge possible, but only because the team stopped working on the project.

"It is possible to train to superhuman level, without human examples or guidance, given no knowledge of the domain beyond basic rules," the research paper concludes. The system learned common human moves and tactics and supplemented them with its own, more efficient moves. "It found these human moves, it tried them and then ultimately it found something it prefers," Silver says. "Hopefully, humans are going to start looking at this and incorporating that into their own play."

In the real world

As with Deep Blue's victory over chess grandmaster Garry Kasparov in 1997, DeepMind's continued success at Go has wider implications.

But, as advanced as Zero is, it cannot be turned loose on just any problem. "Taken together, the results suggest that AIs based on reinforcement learning can perform much better than those that rely on human expertise," Singh says. The system, for instance, couldn't be used to translate languages.

For Hassabis and colleagues, the ongoing challenge is applying what has been learned through the AlphaGo project to other AI problems with real-world applications. "We tried to devise the algorithm so that it could play, in principle, other games that are in a similar class (that would include Chess) and more generally planning domains," Silver says.


This includes looking at protein folding, drug discovery, material design, and quantum chemistry. Part of solving these problems lies in the ability to create simulations of potential outcomes. The game of Go is constrained to a fixed and strict environment: there's no randomness, luck, or chance affecting the outcome. Applying this approach to real-world scenarios, where there's a level of unpredictability, is much harder.

"I hope these kinds of these algorithms and future version of AlphaGo inspired things will be routinely working with us, as scientific and medical experts on advancing their frontiers," Hassabis says. "Maybe we will have some solutions to new types of batteries, materials, new drugs that will have been partly designed through these kinds of algorithms and working in tandem with humans".

This article was originally published by WIRED UK