AlphaZero was trained solely by playing against itself, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks. Training took several days, totaling about 41 TPU-years, and required 3×10²² FLOPs of compute.
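As a rough consistency check on these figures, the sketch below converts the quoted 41 TPU-years into wall-clock time. It assumes, purely for illustration, that all 5,064 devices ran concurrently for the whole job and that the 41 TPU-years cover a single run; the source states neither, and the function and constant names are hypothetical.

```python
# Figures quoted in the text above.
TPU_YEARS = 41.0          # total training time, in TPU-years
SELF_PLAY_TPUS = 5_000    # first-generation TPUs generating games
TRAINING_TPUS = 64        # second-generation TPUs training the networks

DAYS_PER_YEAR = 365.25


def wall_clock_days(tpu_years: float, num_tpus: int) -> float:
    """Wall-clock days if `num_tpus` devices run concurrently for the whole job.

    Assumption (not from the source): every device is busy for the full duration.
    """
    return tpu_years * DAYS_PER_YEAR / num_tpus


total_tpus = SELF_PLAY_TPUS + TRAINING_TPUS
days = wall_clock_days(TPU_YEARS, total_tpus)
print(f"{total_tpus} TPUs -> about {days:.1f} days of wall-clock time")
```

Under these assumptions the result is on the order of three days, which is in line with "several days".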
In parallel, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, Elmo, or AlphaGo Zero) in brief one-second-per-move games to determine how well the training was progressing. DeepMind judged that AlphaZero's performance exceeded the benchmark after around four hours of training for Stockfish, two hours for Elmo, and eight hours for AlphaGo Zero.
AlphaZero was trained on shogi for a total of two hours before the tournament. In 100 shogi games against Elmo (World Computer Shogi Championship 27 summer 2017 tournament version with YaneuraOu 4.73 search), AlphaZero won 90 times, lost 8 times and drew twice. As in the chess games, each program got one minute per move, and Elmo was given 64 threads and a hash size of 1 GB.
After 34 hours of self-play training in Go, AlphaZero played AlphaGo Zero, winning 60 games and losing 40.
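The shogi and Go results above can be put on a common footing by converting each to a score fraction (win = 1, draw = 1/2). The sketch below also maps that score to a rating gap using the standard logistic Elo formula; applying the Elo model to these matches is an assumption made here for illustration, not something the source does, and the function names are hypothetical.

```python
import math


def score_fraction(wins: int, losses: int, draws: int = 0) -> float:
    """Match score with a win counted as 1 and a draw as 1/2."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def implied_elo_gap(score: float) -> float:
    """Rating gap under the logistic Elo model (an assumption, not from the source)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


shogi = score_fraction(wins=90, losses=8, draws=2)  # AlphaZero vs. Elmo
go = score_fraction(wins=60, losses=40)             # AlphaZero vs. AlphaGo Zero

for name, s in [("shogi", shogi), ("Go", go)]:
    print(f"{name}: score {s:.2f}, implied gap {implied_elo_gap(s):.0f} Elo")
```

With only 100 games per match, such point estimates carry wide uncertainty and should be read as indicative only.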
DeepMind stated in its preprint, "The game of chess represented the pinnacle of AI research over several decades. State-of-the-art programs are based on powerful engines that search many millions of positions, leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic reinforcement learning algorithm – originally devised for the game of go – that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules." DeepMind's Demis Hassabis, a chess player himself, called AlphaZero's play style "alien": It sometimes wins by offering counterintuitive sacrifices, like offering up a queen and bishop to exploit a positional advantage. "It's like chess from another dimension."
Similarly, some shogi observers argued that the Elmo hash size was too low, that the resignation settings and the "EnteringKingRule" settings (cf. shogi § Entering King) may have been inappropriate, and that Elmo is already obsolete compared with newer programs.
Newspapers headlined that the chess training took only four hours: "It was managed in little more than the time between breakfast and lunch." Wired described AlphaZero as "the first multi-skilled AI board-game champ". AI expert Joanna Bryson noted that Google's "knack for good publicity" was putting it in a strong position against challengers. "It's not only about hiring the best programmers. It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector."
Human chess grandmasters generally expressed excitement about AlphaZero. Danish grandmaster Peter Heine Nielsen likened AlphaZero's play to that of a superior alien species. Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional understanding. Former champion Garry Kasparov said, "It's a remarkable achievement, even if we should have expected it after AlphaGo."
Top US correspondence chess player Wolff Morrow was also unimpressed, claiming that AlphaZero would probably not make the semifinals of a fair competition such as TCEC where all engines play on equal hardware. Morrow further stated that although he might not be able to beat AlphaZero if AlphaZero played drawish openings such as the Petroff Defence, AlphaZero would not be able to beat him in a correspondence chess game either.
Motohiro Isozaki, the author of YaneuraOu, noted that although AlphaZero did comprehensively beat Elmo, its shogi rating stopped growing at a point at most 100–200 points higher than Elmo's. He considered this gap modest and expected that Elmo and other shogi software should be able to catch up in 1–2 years.
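To give a sense of what a 100–200 point gap means in practice, the sketch below computes the expected score of the higher-rated side. It assumes the shogi ratings in question follow the usual 400-point logistic Elo scale, which the source does not state; `expected_score` is a hypothetical helper.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model.

    Assumption (not from the source): ratings are on a 400-point Elo scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


for gap in (100, 200):
    print(f"gap of {gap} points: expected score {expected_score(gap):.2f}")
```

On that scale, a gap in this range corresponds to an expected score of roughly two-thirds to three-quarters for the stronger program.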
DeepMind addressed many of the criticisms in the final version of its paper, published in Science in December 2018. It further clarified that AlphaZero was not running on a supercomputer; it was trained using 5,000 tensor processing units (TPUs), but ran on only four TPUs and a 44-core CPU in its matches.
Similar to Stockfish, Elmo ran under the same conditions as in the 2017 CSA championship. The version of Elmo used was WCSC27 in combination with YaneuraOu 2017 Early KPPT 4.79 64AVX2 TOURNAMENT. Elmo operated on the same hardware as Stockfish: 44 CPU cores and a 32 GB hash size. AlphaZero won 98.2% of games when playing sente (i.e. having the first move) and 91.2% overall.
Human grandmasters were generally impressed with AlphaZero's games against Stockfish. Former world champion Garry Kasparov said it was a pleasure to watch AlphaZero play, especially since its style was open and dynamic like his own.
"Silver" (PDF). idi.ntnu.no. https://www.idi.ntnu.no/emner/it3105/materials/neural/silver-2017b.pdf
Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (December 5, 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI]. /wiki/David_Silver_(computer_scientist)
Knapton, Sarah; Watson, Leon (December 6, 2017). "Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours". Telegraph.co.uk. Retrieved December 6, 2017. https://www.telegraph.co.uk/science/2017/12/06/entire-human-chess-knowledge-learned-surpassed-deepminds-alphazero/
Vincent, James (December 6, 2017). "DeepMind's AI became a superhuman chess player in a few hours, just for fun". The Verge. Retrieved December 6, 2017. https://www.theverge.com/2017/12/6/16741106/deepmind-ai-chess-alphazero-shogi-go
Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (December 7, 2018). "A general reinforcement learning algorithm that masters chess, shogi, and go through self-play". Science. 362 (6419): 1140–1144. Bibcode:2018Sci...362.1140S. doi:10.1126/science.aar6404. PMID 30523106. /wiki/David_Silver_(computer_scientist)
"Chess Terms: AlphaZero". Chess.com. Retrieved July 30, 2022. https://chess.com/terms/alphazero-chess-engine
Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy (2020). "Mastering Atari, Go, chess and shogi by planning with a learned model". Nature. 588 (7839): 604–609. arXiv:1911.08265. Bibcode:2020Natur.588..604S. doi:10.1038/s41586-020-03051-4. PMID 33361790. S2CID 208158225. /wiki/ArXiv_(identifier)
"Data on Notable AI Models". Epoch AI. June 19, 2024. Retrieved November 29, 2024. https://epoch.ai/data/notable-ai-models?view=table
"AlphaZero vs. Stockfish 2017". https://chess24.com/en/embed-custom-tournament/condensed/alphazero-vs-stockfish
"AlphaZero: Reactions From Top GMs, Stockfish Author". chess.com. December 8, 2017. Retrieved December 9, 2017. https://www.chess.com/news/view/alphazero-reactions-from-top-gms-stockfish-author
Stockfish developer Tord Romstad responded: "The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly). The version of Stockfish used is one year old, was playing with far more search threads than has ever received any significant amount of testing, and had way too small hash tables for the number of threads. I believe the percentage of draws would have been much higher in a match with more normal conditions."[10]
"'Superhuman' Google AI claims chess crown". BBC News. December 6, 2017. Retrieved December 7, 2017. https://www.bbc.com/news/technology-42251535
Knight, Will (December 8, 2017). "Alpha Zero's "Alien" Chess Shows the Power, and the Peculiarity, of AI". MIT Technology Review. Retrieved December 11, 2017. https://www.technologyreview.com/s/609736/alpha-zeros-alien-chess-shows-the-power-and-the-peculiarity-of-ai/
"Google's AlphaZero Destroys Stockfish In 100-Game Match". Chess.com. Retrieved December 7, 2017. https://www.chess.com/news/view/google-s-alphazero-destroys-stockfish-in-100-game-match
Quach, Katyanna (December 14, 2017). "DeepMind's AlphaZero AI clobbered rival chess app on non-level playing...board". The Register. https://www.theregister.co.uk/2017/12/14/deepmind_alphazero_ai_unfair
"Some concerns on the matching conditions between AlphaZero and Shogi engine". コンピュータ将棋 レーティング. "uuunuuun" (a blogger who rates free shogi engines). Retrieved December 9, 2017. (via "瀧澤 誠@elmo (@mktakizawa) | Twitter". mktakizawa (elmo developer). December 9, 2017. Retrieved December 11, 2017.) http://www.uuunuuun.com/single-post/2017/12/07/Some-concerns-on-the-matching-conditions-between-AlphaZero-and-Shogi-engine
"DeepMind社がやねうら王に注目し始めたようです". The developer of YaneuraOu, a search component used by elmo. December 7, 2017. Retrieved December 9, 2017. http://yaneuraou.yaneu.com/2017/12/07/deepmind%E7%A4%BE%E3%81%8C%E3%82%84%E3%81%AD%E3%81%86%E3%82%89%E7%8E%8B%E3%81%AB%E6%B3%A8%E7%9B%AE%E3%81%97%E5%A7%8B%E3%82%81%E3%81%9F%E3%82%88%E3%81%86%E3%81%A7%E3%81%99/
Badshah, Nadeem (December 7, 2017). "Google's DeepMind robot becomes world-beating chess grandmaster in four hours". The Times of London. Retrieved December 7, 2017. https://www.thetimes.com/business-money/technology/article/google-s-deepmind-alphazero-becomes-world-beating-chess-grandmaster-in-four-hours-hcppp9vr2
"Alphabet's Latest AI Show Pony Has More Than One Trick". WIRED. December 6, 2017. Retrieved December 7, 2017. https://www.wired.com/story/alphabets-latest-ai-show-pony-has-more-than-one-trick/
Gibbs, Samuel (December 7, 2017). "AlphaZero AI beats champion chess program after teaching itself in four hours". The Guardian. Retrieved December 8, 2017. https://www.theguardian.com/technology/2017/dec/07/alphazero-google-deepmind-ai-beats-champion-program-teaching-itself-to-play-four-hours
"Talking modern correspondence chess". Chessbase. June 26, 2018. Retrieved July 11, 2018. https://en.chessbase.com/post/correspondence-chess-and-correspondence-database-2018
DeepMind社がやねうら王に注目し始めたようです | やねうら王 公式サイト, 2017年12月7日 http://yaneuraou.yaneu.com/2017/12/07/deepmind%E7%A4%BE%E3%81%8C%E3%82%84%E3%81%AD%E3%81%86%E3%82%89%E7%8E%8B%E3%81%AB%E6%B3%A8%E7%9B%AE%E3%81%97%E5%A7%8B%E3%82%81%E3%81%9F%E3%82%88%E3%81%86%E3%81%A7%E3%81%99/
As given in the Science paper, a TPU is "roughly similar in inference speed to a Titan V GPU, although the architectures are not directly comparable" (Ref. 24).
"AlphaZero Crushes Stockfish In New 1,000-Game Match". December 6, 2018. https://www.chess.com/news/view/updated-alphazero-crushes-stockfish-in-new-1-000-game-match
Ingle, Sean (December 11, 2018). "'Creative' AlphaZero leads way for chess computers and, maybe, science". The Guardian. https://www.theguardian.com/sport/2018/dec/11/creative-alphazero-leads-way-chess-computers-science
Silver, Albert (December 7, 2018). "Inside the (deep) mind of AlphaZero". Chessbase. https://en.chessbase.com/post/the-full-alphazero-paper-is-published-at-long-last
"Komodo MCTS (Monte Carlo Tree Search) is the new star of TCEC". Chessdom. December 18, 2018. http://www.chessdom.com/komodo-mcts-monte-carlo-tree-search-is-the-new-star-of-tcec/
See TCEC and Leela Chess Zero. /wiki/TCEC
"Could Artificial Intelligence Save Us From Itself?". Fortune. 2019. Retrieved February 29, 2020. https://fortune.com/2019/11/26/ai-is-the-problem-and-the-solution/
"DeepMind's MuZero teaches itself how to win at Atari, chess, shogi, and Go". VentureBeat. November 20, 2019. Retrieved February 29, 2020. https://venturebeat.com/2019/11/20/deepminds-muzero-teaches-itself-how-to-win-at-atari-chess-shogi-and-go/