Search | VHL Regional Portal

Mastering Atari, Go, chess and shogi by planning with a learned model.

Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David.

Nature ; 588(7839): 604-609, 2020 12.

Article in English | MEDLINE | ID: mdl-33361790

ABSTRACT

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess1 and Go2, where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games3-the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled4-the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi-canonical environments for high-performance planning-the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm5 that was supplied with the rules of the game.

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis.

Science ; 362(6419): 1140-1144, 2018 12 07.

Article in English | MEDLINE | ID: mdl-30523106

ABSTRACT

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

Subject(s)

Artificial Intelligence , Reinforcement, Psychology , Video Games , Algorithms , Humans , Software

Mastering the game of Go without human knowledge.

Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Hui, Fan; Sifre, Laurent; van den Driessche, George; Graepel, Thore; Hassabis, Demis.

Nature ; 550(7676): 354-359, 2017 10 18.

Article in English | MEDLINE | ID: mdl-29052630

ABSTRACT

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.

Subject(s)

Games, Recreational , Software , Unsupervised Machine Learning , Humans , Neural Networks, Computer , Reinforcement, Psychology , Supervised Machine Learning

Mastering the game of Go with deep neural networks and tree search.

Silver, David; Huang, Aja; Maddison, Chris J; Guez, Arthur; Sifre, Laurent; van den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis.

Nature ; 529(7587): 484-9, 2016 Jan 28.

Article in English | MEDLINE | ID: mdl-26819042

ABSTRACT

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

Subject(s)

Games, Recreational , Neural Networks, Computer , Software , Supervised Machine Learning , Computers , Europe , Humans , Monte Carlo Method , Reinforcement, Psychology

Adaptive control of epileptiform excitability in an in vitro model of limbic seizures.

Panuccio, Gabriella; Guez, Arthur; Vincent, Robert; Avoli, Massimo; Pineau, Joelle.

Exp Neurol ; 241: 179-83, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23313899

ABSTRACT

Deep brain stimulation (DBS) is a promising tool for treating drug-resistant epileptic patients. Currently, the most common approach is fixed-frequency stimulation (periodic pacing) by means of stimulating devices that operate under open-loop control. However, a drawback of this DBS strategy is the impossibility of tailoring a personalized treatment, which also limits the optimization of the stimulating apparatus. Here, we propose a novel DBS methodology based on a closed-loop control strategy, developed by exploiting statistical machine learning techniques, in which stimulation parameters are adapted to the current neural activity thus allowing for seizure suppression that is fine-tuned on the individual scale (adaptive stimulation). By means of field potential recording from adult rat hippocampus-entorhinal cortex (EC) slices treated with the convulsant drug 4-aminopyridine we determined the effectiveness of this approach compared to low-frequency periodic pacing, and found that the closed-loop stimulation strategy: (i) has similar efficacy as low-frequency periodic pacing in suppressing ictal-like events but (ii) is more efficient than periodic pacing in that it requires less electrical pulses. We also provide evidence that the closed-loop stimulation strategy can alternatively be employed to tune the frequency of a periodic pacing strategy. Our findings indicate that the adaptive stimulation strategy may represent a novel, promising approach to DBS for individually-tailored epilepsy treatment.

Subject(s)

Adaptation, Physiological/physiology , Evoked Potentials/physiology , Limbic System/physiology , Animals , Biophysics , Electric Stimulation/adverse effects , In Vitro Techniques , Neural Pathways/physiology , Rats , Rats, Sprague-Dawley

Treating epilepsy via adaptive neurostimulation: a reinforcement learning approach.

Pineau, Joelle; Guez, Arthur; Vincent, Robert; Panuccio, Gabriella; Avoli, Massimo.

Int J Neural Syst ; 19(4): 227-40, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19731397

ABSTRACT

This paper presents a new methodology for automatically learning an optimal neurostimulation strategy for the treatment of epilepsy. The technical challenge is to automatically modulate neurostimulation parameters, as a function of the observed EEG signal, so as to minimize the frequency and duration of seizures. The methodology leverages recent techniques from the machine learning literature, in particular the reinforcement learning paradigm, to formalize this optimization problem. We present an algorithm which is able to automatically learn an adaptive neurostimulation strategy directly from labeled training data acquired from animal brain tissues. Our results suggest that this methodology can be used to automatically find a stimulation strategy which effectively reduces the incidence of seizures, while also minimizing the amount of stimulation applied. This work highlights the crucial role that modern machine learning techniques can play in the optimization of treatment strategies for patients with chronic disorders such as epilepsy.

Subject(s)

Electric Stimulation Therapy/methods , Epilepsy/therapy , Learning/physiology , Reinforcement, Psychology , 4-Aminopyridine/pharmacology , Algorithms , Animals , Biophysics , Disease Models, Animal , Electroencephalography/methods , Entorhinal Cortex/physiopathology , Epilepsy/chemically induced , Epilepsy/pathology , In Vitro Techniques , Male , Man-Machine Systems , Potassium Channel Blockers/pharmacology , Rats , Rats, Sprague-Dawley

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL