Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Vinyals, Oriol; Babuschkin, Igor; Czarnecki, Wojciech M; Mathieu, Michaël; Dudzik, Andrew; Chung, Junyoung; Choi, David H; Powell, Richard; Ewalds, Timo; Georgiev, Petko; Oh, Junhyuk; Horgan, Dan; Kroiss, Manuel; Danihelka, Ivo; Huang, Aja; Sifre, Laurent; Cai, Trevor; Agapiou, John P; Jaderberg, Max; Vezhnevets, Alexander S; Leblond, Rémi; Pohlen, Tobias; Dalibard, Valentin; Budden, David; Sulsky, Yury; Molloy, James; Paine, Tom L; Gulcehre, Caglar; Wang, Ziyu; Pfaff, Tobias; Wu, Yuhuai; Ring, Roman; Yogatama, Dani; Wünsch, Dario; McKinney, Katrina; Smith, Oliver; Schaul, Tom; Lillicrap, Timothy; Kavukcuoglu, Koray; Hassabis, Demis; Apps, Chris; Silver, David.

Nature ; 575(7782): 350-354, 2019 11.

Article in English | MEDLINE | ID: mdl-31666705

ABSTRACT

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions [1-3], the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems [4]. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks [5,6]. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.

Subject(s)

Reinforcement, Psychology , Video Games , Artificial Intelligence , Humans , Learning

Gated Orthogonal Recurrent Units: On Learning to Forget.

Jing, Li; Gulcehre, Caglar; Peurifoy, John; Shen, Yichen; Tegmark, Max; Soljacic, Marin; Bengio, Yoshua.

Neural Comput ; 31(4): 765-783, 2019 04.

Article in English | MEDLINE | ID: mdl-30764742

ABSTRACT

We present a novel recurrent neural network (RNN)-based model that combines the remembering ability of unitary evolution RNNs with the ability of gated RNNs to effectively forget redundant or irrelevant information in their memory. We achieve this by extending restricted orthogonal evolution RNNs with a gating mechanism similar to that of gated recurrent unit RNNs, with a reset gate and an update gate. Our model is able to outperform long short-term memory, gated recurrent units, and vanilla unitary or orthogonal RNNs on several long-term-dependency benchmark tasks. We empirically show that both orthogonal and unitary RNNs lack the ability to forget, an ability that plays an important role in RNNs. We provide competitive results along with an analysis of our model on many natural sequential tasks, including question answering, speech spectrum prediction, character-level language modeling, and synthetic tasks that involve long-term dependencies such as algorithmic, denoising, and copying tasks.
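The gating idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that an orthogonal recurrent transition is combined with a reset gate and an update gate, so the exact update equations, the tanh nonlinearity, the block-rotation parameterization of the orthogonal matrix, and every name below (`rotation_block_matrix`, `gated_orthogonal_step`, the parameter keys) are assumptions made for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rotation_block_matrix(angles):
    """Block-diagonal matrix of 2x2 rotations, orthogonal by construction.

    Assumed parameterization; it only serves to give a norm-preserving
    recurrent matrix for the sketch.
    """
    n = 2 * len(angles)
    u = [[0.0] * n for _ in range(n)]
    for k, theta in enumerate(angles):
        i = 2 * k
        c, s = math.cos(theta), math.sin(theta)
        u[i][i], u[i][i + 1] = c, -s
        u[i + 1][i], u[i + 1][i + 1] = s, c
    return u


def gated_orthogonal_step(x, h, p):
    """One GRU-style step with an orthogonal hidden-to-hidden matrix p["U"].

    z is the update gate (1 keeps the old state, 0 overwrites it) and
    r is the reset gate applied to the orthogonally transformed state.
    tanh is a stand-in nonlinearity, not taken from the source.
    """
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    uh = matvec(p["U"], h)
    cand = [math.tanh(a + rg * b + c)
            for a, rg, b, c in zip(matvec(p["Wx"], x), r, uh, p["b"])]
    return [zg * hp + (1.0 - zg) * cn for zg, hp, cn in zip(z, h, cand)]


def zero_params(n_in, n_hidden, angles, update_bias):
    """Hypothetical helper: all-zero weights except U and the update-gate bias."""
    zeros_in = [[0.0] * n_in for _ in range(n_hidden)]
    zeros_h = [[0.0] * n_hidden for _ in range(n_hidden)]
    return {
        "Wz": zeros_in, "Uz": zeros_h, "bz": [update_bias] * n_hidden,
        "Wr": zeros_in, "Ur": zeros_h, "br": [0.0] * n_hidden,
        "Wx": zeros_in, "U": rotation_block_matrix(angles), "b": [0.0] * n_hidden,
    }
```

With the update gate saturated near 1 the step copies the previous state (remembering); with the gate near 0 the state is replaced by the bounded candidate (forgetting). An ungated orthogonal transition, which always preserves the norm of the state, cannot discard information in this way.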

Subject(s)

Neural Networks, Computer , Computer Simulation , Humans , Language , Learning , Logic , Memory

Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes.

Gulcehre, Caglar; Chandar, Sarath; Cho, Kyunghyun; Bengio, Yoshua.

Neural Comput ; 30(4): 857-884, 2018 04.

Article in English | MEDLINE | ID: mdl-29381440

ABSTRACT

We extend the neural Turing machine (NTM) model into a dynamic neural Turing machine (D-NTM) by introducing trainable address vectors. This addressing scheme maintains two separate vectors for each memory cell: a content vector and an address vector. This allows the D-NTM to learn a wide variety of location-based addressing strategies, including both linear and nonlinear ones. We implement the D-NTM with both continuous and discrete read and write mechanisms. We investigate the mechanisms and effects of learning to read and write into a memory through experiments on Facebook bAbI tasks using both a feedforward and a GRU controller. We provide extensive analysis of our model and compare different variations of neural Turing machines on these tasks. We show that our model outperforms long short-term memory and NTM variants. We provide further experimental results on the sequential [Formula: see text]MNIST, Stanford Natural Language Inference, associative recall, and copy tasks.
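The addressing scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the abstract says only that each memory cell holds an address vector and a content vector and that reads can be continuous or discrete. The dot-product scoring, the `sharpness` parameter, the use of argmax for the discrete read (a deterministic stand-in for sampling), and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_cells(addresses, contents):
    """Each cell is its address vector concatenated with its content vector."""
    return [list(a) + list(c) for a, c in zip(addresses, contents)]


def address_weights(key, addresses, contents, sharpness=1.0):
    """Score every cell against the key (assumed dot-product similarity)."""
    cells = memory_cells(addresses, contents)
    scores = [sharpness * sum(k * v for k, v in zip(key, cell)) for cell in cells]
    return softmax(scores)


def read_continuous(weights, addresses, contents):
    """Continuous read: weighted average over all cells."""
    cells = memory_cells(addresses, contents)
    width = len(cells[0])
    return [sum(w * cell[j] for w, cell in zip(weights, cells)) for j in range(width)]


def read_discrete(weights, addresses, contents):
    """Discrete read: return exactly one cell (argmax used here for determinism)."""
    cells = memory_cells(addresses, contents)
    idx = max(range(len(weights)), key=weights.__getitem__)
    return cells[idx]
```

A key that is non-zero only in the address part selects a cell by location regardless of what is stored there. Because the address vectors would be trained in the real model, the controller can in principle learn which location pattern to emit.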
