Pesquisa | Portal Regional da BVS

Mastering the game of Stratego with model-free multiagent reinforcement learning.

Perolat, Julien; De Vylder, Bart; Hennes, Daniel; Tarassov, Eugene; Strub, Florian; de Boer, Vincent; Muller, Paul; Connor, Jerome T; Burch, Neil; Anthony, Thomas; McAleer, Stephen; Elie, Romuald; Cen, Sarah H; Wang, Zhe; Gruslys, Audrunas; Malysheva, Aleksandra; Khan, Mina; Ozair, Sherjil; Timbers, Finbarr; Pohlen, Toby; Eccles, Tom; Rowland, Mark; Lanctot, Marc; Lespiau, Jean-Baptiste; Piot, Bilal; Omidshafiei, Shayegan; Lockhart, Edward; Sifre, Laurent; Beauguerlange, Nathalie; Munos, Remi; Silver, David; Singh, Satinder; Hassabis, Demis; Tuyls, Karl.

Science ; 378(6623): 990-996, 2022 12 02.

Artigo em Inglês | MEDLINE | ID: mdl-36454847

RESUMO

We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.

Assuntos

Inteligência Artificial , Reforço Psicológico , Jogos de Vídeo , Humanos

Mastering Atari, Go, chess and shogi by planning with a learned model.

Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David.

Nature ; 588(7839): 604-609, 2020 12.

Artigo em Inglês | MEDLINE | ID: mdl-33361790

RESUMO

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess1 and Go2, where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games3-the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled4-the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi-canonical environments for high-performance planning-the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm5 that was supplied with the rules of the game.

Improved protein structure prediction using potentials from deep learning.

Senior, Andrew W; Evans, Richard; Jumper, John; Kirkpatrick, James; Sifre, Laurent; Green, Tim; Qin, Chongli; Zídek, Augustin; Nelson, Alexander W R; Bridgland, Alex; Penedones, Hugo; Petersen, Stig; Simonyan, Karen; Crossan, Steve; Kohli, Pushmeet; Jones, David T; Silver, David; Kavukcuoglu, Koray; Hassabis, Demis.

Nature ; 577(7792): 706-710, 2020 01.

Artigo em Inglês | MEDLINE | ID: mdl-31942072

RESUMO

Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence1. This problem is of fundamental importance as the structure of a protein largely determines its function2; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures3. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force4 that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction5 (CASP13)-a blind assessment of the state of the field-AlphaFold created high-accuracy structures (with template modelling (TM) scores6 of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined7.

Assuntos

Aprendizado Profundo , Modelos Moleculares , Conformação Proteica , Proteínas/química , Software , Sequência de Aminoácidos , Caspases/química , Caspases/genética , Conjuntos de Dados como Assunto , Dobramento de Proteína , Proteínas/genética

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Vinyals, Oriol; Babuschkin, Igor; Czarnecki, Wojciech M; Mathieu, Michaël; Dudzik, Andrew; Chung, Junyoung; Choi, David H; Powell, Richard; Ewalds, Timo; Georgiev, Petko; Oh, Junhyuk; Horgan, Dan; Kroiss, Manuel; Danihelka, Ivo; Huang, Aja; Sifre, Laurent; Cai, Trevor; Agapiou, John P; Jaderberg, Max; Vezhnevets, Alexander S; Leblond, Rémi; Pohlen, Tobias; Dalibard, Valentin; Budden, David; Sulsky, Yury; Molloy, James; Paine, Tom L; Gulcehre, Caglar; Wang, Ziyu; Pfaff, Tobias; Wu, Yuhuai; Ring, Roman; Yogatama, Dani; Wünsch, Dario; McKinney, Katrina; Smith, Oliver; Schaul, Tom; Lillicrap, Timothy; Kavukcuoglu, Koray; Hassabis, Demis; Apps, Chris; Silver, David.

Nature ; 575(7782): 350-354, 2019 11.

Artigo em Inglês | MEDLINE | ID: mdl-31666705

RESUMO

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions1-3, the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems4. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks5,6. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.

Assuntos

Reforço Psicológico , Jogos de Vídeo , Inteligência Artificial , Humanos , Aprendizagem

Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13).

Proteins ; 87(12): 1141-1148, 2019 12.

Artigo em Inglês | MEDLINE | ID: mdl-31602685

RESUMO

We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.

Assuntos

Biologia Computacional/métodos , Redes Neurais de Computação , Conformação Proteica , Dobramento de Proteína , Proteínas/química , Algoritmos , Bases de Dados de Proteínas , Modelos Moleculares

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis.

Science ; 362(6419): 1140-1144, 2018 12 07.

Artigo em Inglês | MEDLINE | ID: mdl-30523106

RESUMO

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

Assuntos

Inteligência Artificial , Reforço Psicológico , Jogos de Vídeo , Algoritmos , Humanos , Software

Mastering the game of Go without human knowledge.

Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Hui, Fan; Sifre, Laurent; van den Driessche, George; Graepel, Thore; Hassabis, Demis.

Nature ; 550(7676): 354-359, 2017 10 18.

Artigo em Inglês | MEDLINE | ID: mdl-29052630

RESUMO

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.

Assuntos

Jogos Recreativos , Software , Aprendizado de Máquina não Supervisionado , Humanos , Redes Neurais de Computação , Reforço Psicológico , Aprendizado de Máquina Supervisionado

Mastering the game of Go with deep neural networks and tree search.

Silver, David; Huang, Aja; Maddison, Chris J; Guez, Arthur; Sifre, Laurent; van den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis.

Nature ; 529(7587): 484-9, 2016 Jan 28.

Artigo em Inglês | MEDLINE | ID: mdl-26819042

RESUMO

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

Assuntos

Jogos Recreativos , Redes Neurais de Computação , Software , Aprendizado de Máquina Supervisionado , Computadores , Europa (Continente) , Humanos , Método de Monte Carlo , Reforço Psicológico

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA