Pesquisa | Portal Regional da BVS (teste)

STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.

Lee, Young Jae; Kim, Jaehoon; Kwak, Mingu; Park, Young Joon; Kim, Seoung Bum.

Neural Netw ; 160: 1-11, 2023 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-36587439

RESUMO

With the development of deep learning technology, deep reinforcement learning (DRL) has successfully built intelligent agents in sequential decision-making problems through interaction with image-based environments. However, learning from unlimited interaction is impractical and sample inefficient because training an agent requires many trial and error and numerous samples. One response to this problem is sample-efficient DRL, a research area that encourages learning effective state representations in limited interactions with image-based environments. Previous methods could effectively surpass human performance by training an RL agent using self-supervised learning and data augmentation to learn good state representations from a given interaction. However, most of the existing methods only consider similarity of image observations so that they are hard to capture semantic representations. To address these challenges, we propose spatio-temporal and action-based contrastive representation (STACoRe) learning for sample-efficient DRL. STACoRe performs two contrastive learning to learn proper state representations. One uses the agent's actions as pseudo labels, and the other uses spatio-temporal information. In particular, when performing the action-based contrastive learning, we propose a method that automatically selects data augmentation techniques suitable for each environment for stable model training. We train the model by simultaneously optimizing an action-based contrastive loss function and spatio-temporal contrastive loss functions in an end-to-end manner. This leads to improving sample efficiency for DRL. We use 26 benchmark games in Atari 2600 whose environment interaction is limited to only 100k steps. The experimental results confirm that our method is more sample efficient than existing methods. The code is available at https://github.com/dudwojae/STACoRe.

Assuntos

Benchmarking , Inteligência , Humanos , Reforço Psicológico , Semântica

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA