Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time.
Cone, Ian; Clopath, Claudia; Shouval, Harel Z.
Affiliation
  • Cone I; Department of Bioengineering, Imperial College London, London, United Kingdom.
  • Clopath C; Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX.
  • Shouval HZ; Applied Physics Program, Rice University, Houston, TX.
Res Sq; 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37790466
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), that is, the difference between expected future rewards and actual rewards. The prominence of TD theory arises from the observation that the firing properties of dopaminergic neurons in the ventral tegmental area resemble those of RPE model neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We instead propose an alternative theoretical framework, coined FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature-specific representations of time are learned, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX, dopamine acts as an instructive signal that helps build temporal models of the environment. FLEX is a general theoretical framework with many possible biophysical implementations. To show that FLEX is a feasible approach, we present a specific biophysically plausible model that implements its principles. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
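For concreteness, the sketch below implements the classical TD(0) setup with the fixed temporal basis the abstract argues against: a delay-line ("complete serial compound") representation of the cue, a linear value function, and the RPE update delta_t = r_t + gamma * V_{t+1} - V_t. This is an illustrative reconstruction of standard TD learning, not the authors' FLEX model; all names, parameter values, and trial counts are assumptions chosen for the demo.

```python
import numpy as np

# Minimal TD(0) sketch with a fixed temporal basis: a "complete serial
# compound" delay line, one unit per time step after the cue. Illustrative
# only; this is the classical model the abstract critiques, not FLEX.
# All parameter values here are assumptions for the demo.

n_steps, cue_time, reward_time = 30, 5, 20
alpha, gamma = 0.1, 0.98

w = np.zeros(n_steps)                      # one weight per delay-line unit
r = np.zeros(n_steps)
r[reward_time] = 1.0                       # reward at a fixed delay after cue

# Fixed stimulus representation: x[t] is the delay-line state at time t.
x = np.zeros((n_steps, n_steps))
for t in range(cue_time, n_steps):
    x[t, t - cue_time] = 1.0               # unit (t - cue_time) fires at t

for trial in range(500):
    V = x @ w                              # value estimate at each time step
    for t in range(n_steps - 1):
        delta = r[t] + gamma * V[t + 1] - V[t]   # reward prediction error
        w += alpha * delta * x[t]                # TD weight update
        V = x @ w                                # refresh after the update

# After training, the positive RPE has migrated from the reward time to the
# cue-onset transition (t = cue_time - 1 under this indexing), mirroring the
# dopaminergic firing pattern described in the abstract.
V = x @ w
delta_final = r[:-1] + gamma * V[1:] - V[:-1]
print("RPE peaks at t =", int(np.argmax(delta_final)))
```

Note the design point the abstract targets: the representation x is fixed in advance, with one unit per possible cue-reward delay, so the model can only relocate the RPE along a prebuilt clock. In FLEX, by contrast, the temporal representations themselves are learned.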

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Journal: Res Sq Year: 2023 Document type: Article Country of affiliation: United Kingdom Country of publication: United States
