Results 1 - 3 of 3
1.
Neuroimage; 246: 118780, 2022 Feb 1.
Article in English | MEDLINE | ID: mdl-34875383

ABSTRACT

Learning how to reach a reward over a long series of actions is a remarkable capability of humans, and it is potentially guided by multiple parallel learning modules. Current brain imaging of learning modules is limited by (i) simple experimental paradigms, (ii) entanglement of brain signals from different learning modules, and (iii) a limited number of computational models considered as candidates for explaining behavior. Here, we address these three limitations: (i) we introduce a complex sequential decision-making task with surprising events that allows us (ii) to dissociate correlates of reward prediction errors from those of surprise in functional magnetic resonance imaging (fMRI), and (iii) we test behavior against a large repertoire of model-free, model-based, and hybrid reinforcement learning algorithms, including a novel surprise-modulated actor-critic algorithm. Surprise, derived from an approximate Bayesian approach to learning the world-model, is extracted in our algorithm from a state prediction error. Surprise then modulates the learning rate of a model-free actor, which itself learns via the reward prediction error computed from the critic's model-free value estimates. We find that action choices are well explained by a pure model-free policy gradient, but reaction times and neural data are not. We identify signatures of both model-free and surprise-based learning signals in blood oxygen level dependent (BOLD) responses, supporting the existence of multiple parallel learning modules in the brain. Our results extend previous fMRI findings to a multi-step setting and emphasize the role of policy gradient and surprise signalling in human learning.
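To make the algorithmic idea concrete, the following is a minimal, illustrative sketch of a surprise-modulated actor-critic in a small discrete MDP. The specific surprise measure (here, a state prediction error of a learned transition model), the way it scales the actor's learning rate, and all names and hyperparameters are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

# Sketch of a surprise-modulated actor-critic (illustrative, not the
# paper's exact update rules). Surprise is approximated by a state
# prediction error of the learned world-model and scales the actor's
# learning rate; the critic supplies the reward prediction error.

n_states, n_actions = 10, 4
gamma, alpha_critic, alpha_actor, alpha_model = 0.95, 0.1, 0.1, 0.1

V = np.zeros(n_states)                           # critic: state values
theta = np.zeros((n_states, n_actions))          # actor: policy preferences
P_hat = np.full((n_states, n_actions, n_states), 1.0 / n_states)  # world-model

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a, r, s_next):
    """One learning step after observing the transition (s, a, r, s_next)."""
    # Surprise proxy: how unexpected s_next was under the learned model.
    surprise = 1.0 - P_hat[s, a, s_next]

    # Update the world-model toward the observed transition.
    target = np.zeros(n_states); target[s_next] = 1.0
    P_hat[s, a] += alpha_model * (target - P_hat[s, a])

    # Critic: reward prediction error (TD error).
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * delta

    # Actor: policy-gradient update with a surprise-modulated learning rate.
    pi = softmax(theta[s])
    grad = -pi; grad[a] += 1.0                   # gradient of log pi(a|s)
    theta[s] += alpha_actor * (1.0 + surprise) * delta * grad
```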


Subject(s)
Brain/physiology , Decision Making/physiology , Functional Neuroimaging/methods , Learning/physiology , Magnetic Resonance Imaging/methods , Adult , Brain/diagnostic imaging , Female , Humans , Male , Models, Biological , Reinforcement, Psychology , Young Adult
2.
PLoS Comput Biol; 17(6): e1009070, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34081705

ABSTRACT

Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important for detecting surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
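Below is a minimal sketch of how novelty and surprise could play the two distinct roles described above: a count-based novelty bonus drives exploration, while surprise (again approximated by a state prediction error) raises the learning rates of both the world-model and the model-free action-values. The environment, names, and hyperparameters are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Illustrative separation of novelty (exploration bonus) and surprise
# (learning-rate modulation); not the paper's exact algorithm.

n_states, n_actions = 20, 4
gamma, beta_novelty, temp = 0.95, 1.0, 0.5

Q = np.zeros((n_states, n_actions))              # model-free action-values
counts = np.zeros(n_states)                      # state visit counts
P_hat = np.full((n_states, n_actions, n_states), 1.0 / n_states)  # world-model

def choose_action(s, rng):
    """Softmax over Q plus a novelty bonus for rarely visited successors."""
    novelty = 1.0 / np.sqrt(counts + 1.0)        # count-based novelty per state
    expected_novelty = P_hat[s] @ novelty        # novelty of predicted successors
    prefs = (Q[s] + beta_novelty * expected_novelty) / temp
    probs = np.exp(prefs - prefs.max()); probs /= probs.sum()
    return rng.choice(n_actions, p=probs)

def learn(s, a, r, s_next):
    counts[s_next] += 1.0
    surprise = 1.0 - P_hat[s, a, s_next]         # state-prediction-error proxy
    lr = 0.1 * (1.0 + surprise)                  # surprise raises learning rates

    # World-model update.
    target = np.zeros(n_states); target[s_next] = 1.0
    P_hat[s, a] += lr * (target - P_hat[s, a])

    # Model-free Q-learning update.
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
```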


Subject(s)
Adaptation, Psychological , Exploratory Behavior , Models, Psychological , Reinforcement, Psychology , Algorithms , Choice Behavior/physiology , Computational Biology , Decision Making/physiology , Electroencephalography/statistics & numerical data , Exploratory Behavior/physiology , Humans , Learning/physiology , Models, Neurological , Reward
3.
Elife; 8, 2019 Nov 11.
Article in English | MEDLINE | ID: mdl-31709980

ABSTRACT

In many daily tasks, we make multiple decisions before reaching a goal. To learn such sequences of decisions, a mechanism that links earlier actions to later reward is necessary. Reinforcement learning (RL) theory suggests two classes of algorithms for solving this credit assignment problem: in classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here, we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility traces makes qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with an eligibility trace across multiple sensory modalities.
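The contrast between the two algorithm classes can be illustrated with a short sketch: SARSA(lambda)-style updates over a single rewarded trajectory, where lambda = 0 credits only the final state-action pair while lambda > 0 reinforces the whole visited sequence in one shot. The environment, trajectory, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# One-shot credit assignment with vs. without an eligibility trace
# (illustrative sketch, not the paper's task or model).

n_states, n_actions = 8, 2
alpha, gamma = 0.5, 0.95

def run_episode(Q, lam, trajectory, reward):
    """Apply SARSA(lambda) updates along one trajectory that ends with a
    single reward; with lam = 0 only the final step receives credit."""
    e = np.zeros_like(Q)                      # eligibility traces
    for i, (s, a) in enumerate(trajectory):
        last = i == len(trajectory) - 1
        r = reward if last else 0.0
        q_next = 0.0 if last else Q[trajectory[i + 1]]
        delta = r + gamma * q_next - Q[s, a]  # reward prediction error
        e[s, a] += 1.0                        # mark visited state-action pair
        Q += alpha * delta * e                # credit all eligible pairs
        e *= gamma * lam                      # decay traces backward in time

Q_no_trace = np.zeros((n_states, n_actions))
Q_trace = np.zeros((n_states, n_actions))
path = [(0, 1), (1, 1), (2, 0), (3, 1)]       # one multi-step action sequence
run_episode(Q_no_trace, lam=0.0, trajectory=path, reward=1.0)
run_episode(Q_trace, lam=0.9, trajectory=path, reward=1.0)
# After a single episode, only the last pair has changed in Q_no_trace,
# whereas every visited pair along the path has changed in Q_trace.
```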


Subject(s)
Cognition/physiology , Decision Making/physiology , Learning/physiology , Memory/physiology , Pupil/physiology , Reinforcement, Psychology , Reward , Algorithms , Humans , Markov Chains , Models, Neurological , Psychomotor Performance/physiology