Results 1 - 5 of 5
1.
Eur J Neurosci ; 2024 May 26.
Article in English | MEDLINE | ID: mdl-38798086

ABSTRACT

As opposed to those requiring a single action for reward acquisition, tasks necessitating action sequences demand that animals learn action elements and their sequential order and sustain the behaviour until the sequence is completed. With repeated learning, animals not only exhibit precise execution of these sequences but also demonstrate enhanced smoothness and efficiency. Previous research has demonstrated that midbrain dopamine and its major projection target, the striatum, play crucial roles in these processes. Recent studies have shown that dopamine from the substantia nigra pars compacta (SNc) and the ventral tegmental area (VTA) serve distinct functions in action sequence learning. The distinct contributions of dopamine also depend on the striatal subregions, namely the ventral, dorsomedial and dorsolateral striatum. Here, we have reviewed recent findings on the role of striatal dopamine in action sequence learning, with a focus on recent rodent studies.

2.
Neural Netw ; 126: 95-107, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32203877

ABSTRACT

For an animal to learn about its environment with limited motor and cognitive resources, it should focus its resources on potentially important stimuli. However, too narrow a focus is disadvantageous for adaptation to environmental changes. Midbrain dopamine neurons are excited by potentially important stimuli, such as reward-predicting or novel stimuli, and allocate resources to these stimuli by modulating how an animal approaches, exploits, explores, and attends. The current study examined the theoretical possibility that dopamine activity reflects the dynamic allocation of resources for learning. Dopamine activity may transition between two patterns: (1) phasic responses to cues and rewards, and (2) ramping activity arising as the agent approaches the reward. Phasic excitation has been explained by prediction errors generated by experimentally inserted cues. However, when and why dopamine activity transitions between the two patterns remain unknown. By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine transitions from ramping to phasic patterns as the agent focuses its resources on a small number of reward-predicting stimuli, thus leading to task dimensionality reduction. The opposite occurs when the agent re-distributes its resources to adapt to environmental changes, resulting in task dimensionality expansion. This research elucidates the role of dopamine in a broader context, providing a potential explanation for the diverse repertoire of dopamine activity that cannot be explained solely by prediction error.
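The standard TD learning model that this study modifies can be illustrated with a minimal sketch. This is the textbook baseline only, not the paper's modified model; the state chain, reward size, and parameters below are illustrative choices, not values from the study:

```python
def td_learning(n_states=10, n_trials=200, alpha=0.1, gamma=0.9):
    """TD(0) on a linear chain of states; reward arrives only in the
    final state. Returns learned values and per-trial prediction errors."""
    V = [0.0] * (n_states + 1)        # state values; index n_states is terminal
    rpe_history = []
    for _ in range(n_trials):
        rpes = []
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0   # reward at the last state
            delta = r + gamma * V[s + 1] - V[s]     # TD prediction error ("dopamine")
            V[s] += alpha * delta
            rpes.append(delta)
        rpe_history.append(rpes)
    return V, rpe_history

V, rpes = td_learning()
# Trial 1: the error is concentrated at the rewarded state (rpes[0][-1] == 1.0);
# with training it shrinks there and propagates back toward earlier states.
```

In this baseline, the shape of the prediction-error signal depends on how finely the approach to reward is represented as states, which is the representational knob the abstract's notion of resource allocation acts on.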


Subject(s)
Dopamine/metabolism; Models, Neurological; Reinforcement, Psychology; Animals; Cues; Humans
3.
PLoS One ; 9(2): e89494, 2014.
Article in English | MEDLINE | ID: mdl-24586823

ABSTRACT

Because most rewarding events are probabilistic and changing, the extinction of probabilistic rewards is important for survival. It has been proposed that the extinction of probabilistic rewards depends on arousal and the amount of learning of reward values. Midbrain dopamine neurons were suggested to play a role in both arousal and learning reward values. Despite extensive research on modeling dopaminergic activity in reward learning (e.g. temporal difference models), few studies have been done on modeling its role in arousal. Although temporal difference models capture key characteristics of dopaminergic activity during the extinction of deterministic rewards, they have been less successful at simulating the extinction of probabilistic rewards. By adding an arousal signal to a temporal difference model, we were able to simulate the extinction of probabilistic rewards and its dependence on the amount of learning. Our simulations suggest that arousal allows the probability of reward to have lasting effects on the updating of reward value, which slows the extinction of low probability rewards. Using this model, we predicted that, by signaling the prediction error, dopamine determines the learned reward value that has to be extinguished during extinction and participates in regulating the size of the arousal signal that controls the learning rate. These predictions were supported by pharmacological experiments in rats.
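The general idea of an arousal signal controlling the learning rate can be sketched in a simplified form. This is not the paper's model: it uses a single-state Rescorla-Wagner-style update rather than a full TD model, and the arousal term here is a hypothetical running average of unsigned prediction errors that divides the learning rate; the functional form and all parameters are illustrative assumptions:

```python
import random

def extinction_speed(p_reward, acq_trials=500, ext_trials=200,
                     alpha0=0.3, k=4.0, tau=0.1, seed=1):
    """Value learning with an arousal signal (a running average of
    unsigned prediction errors) that damps the learning rate.
    Probabilistic rewards keep arousal elevated during acquisition,
    so the learned value is extinguished more slowly afterwards.
    Returns the number of extinction trials needed to lose half
    the acquired value."""
    rng = random.Random(seed)
    V, arousal = 0.0, 0.0
    for _ in range(acq_trials):                       # acquisition phase
        r = 1.0 if rng.random() < p_reward else 0.0
        delta = r - V                                 # prediction error
        arousal += tau * (abs(delta) - arousal)       # running |PE| estimate
        V += (alpha0 / (1.0 + k * arousal)) * delta   # arousal slows updating
    v_acq = V
    for t in range(ext_trials):                       # extinction: no reward
        delta = -V
        arousal += tau * (abs(delta) - arousal)
        V += (alpha0 / (1.0 + k * arousal)) * delta
        if V < 0.5 * v_acq:
            return t + 1
    return ext_trials

# In this sketch, partial reinforcement (e.g. p_reward=0.25) tends to
# extinguish more slowly than continuous reinforcement (p_reward=1.0).
```

The design choice worth noting is that arousal tracks recent surprise: a partially reinforced reward generates persistent prediction errors, so arousal stays high into extinction and keeps the effective learning rate low, which is one way to capture the slowed extinction of low-probability rewards the abstract describes.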


Subject(s)
Arousal; Dopamine/physiology; Learning; Models, Biological; Probability; Reward; Ventral Tegmental Area/physiology; Animals; Rats; Rats, Sprague-Dawley; Ventral Tegmental Area/cytology
4.
J Neurosci ; 33(11): 4693-709, 2013 Mar 13.
Article in English | MEDLINE | ID: mdl-23486943

ABSTRACT

Dopamine neurons of the ventral midbrain have been found to signal a reward prediction error that can mediate positive reinforcement. Despite the demonstration of modest diversity at the cellular and molecular levels, there has been little analysis of response diversity in behaving animals. Here we examine response diversity in rhesus macaques to appetitive, aversive, and neutral stimuli having relative motivational values that were measured and controlled through a choice task. First, consistent with previous studies, we observed a continuum of response variability and an apparent absence of distinct clusters in scatter plots, suggesting a lack of statistically discrete subpopulations of neurons. Second, we found that a group of "sensitive" neurons tend to be more strongly suppressed by a variety of stimuli and to be more strongly activated by juice. Third, neurons in the "ventral tier" of substantia nigra were found to have greater suppression, and a subset of these had higher baseline firing rates and late "rebound" activation after suppression. These neurons could belong to a previously identified subgroup of dopamine neurons that express high levels of H-type cation channels but lack calbindin. Fourth, neurons further rostral exhibited greater suppression. Fifth, although we observed weak activation of some neurons by aversive stimuli, this was not associated with their aversiveness. In conclusion, we find a diversity of response properties, distributed along a continuum, within what may be a single functional population of neurons signaling reward prediction error.


Subject(s)
Action Potentials/physiology; Brain Mapping; Dopaminergic Neurons/classification; Dopaminergic Neurons/physiology; Mesencephalon/cytology; Neural Inhibition/physiology; Animals; Appetitive Behavior; Avoidance Learning/physiology; Choice Behavior/physiology; Conditioning, Operant; Female; Macaca mulatta; Male; Motivation/physiology; Photic Stimulation; Reaction Time/physiology; Reinforcement, Psychology; Reward; Statistics as Topic
5.
J Neurosci ; 33(11): 4710-25, 2013 Mar 13.
Article in English | MEDLINE | ID: mdl-23486944

ABSTRACT

The transient response of dopamine neurons has been described as reward prediction error (RPE), with activation or suppression by events that are better or worse than expected, respectively. However, at least a minority of neurons are activated by aversive or high-intensity stimuli, casting doubt on the generality of RPE in describing the dopamine signal. To overcome limitations of previous studies, we studied neuronal responses to a wider variety of high-intensity and aversive stimuli, and we quantified and controlled aversiveness through a choice task in which macaques sacrificed juice to avoid aversive stimuli. Whereas most previous work has portrayed the RPE as a single impulse or "phase," here we demonstrate its multiphasic temporal dynamics. Aversive or high-intensity stimuli evoked a triphasic sequence of activation-suppression-activation extending over a period of 40-700 ms. The initial activation at short latencies (40-120 ms) reflected sensory intensity. The influence of motivational value became dominant between 150 and 250 ms, with activation in the case of appetitive stimuli, and suppression in the case of aversive and neutral stimuli. The previously unreported late activation appeared to be a modest "rebound" after strong suppression. Similarly, strong activation by reward was often followed by suppression. We suggest that these "rebounds" may result from overcompensation by homeostatic mechanisms in some cells. Our results are consistent with a realistic RPE, which evolves over time through a dynamic balance of excitation and inhibition.


Subject(s)
Appetitive Behavior/physiology; Avoidance Learning/physiology; Dopaminergic Neurons/physiology; Mesencephalon/cytology; Motivation/physiology; Acoustic Stimulation; Action Potentials/physiology; Animals; Choice Behavior/physiology; Conditioning, Classical; Female; Judgment; Macaca mulatta; Male; Mesencephalon/physiology; Neural Inhibition/physiology; Nonlinear Dynamics; Reaction Time/physiology; Regression Analysis; Reinforcement, Psychology; Time Factors