Search | VHL Regional Portal

Dopamine mediates the bidirectional update of interval timing.

Jakob, Anthony M V; Mikhael, John G; Hamilos, Allison E; Assad, John A; Gershman, Samuel J.

Behav Neurosci ; 136(5): 445-452, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36222637

ABSTRACT

The role of dopamine (DA) as a reward prediction error (RPE) signal in reinforcement learning (RL) tasks has been well-established over the past decades. Recent work has shown that the RPE interpretation can also account for the effects of DA on interval timing by controlling the speed of subjective time. According to this theory, the timing of the dopamine signal relative to reward delivery dictates whether subjective time speeds up or slows down: Early DA signals speed up subjective time and late signals slow it down. To test this bidirectional prediction, we reanalyzed measurements of dopaminergic neurons in the substantia nigra pars compacta of mice performing a self-timed movement task. Using the slope of ramping dopamine activity as a readout of subjective time speed, we found that trial-by-trial changes in the slope could be predicted from the timing of dopamine activity on the previous trial. This result provides a key piece of evidence supporting a unified computational theory of RL and interval timing. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

Subject(s)

Dopamine; Reinforcement, Psychology; Animals; Dopamine/physiology; Dopaminergic Neurons/physiology; Learning/physiology; Mice; Reward

The role of state uncertainty in the dynamics of dopamine.

Mikhael, John G; Kim, HyungGoo R; Uchida, Naoshige; Gershman, Samuel J.

Curr Biol ; 32(5): 1077-1087.e9, 2022 03 14.

Article in English | MEDLINE | ID: mdl-35114098

ABSTRACT

Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.

Subject(s)

Dopamine; Reward; Animals; Learning; Mice; Reinforcement, Psychology; Uncertainty

Impulsivity and risk-seeking as Bayesian inference under dopaminergic control.

Mikhael, John G; Gershman, Samuel J.

Neuropsychopharmacology ; 47(2): 465-476, 2022 01.

Article in English | MEDLINE | ID: mdl-34376813

ABSTRACT

Bayesian models successfully account for several of dopamine (DA)'s effects on contextual calibration in interval timing and reward estimation. In these models, tonic levels of DA control the precision of stimulus encoding, which is weighed against contextual information when making decisions. When DA levels are high, the animal relies more heavily on the (highly precise) stimulus encoding, whereas when DA levels are low, the context affects decisions more strongly. Here, we extend this idea to intertemporal choice and probability discounting tasks. In intertemporal choice tasks, agents must choose between a small reward delivered soon and a large reward delivered later, whereas in probability discounting tasks, agents must choose between a small reward that is always delivered and a large reward that may be omitted with some probability. Beginning with the principle that animals will seek to maximize their reward rates, we show that the Bayesian model predicts a number of curious empirical findings in both tasks. First, the model predicts that higher DA levels should normally promote selection of the larger/later option, which is often taken to imply that DA decreases 'impulsivity,' and promote selection of the large/risky option, often taken to imply that DA increases 'risk-seeking.' However, if the temporal precision is sufficiently decreased, higher DA levels should have the opposite effect, promoting selection of the smaller/sooner option (higher impulsivity) and the small/safe option (lower risk-seeking). Second, high enough levels of DA can result in preference reversals. Third, selectively decreasing the temporal precision, without manipulating DA, should promote selection of the larger/later and large/risky options. Fourth, when a different post-reward delay is associated with each option, animals will not learn the option-delay contingencies, but this learning can be salvaged when the post-reward delays are made more salient. Finally, the Bayesian model predicts correlations among behavioral phenotypes: Animals that are better timers will also appear less impulsive.

Subject(s)

Delay Discounting; Dopamine; Animals; Bayes Theorem; Choice Behavior; Impulsive Behavior; Reward

Rational inattention and tonic dopamine.

Mikhael, John G; Lai, Lucy; Gershman, Samuel J.

PLoS Comput Biol ; 17(3): e1008659, 2021 03.

Article in English | MEDLINE | ID: mdl-33760806

ABSTRACT

Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA, the average reward theory and the Bayesian theory in which DA controls precision, have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock, thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.

Subject(s)

Attention/physiology; Dopamine; Models, Neurological; Bayes Theorem; Cognition/physiology; Computational Biology; Dopamine/metabolism; Dopamine/physiology; Humans; Reinforcement, Psychology

A Unified Framework for Dopamine Signals across Timescales.

Kim, HyungGoo R; Malik, Athar N; Mikhael, John G; Bech, Pol; Tsutsui-Kimura, Iku; Sun, Fangmiao; Zhang, Yajun; Li, Yulong; Watabe-Uchida, Mitsuko; Gershman, Samuel J; Uchida, Naoshige.

Cell ; 183(6): 1600-1616.e25, 2020 12 10.

Article in English | MEDLINE | ID: mdl-33248024

ABSTRACT

Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. However, recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independent from somatic spiking activity. Here we developed experimental paradigms using virtual reality that disambiguate RPEs from values. We examined dopamine circuit activity at various stages, including somatic spiking, calcium signals at somata and axons, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all stages examined. Ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.

Subject(s)

Dopamine/metabolism; Signal Transduction; Action Potentials/physiology; Animals; Axons/metabolism; Calcium/metabolism; Calcium Signaling; Cell Body/metabolism; Cues; Dopaminergic Neurons/physiology; Fluorometry; Male; Mice, Inbred C57BL; Models, Neurological; Photic Stimulation; Reward; Sensation; Time Factors; Ventral Tegmental Area/metabolism; Virtual Reality

Adapting the flow of time with dopamine.

Mikhael, John G; Gershman, Samuel J.

J Neurophysiol ; 121(5): 1748-1760, 2019 05 01.

Article in English | MEDLINE | ID: mdl-30864882

ABSTRACT

The modulation of interval timing by dopamine (DA) has been well established over decades of research. The nature of this modulation, however, has remained controversial: Although the pharmacological evidence has largely suggested that time intervals are overestimated with higher DA levels, more recent optogenetic work has shown the opposite effect. In addition, a large body of work has asserted DA's role as a "reward prediction error" (RPE), or a teaching signal that allows the basal ganglia to learn to predict future rewards in reinforcement learning tasks. Whether these two seemingly disparate accounts of DA may be related has remained an open question. By taking a reinforcement learning-based approach to interval timing, we show here that the RPE interpretation of DA naturally extends to its role as a modulator of timekeeping and furthermore that this view reconciles the seemingly conflicting observations. We derive a biologically plausible, DA-dependent plasticity rule that can modulate the rate of timekeeping in either direction and whose effect depends on the timing of the DA signal itself. This bidirectional update rule can account for the results from pharmacology and optogenetics as well as the behavioral effects of reward rate on interval timing and the temporal selectivity of striatal neurons. Hence, by adopting a single RPE interpretation of DA, our results take a step toward unifying computational theories of reinforcement learning and interval timing. NEW & NOTEWORTHY How does dopamine (DA) influence interval timing? A large body of pharmacological evidence has suggested that DA accelerates timekeeping mechanisms. However, recent optogenetic work has shown exactly the opposite effect. In this article, we relate DA's role in timekeeping to its most established role, as a critical component of reinforcement learning. This allows us to derive a neurobiologically plausible framework that reconciles a large body of DA's temporal effects, including pharmacological, behavioral, electrophysiological, and optogenetic.

Subject(s)

Brain/physiology; Dopamine/metabolism; Models, Neurological; Reaction Time; Reward; Animals; Brain/metabolism; Mice; Neuronal Plasticity; Time Perception

Learning Reward Uncertainty in the Basal Ganglia.

Mikhael, John G; Bogacz, Rafal.

PLoS Comput Biol ; 12(9): e1005062, 2016 09.

Article in English | MEDLINE | ID: mdl-27589489

ABSTRACT

Learning the reliability of different sources of rewards is critical for making optimal choices. However, despite the existence of detailed theory describing how the expected reward is learned in the basal ganglia, it is not known how reward uncertainty is estimated in these circuits. This paper presents a class of models that encode both the mean reward and the spread of the rewards, the former in the difference between the synaptic weights of D1 and D2 neurons, and the latter in their sum. In the models, the tendency to seek (or avoid) options with variable reward can be controlled by increasing (or decreasing) the tonic level of dopamine. The models are consistent with the physiology of and synaptic plasticity in the basal ganglia, they explain the effects of dopaminergic manipulations on choices involving risks, and they make multiple experimental predictions.

Subject(s)

Basal Ganglia/physiology; Choice Behavior/physiology; Learning/physiology; Computational Biology; Humans; Models, Neurological; Reward; Uncertainty

Functional neuroanatomy of intuitive physical inference.

Fischer, Jason; Mikhael, John G; Tenenbaum, Joshua B; Kanwisher, Nancy.

Proc Natl Acad Sci U S A ; 113(34): E5072-81, 2016 08 23.

Article in English | MEDLINE | ID: mdl-27503892

ABSTRACT

To engage with the world (to understand the scene in front of us, plan actions, and predict what will happen next), we must have an intuitive grasp of the world's physical structure and dynamics. How do the objects in front of us rest on and support each other, how much force would be required to move them, and how will they behave when they fall, roll, or collide? Despite the centrality of physical inferences in daily life, little is known about the brain mechanisms recruited to interpret the physical structure of a scene and predict how physical events will unfold. Here, in a series of fMRI experiments, we identified a set of cortical regions that are selectively engaged when people watch and predict the unfolding of physical events: a "physics engine" in the brain. These brain regions are selective to physical inferences relative to nonphysical but otherwise highly similar scenes and tasks. However, these regions are not exclusively engaged in physical inferences per se or, indeed, even in scene understanding; they overlap with the domain-general "multiple demand" system, especially the parts of that system involved in action planning and tool use, pointing to a close relationship between the cognitive and neural mechanisms involved in parsing the physical content of a scene and preparing an appropriate action.

Subject(s)

Cognition/physiology; Intuition/physiology; Motor Cortex/physiology; Neuroanatomy/methods; Adolescent; Adult; Brain Mapping; Female; Humans; Magnetic Resonance Imaging; Male; Motor Cortex/anatomy & histology; Pattern Recognition, Visual/physiology; Photic Stimulation
