Pesquisa | Portal Regional da BVS

1.

A novel behavioral paradigm using mice to study predictive postural control.

Doi, Yurika; Asaka, Meiko; Born, Richard T; Yanagihara, Dai; Uchida, Naoshige.

bioRxiv ; 2024 Jul 03.

Artigo em Inglês | MEDLINE | ID: mdl-39005260

RESUMO

Postural control circuitry performs the essential function of maintaining balance and body position in response to perturbations that are either self-generated (e.g. reaching to pick up an object) or externally delivered (e.g. being pushed by another person). Human studies have shown that anticipation of predictable postural disturbances can modulate such responses. This indicates that postural control could involve higher-level neural structures associated with predictive functions, rather than being purely reactive. However, the underlying neural circuitry remains largely unknown. To enable studies of predictive postural control circuits, we developed a novel task for mice . In this task, modeled after human studies, a dynamic platform generated reproducible translational perturbations. While mice stood bipedally atop a perch to receive water rewards, they experienced backward translations that were either unpredictable or preceded by an auditory cue. To validate the task, we investigated the effect of the auditory cue on postural responses to perturbations across multiple days in three mice. These preliminary results serve to validate a new postural control model, opening the door to the types of neural recordings and circuit manipulations that are currently possible only in mice. Significance Statement: The ability to anticipate disturbances and adjust one's posture accordingly-known as "predictive postural control"-is crucial for preventing falls and for advancing robotics. Human postural studies often face limitations with measurement tools and sample sizes, hindering insight into underlying neural mechanisms. To address these limitations, we developed a postural perturbation task for freely moving mice, modeled after those used in human studies. Using a dynamic platform, we delivered reproducible perturbations with or without preceding auditory cues and quantified how the auditory cue affects postural responses to perturbations. Our work provides validation of a new postural control model, which opens the door to the types of neural population recordings and circuit manipulation that are currently possible only in mice.

2.

The role of prospective contingency in the control of behavior and dopamine signals during associative learning.

Qian, Lechen; Burrell, Mark; Hennig, Jay A; Matias, Sara; Murthy, Venkatesh N; Gershman, Samuel J; Uchida, Naoshige.

bioRxiv ; 2024 Feb 06.

Artigo em Inglês | MEDLINE | ID: mdl-38370735

RESUMO

Associative learning depends on contingency, the degree to which a stimulus predicts an outcome. Despite its importance, the neural mechanisms linking contingency to behavior remain elusive. Here we examined the dopamine activity in the ventral striatum - a signal implicated in associative learning - in a Pavlovian contingency degradation task in mice. We show that both anticipatory licking and dopamine responses to a conditioned stimulus decreased when additional rewards were delivered uncued, but remained unchanged if additional rewards were cued. These results conflict with contingency-based accounts using a traditional definition of contingency or a novel causal learning model (ANCCR), but can be explained by temporal difference (TD) learning models equipped with an appropriate inter-trial-interval (ITI) state representation. Recurrent neural networks trained within a TD framework develop state representations like our best 'handcrafted' model. Our findings suggest that the TD error can be a measure that describes both contingency and dopaminergic activity.

3.

Glutamate inputs send prediction error of reward, but not negative value of aversive stimuli, to dopamine neurons.

Amo, Ryunosuke; Uchida, Naoshige; Watabe-Uchida, Mitsuko.

Neuron ; 112(6): 1001-1019.e6, 2024 Mar 20.

Artigo em Inglês | MEDLINE | ID: mdl-38278147

RESUMO

Midbrain dopamine neurons are thought to signal reward prediction errors (RPEs), but the mechanisms underlying RPE computation, particularly the contributions of different neurotransmitters, remain poorly understood. Here, we used a genetically encoded glutamate sensor to examine the pattern of glutamate inputs to dopamine neurons in mice. We found that glutamate inputs exhibit virtually all of the characteristics of RPE rather than conveying a specific component of RPE computation, such as reward or expectation. Notably, whereas glutamate inputs were transiently inhibited by reward omission, they were excited by aversive stimuli. Opioid analgesics altered dopamine negative responses to aversive stimuli into more positive responses, whereas excitatory responses of glutamate inputs remained unchanged. Our findings uncover previously unknown synaptic mechanisms underlying RPE computations; dopamine responses are shaped by both synergistic and competitive interactions between glutamatergic and GABAergic inputs to dopamine neurons depending on valences, with competitive interactions playing a role in responses to aversive stimuli.

Assuntos

Neurônios Dopaminérgicos , Ácido Glutâmico , Camundongos , Animais , Neurônios Dopaminérgicos/fisiologia , Dopamina/fisiologia , Recompensa , Mesencéfalo , Área Tegmentar Ventral/fisiologia

4.

An opponent striatal circuit for distributional reinforcement learning.

Lowet, Adam S; Zheng, Qiao; Meng, Melissa; Matias, Sara; Drugowitsch, Jan; Uchida, Naoshige.

bioRxiv ; 2024 Jan 03.

Artigo em Inglês | MEDLINE | ID: mdl-38260354

RESUMO

Machine learning research has achieved large performance gains on a wide range of tasks by expanding the learning target from mean rewards to entire probability distributions of rewards - an approach known as distributional reinforcement learning (RL)1. The mesolimbic dopamine system is thought to underlie RL in the mammalian brain by updating a representation of mean value in the striatum2,3, but little is known about whether, where, and how neurons in this circuit encode information about higher-order moments of reward distributions4. To fill this gap, we used high-density probes (Neuropixels) to acutely record striatal activity from well-trained, water-restricted mice performing a classical conditioning task in which reward mean, reward variance, and stimulus identity were independently manipulated. In contrast to traditional RL accounts, we found robust evidence for abstract encoding of variance in the striatum. Remarkably, chronic ablation of dopamine inputs disorganized these distributional representations in the striatum without interfering with mean value coding. Two-photon calcium imaging and optogenetics revealed that the two major classes of striatal medium spiny neurons - D1 and D2 MSNs - contributed to this code by preferentially encoding the right and left tails of the reward distribution, respectively. We synthesize these findings into a new model of the striatum and mesolimbic dopamine that harnesses the opponency between D1 and D2 MSNs5-15 to reap the computational benefits of distributional RL.

5.

Interpretable deep learning for deconvolutional analysis of neural signals.

Tolooshams, Bahareh; Matias, Sara; Wu, Hao; Temereanca, Simona; Uchida, Naoshige; Murthy, Venkatesh N; Masset, Paul; Ba, Demba.

bioRxiv ; 2024 Jan 23.

Artigo em Inglês | MEDLINE | ID: mdl-38260512

RESUMO

The widespread adoption of deep learning to build models that capture the dynamics of neural populations is typically based on "black-box" approaches that lack an interpretable link between neural activity and function. Here, we propose to apply algorithm unrolling, a method for interpretable deep learning, to design the architecture of sparse deconvolutional neural networks and obtain a direct interpretation of network weights in relation to stimulus-driven single-neuron activity through a generative model. We characterize our method, referred to as deconvolutional unrolled neural learning (DUNL), and show its versatility by applying it to deconvolve single-trial local signals across multiple brain areas and recording modalities. To exemplify use cases of our decomposition method, we uncover multiplexed salience and reward prediction error signals from midbrain dopamine neurons in an unbiased manner, perform simultaneous event detection and characterization in somatosensory thalamus recordings, and characterize the responses of neurons in the piriform cortex. Our work leverages the advances in interpretable deep learning to gain a mechanistic understanding of neural dynamics.

6.

Tonic dopamine and biases in value learning linked through a biologically inspired reinforcement learning model.

Pinto, Sandra Romero; Uchida, Naoshige.

bioRxiv ; 2023 Nov 29.

Artigo em Inglês | MEDLINE | ID: mdl-38014087

RESUMO

A hallmark of various psychiatric disorders is biased future predictions. Here we examined the mechanisms for biased value learning using reinforcement learning models incorporating recent findings on synaptic plasticity and opponent circuit mechanisms in the basal ganglia. We show that variations in tonic dopamine can alter the balance between learning from positive and negative reward prediction errors, leading to biased value predictions. This bias arises from the sigmoidal shapes of the dose-occupancy curves and distinct affinities of D1- and D2-type dopamine receptors: changes in tonic dopamine differentially alters the slope of the dose-occupancy curves of these receptors, thus sensitivities, at baseline dopamine concentrations. We show that this mechanism can explain biased value learning in both mice and humans and may also contribute to symptoms observed in psychiatric disorders. Our model provides a foundation for understanding the basal ganglia circuit and underscores the significance of tonic dopamine in modulating learning processes.

7.

Multi-timescale reinforcement learning in the brain.

Masset, Paul; Tano, Pablo; Kim, HyungGoo R; Malik, Athar N; Pouget, Alexandre; Uchida, Naoshige.

bioRxiv ; 2023 Nov 14.

Artigo em Inglês | MEDLINE | ID: mdl-38014166

RESUMO

To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning1, a class of algorithms that has been successful at training artificial agents2-6 and at characterizing the firing of dopamine neurons in the midbrain7-9. In classical reinforcement learning, agents discount future rewards exponentially according to a single time scale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks suggesting that it is a cell-specific property. Together, our results provide a new paradigm to understand functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations10-14, and open new avenues for the design of more efficient reinforcement learning algorithms.

8.

Glutamate inputs send prediction error of reward but not negative value of aversive stimuli to dopamine neurons.

Amo, Ryunosuke; Uchida, Naoshige; Watabe-Uchida, Mitsuko.

bioRxiv ; 2023 Nov 09.

Artigo em Inglês | MEDLINE | ID: mdl-37986868

RESUMO

Midbrain dopamine neurons are thought to signal reward prediction errors (RPEs) but the mechanisms underlying RPE computation, particularly contributions of different neurotransmitters, remain poorly understood. Here we used a genetically-encoded glutamate sensor to examine the pattern of glutamate inputs to dopamine neurons. We found that glutamate inputs exhibit virtually all of the characteristics of RPE, rather than conveying a specific component of RPE computation such as reward or expectation. Notably, while glutamate inputs were transiently inhibited by reward omission, they were excited by aversive stimuli. Opioid analgesics altered dopamine negative responses to aversive stimuli toward more positive responses, while excitatory responses of glutamate inputs remained unchanged. Our findings uncover previously unknown synaptic mechanisms underlying RPE computations; dopamine responses are shaped by both synergistic and competitive interactions between glutamatergic and GABAergic inputs to dopamine neurons depending on valences, with competitive interactions playing a role in responses to aversive stimuli.

9.

Emergence of belief-like representations through reinforcement learning.

Hennig, Jay A; Romero Pinto, Sandra A; Yamaguchi, Takahiro; Linderman, Scott W; Uchida, Naoshige; Gershman, Samuel J.

PLoS Comput Biol ; 19(9): e1011067, 2023 09.

Artigo em Inglês | MEDLINE | ID: mdl-37695776

RESUMO

To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.

Assuntos

Aprendizagem , Reforço Psicológico , Animais , Teorema de Bayes , Recompensa , Redes Neurais de Computação

10.

Competitive integration of time and reward explains value-sensitive foraging decisions and frontal cortex ramping dynamics.

Bukwich, Michael; Campbell, Malcolm G; Zoltowski, David; Kingsbury, Lyle; Tomov, Momchil S; Stern, Joshua; Kim, HyungGoo R; Drugowitsch, Jan; Linderman, Scott W; Uchida, Naoshige.

bioRxiv ; 2023 Sep 05.

Artigo em Inglês | MEDLINE | ID: mdl-37732217

RESUMO

The ability to make advantageous decisions is critical for animals to ensure their survival. Patch foraging is a natural decision-making process in which animals decide when to leave a patch of depleting resources to search for a new one. To study the algorithmic and neural basis of patch foraging behavior in a controlled laboratory setting, we developed a virtual foraging task for head-fixed mice. Mouse behavior could be explained by ramp-to-threshold models integrating time and rewards antagonistically. Accurate behavioral modeling required inclusion of a slowly varying "patience" variable, which modulated sensitivity to time. To investigate the neural basis of this decision-making process, we performed dense electrophysiological recordings with Neuropixels probes broadly throughout frontal cortex and underlying subcortical areas. We found that decision variables from the reward integrator model were represented in neural activity, most robustly in frontal cortical areas. Regression modeling followed by unsupervised clustering identified a subset of neurons with ramping activity. These neurons' firing rates ramped up gradually in single trials over long time scales (up to tens of seconds), were inhibited by rewards, and were better described as being generated by a continuous ramp rather than a discrete stepping process. Together, these results identify reward integration via a continuous ramping process in frontal cortex as a likely candidate for the mechanism by which the mammalian brain solves patch foraging problems.

11.

A Hypothalamic Circuit Underlying the Dynamic Control of Social Homeostasis.

Liu, Ding; Rahman, Mostafizur; Johnson, Autumn; Tsutsui-Kimura, Iku; Pena, Nicolai; Talay, Mustafa; Logeman, Brandon L; Finkbeiner, Samantha; Choi, Seungwon; Capo-Battaglia, Athena; Abdus-Saboor, Ishmail; Ginty, David D; Uchida, Naoshige; Watabe-Uchida, Mitsuko; Dulac, Catherine.

bioRxiv ; 2023 May 19.

Artigo em Inglês | MEDLINE | ID: mdl-37293031

RESUMO

Social grouping increases survival in many species, including humans1,2. By contrast, social isolation generates an aversive state (loneliness) that motivates social seeking and heightens social interaction upon reunion3-5. The observed rebound in social interaction triggered by isolation suggests a homeostatic process underlying the control of social drive, similar to that observed for physiological needs such as hunger, thirst or sleep3,6. In this study, we assessed social responses in multiple mouse strains and identified the FVB/NJ line as exquisitely sensitive to social isolation. Using FVB/NJ mice, we uncovered two previously uncharacterized neuronal populations in the hypothalamic preoptic nucleus that are activated during social isolation and social rebound and that orchestrate the behavior display of social need and social satiety, respectively. We identified direct connectivity between these two populations of opposite function and with brain areas associated with social behavior, emotional state, reward, and physiological needs, and showed that animals require touch to assess the presence of others and fulfill their social need, thus revealing a brain-wide neural system underlying social homeostasis. These findings offer mechanistic insight into the nature and function of circuits controlling instinctive social need and for the understanding of healthy and diseased brain states associated with social context.

12.

Emergence of belief-like representations through reinforcement learning.

Hennig, Jay A; Pinto, Sandra A Romero; Yamaguchi, Takahiro; Linderman, Scott W; Uchida, Naoshige; Gershman, Samuel J.

bioRxiv ; 2023 Apr 04.

Artigo em Inglês | MEDLINE | ID: mdl-37066383

RESUMO

To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity. Author Summary: Natural environments are full of uncertainty. For example, just because my fridge had food in it yesterday does not mean it will have food today. Despite such uncertainty, animals can estimate which states and actions are the most valuable. Previous work suggests that animals estimate value using a brain area called the basal ganglia, using a process resembling a reinforcement learning algorithm called TD learning. However, traditional reinforcement learning algorithms cannot accurately estimate value in environments with state uncertainty (e.g., when my fridge's contents are unknown). One way around this problem is if agents form "beliefs," a probabilistic estimate of how likely each state is, given any observations so far. However, estimating beliefs is a demanding process that may not be possible for animals in more complex environments. Here we show that an artificial recurrent neural network (RNN) trained with TD learning can estimate value from observations, without explicitly estimating beliefs. The trained RNN's error signals resembled the neural activity of dopamine neurons measured during the same task. Importantly, the RNN's activity resembled beliefs, but only when the RNN had enough capacity. This work illustrates how animals could estimate value in uncertain environments without needing to first form beliefs, which may be useful in environments where computing the true beliefs is too costly.

13.

Spontaneous behaviour is structured by reinforcement without explicit reward.

Markowitz, Jeffrey E; Gillis, Winthrop F; Jay, Maya; Wood, Jeffrey; Harris, Ryley W; Cieszkowski, Robert; Scott, Rebecca; Brann, David; Koveal, Dorothy; Kula, Tomasz; Weinreb, Caleb; Osman, Mohammed Abdal Monium; Pinto, Sandra Romero; Uchida, Naoshige; Linderman, Scott W; Sabatini, Bernardo L; Datta, Sandeep Robert.

Nature ; 614(7946): 108-117, 2023 02.

Artigo em Inglês | MEDLINE | ID: mdl-36653449

RESUMO

Spontaneous animal behaviour is built from action modules that are concatenated by the brain into sequences1,2. However, the neural mechanisms that guide the composition of naturalistic, self-motivated behaviour remain unknown. Here we show that dopamine systematically fluctuates in the dorsolateral striatum (DLS) as mice spontaneously express sub-second behavioural modules, despite the absence of task structure, sensory cues or exogenous reward. Photometric recordings and calibrated closed-loop optogenetic manipulations during open field behaviour demonstrate that DLS dopamine fluctuations increase sequence variation over seconds, reinforce the use of associated behavioural modules over minutes, and modulate the vigour with which modules are expressed, without directly influencing movement initiation or moment-to-moment kinematics. Although the reinforcing effects of optogenetic DLS dopamine manipulations vary across behavioural modules and individual mice, these differences are well predicted by observed variation in the relationships between endogenous dopamine and module use. Consistent with the possibility that DLS dopamine fluctuations act as a teaching signal, mice build sequences during exploration as if to maximize dopamine. Together, these findings suggest a model in which the same circuits and computations that govern action choices in structured tasks have a key role in sculpting the content of unconstrained, high-dimensional, spontaneous behaviour.

Assuntos

Comportamento Animal , Reforço Psicológico , Recompensa , Animais , Camundongos , Corpo Estriado/metabolismo , Dopamina/metabolismo , Sinais (Psicologia) , Optogenética , Fotometria

14.

Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction.

Akiti, Korleki; Tsutsui-Kimura, Iku; Xie, Yudi; Mathis, Alexander; Markowitz, Jeffrey E; Anyoha, Rockwell; Datta, Sandeep Robert; Mathis, Mackenzie Weygandt; Uchida, Naoshige; Watabe-Uchida, Mitsuko.

Neuron ; 110(22): 3789-3804.e9, 2022 11 16.

Artigo em Inglês | MEDLINE | ID: mdl-36130595

RESUMO

Animals both explore and avoid novel objects in the environment, but the neural mechanisms that underlie these behaviors and their dynamics remain uncharacterized. Here, we used multi-point tracking (DeepLabCut) and behavioral segmentation (MoSeq) to characterize the behavior of mice freely interacting with a novel object. Novelty elicits a characteristic sequence of behavior, starting with investigatory approach and culminating in object engagement or avoidance. Dopamine in the tail of the striatum (TS) suppresses engagement, and dopamine responses were predictive of individual variability in behavior. Behavioral dynamics and individual variability are explained by a reinforcement-learning (RL) model of threat prediction in which behavior arises from a novelty-induced initial threat prediction (akin to "shaping bonus") and a threat prediction that is learned through dopamine-mediated threat prediction errors. These results uncover an algorithmic similarity between reward- and threat-related dopamine sub-systems.

Assuntos

Corpo Estriado , Dopamina , Animais , Camundongos , Dopamina/fisiologia , Corpo Estriado/fisiologia , Reforço Psicológico , Recompensa , Aprendizagem/fisiologia

15.

A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning.

Amo, Ryunosuke; Matias, Sara; Yamanaka, Akihiro; Tanaka, Kenji F; Uchida, Naoshige; Watabe-Uchida, Mitsuko.

Nat Neurosci ; 25(8): 1082-1092, 2022 08.

Artigo em Inglês | MEDLINE | ID: mdl-35798979

RESUMO

A large body of evidence has indicated that the phasic responses of midbrain dopamine neurons show a remarkable similarity to a type of teaching signal (temporal difference (TD) error) used in machine learning. However, previous studies failed to observe a key prediction of this algorithm: that when an agent associates a cue and a reward that are separated in time, the timing of dopamine signals should gradually move backward in time from the time of the reward to the time of the cue over multiple trials. Here we demonstrate that such a gradual shift occurs both at the level of dopaminergic cellular activity and dopamine release in the ventral striatum in mice. Our results establish a long-sought link between dopaminergic activity and the TD learning algorithm, providing fundamental insights into how the brain associates cues and rewards that are separated in time.

Assuntos

Dopamina , Recompensa , Animais , Sinais (Psicologia) , Dopamina/fisiologia , Neurônios Dopaminérgicos/fisiologia , Aprendizado de Máquina , Mesencéfalo , Camundongos

16.

The role of state uncertainty in the dynamics of dopamine.

Mikhael, John G; Kim, HyungGoo R; Uchida, Naoshige; Gershman, Samuel J.

Curr Biol ; 32(5): 1077-1087.e9, 2022 03 14.

Artigo em Inglês | MEDLINE | ID: mdl-35114098

RESUMO

Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a "bump," whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.

Assuntos

Dopamina , Recompensa , Animais , Aprendizagem , Camundongos , Reforço Psicológico , Incerteza

17.

Dopamine signals as temporal difference errors: recent advances.

Starkweather, Clara Kwon; Uchida, Naoshige.

Curr Opin Neurobiol ; 67: 95-105, 2021 04.

Artigo em Inglês | MEDLINE | ID: mdl-33186815

RESUMO

In the brain, dopamine is thought to drive reward-based learning by signaling temporal difference reward prediction errors (TD errors), a 'teaching signal' used to train computers. Recent studies using optogenetic manipulations have provided multiple pieces of evidence supporting that phasic dopamine signals function as TD errors. Furthermore, novel experimental results have indicated that when the current state of the environment is uncertain, dopamine neurons compute TD errors using 'belief states' or a probability distribution over potential states. It remains unclear how belief states are computed but emerging evidence suggests involvement of the prefrontal cortex and the hippocampus. These results refine our understanding of the role of dopamine in learning and the algorithms by which dopamine functions in the brain.

Assuntos

Dopamina , Recompensa , Encéfalo , Neurônios Dopaminérgicos , Aprendizagem

18.

Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task.

Tsutsui-Kimura, Iku; Matsumoto, Hideyuki; Akiti, Korleki; Yamada, Melissa M; Uchida, Naoshige; Watabe-Uchida, Mitsuko.

Elife ; 92020 12 21.

Artigo em Inglês | MEDLINE | ID: mdl-33345774

RESUMO

Different regions of the striatum regulate different types of behavior. However, how dopamine signals differ across striatal regions and how dopamine regulates different behaviors remain unclear. Here, we compared dopamine axon activity in the ventral, dorsomedial, and dorsolateral striatum, while mice performed a perceptual and value-based decision task. Surprisingly, dopamine axon activity was similar across all three areas. At a glance, the activity multiplexed different variables such as stimulus-associated values, confidence, and reward feedback at different phases of the task. Our modeling demonstrates, however, that these modulations can be inclusively explained by moment-by-moment changes in the expected reward, that is the temporal difference error. A major difference between areas was the overall activity level of reward responses: reward responses in dorsolateral striatum were positively shifted, lacking inhibitory responses to negative prediction errors. The differences in dopamine signals put specific constraints on the properties of behaviors controlled by dopamine in these regions.

Assuntos

Axônios/fisiologia , Corpo Estriado/fisiologia , Tomada de Decisões/fisiologia , Neurônios Dopaminérgicos/fisiologia , Animais , Feminino , Masculino , Camundongos , Camundongos Endogâmicos C57BL , Odorantes , Reforço Psicológico , Recompensa , Olfato

19.

A Unified Framework for Dopamine Signals across Timescales.

Kim, HyungGoo R; Malik, Athar N; Mikhael, John G; Bech, Pol; Tsutsui-Kimura, Iku; Sun, Fangmiao; Zhang, Yajun; Li, Yulong; Watabe-Uchida, Mitsuko; Gershman, Samuel J; Uchida, Naoshige.

Cell ; 183(6): 1600-1616.e25, 2020 12 10.

Artigo em Inglês | MEDLINE | ID: mdl-33248024

RESUMO

Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. However, recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independent from somatic spiking activity. Here we developed experimental paradigms using virtual reality that disambiguate RPEs from values. We examined dopamine circuit activity at various stages, including somatic spiking, calcium signals at somata and axons, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all stages examined. Ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.

Assuntos

Dopamina/metabolismo , Transdução de Sinais , Potenciais de Ação/fisiologia , Animais , Axônios/metabolismo , Cálcio/metabolismo , Sinalização do Cálcio , Corpo Celular/metabolismo , Sinais (Psicologia) , Neurônios Dopaminérgicos/fisiologia , Fluorometria , Masculino , Camundongos Endogâmicos C57BL , Modelos Neurológicos , Estimulação Luminosa , Recompensa , Sensação , Fatores de Tempo , Área Tegmentar Ventral/metabolismo , Realidade Virtual

20.

Distributional Reinforcement Learning in the Brain.

Lowet, Adam S; Zheng, Qiao; Matias, Sara; Drugowitsch, Jan; Uchida, Naoshige.

Trends Neurosci ; 43(12): 980-997, 2020 12.

Artigo em Inglês | MEDLINE | ID: mdl-33092893

RESUMO

Learning about rewards and punishments is critical for survival. Classical studies have demonstrated an impressive correspondence between the firing of dopamine neurons in the mammalian midbrain and the reward prediction errors of reinforcement learning algorithms, which express the difference between actual reward and predicted mean reward. However, it may be advantageous to learn not only the mean but also the complete distribution of potential rewards. Recent advances in machine learning have revealed a biologically plausible set of algorithms for reconstructing this reward distribution from experience. Here, we review the mathematical foundations of these algorithms as well as initial evidence for their neurobiological implementation. We conclude by highlighting outstanding questions regarding the circuit computation and behavioral readout of these distributional codes.

Assuntos

Dopamina , Reforço Psicológico , Animais , Encéfalo , Humanos , Mesencéfalo , Recompensa

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA