Results 1 - 20 of 17,797
1.
Article in English | MEDLINE | ID: mdl-38829754

ABSTRACT

Steady-state visual evoked potential (SSVEP) is one of the most widely used brain-computer interface (BCI) paradigms. Conventional methods analyze SSVEPs at a fixed window length. Compared with these methods, dynamic window methods can achieve a higher information transfer rate (ITR) by selecting an appropriate window length. These methods dynamically evaluate the credibility of the result by linear discriminant analysis (LDA) or Bayesian estimation and extend the window length until credible results are obtained. However, the hypotheses introduced by LDA and Bayesian estimation may not align with real-world SSVEP recordings, which leads to an inappropriate window length. To address this issue, we propose a novel dynamic window method based on reinforcement learning (RL). The proposed method optimizes the decision of whether to extend the window length based on the impact of that decision on the ITR, without additional hypotheses. The decision model automatically learns a strategy that maximizes the ITR through trial and error. In addition, compared with traditional methods that manually extract features, the proposed method uses neural networks to automatically extract features for the dynamic selection of the window length. The proposed method can therefore more accurately decide whether to extend the window length and select an appropriate one. To verify its performance, we compared the proposed method with other dynamic window methods on two public SSVEP datasets. The experimental results demonstrate that, by using RL, the proposed method achieves the highest performance.
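The figure of merit that dynamic-window methods optimize, the ITR, is conventionally computed with the standard Wolpaw formula. A minimal sketch (illustrative only, not the authors' implementation) shows the trade-off the method exploits: a shorter window raises speed but usually lowers accuracy.

```python
import math

def itr_bits_per_min(n_targets: int, accuracy: float, window_s: float) -> float:
    """Wolpaw information transfer rate for an n-class BCI.

    Dynamic-window methods extend the window only while doing so is expected
    to improve this quantity.
    """
    n, p, t = n_targets, accuracy, window_s
    bits = math.log2(n)
    if 0 < p < 1:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    elif p == 0:
        bits = 0.0  # chance-level: treat as carrying no information (conservative)
    return bits * 60.0 / t

# Perfect 40-target classification in a 1 s window yields log2(40) * 60 bits/min
print(itr_bits_per_min(40, 1.0, 1.0))
```

Halving the window doubles the ITR only if accuracy is unchanged, which is exactly why the stopping decision matters.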


Subject(s)
Algorithms , Bayes Theorem , Brain-Computer Interfaces , Electroencephalography , Evoked Potentials, Visual , Neural Networks, Computer , Reinforcement, Psychology , Humans , Evoked Potentials, Visual/physiology , Electroencephalography/methods , Discriminant Analysis , Male , Adult , Young Adult , Female , Machine Learning
2.
Addict Biol ; 29(5): e13397, 2024 May.
Article in English | MEDLINE | ID: mdl-38711205

ABSTRACT

Neuronal ensembles in the medial prefrontal cortex mediate cocaine self-administration via projections to the nucleus accumbens. We have recently shown that neuronal ensembles in the prelimbic cortex form rapidly to mediate cocaine self-administration. However, the role of neuronal ensembles within the nucleus accumbens in initial cocaine-seeking behaviour remains unknown. Here, we sought to expand the current literature by testing the necessity of the cocaine self-administration ensemble in the nucleus accumbens core (NAcCore) 1 day after male and female rats acquire cocaine self-administration by using the Daun02 inactivation procedure. We found that disrupting the NAcCore ensembles after a no-cocaine reward-seeking test increased subsequent cocaine seeking, while disrupting NAcCore ensembles following a cocaine self-administration session decreased subsequent cocaine seeking. We then characterized neuronal cell type in the NAcCore using RNAscope in situ hybridization. In the no-cocaine session, we saw reduced dopamine D1 type neuronal activation, while in the cocaine self-administration session, we found preferential dopamine D1 type neuronal activity in the NAcCore.


Subject(s)
Cocaine , Drug-Seeking Behavior , Neurons , Nucleus Accumbens , Self Administration , Animals , Nucleus Accumbens/drug effects , Cocaine/pharmacology , Male , Female , Rats , Drug-Seeking Behavior/drug effects , Neurons/drug effects , Reward , Dopamine Uptake Inhibitors/pharmacology , Reinforcement, Psychology , Receptors, Dopamine D1 , Cocaine-Related Disorders/physiopathology , Rats, Sprague-Dawley , Prefrontal Cortex/drug effects
3.
Elife ; 132024 May 07.
Article in English | MEDLINE | ID: mdl-38711355

ABSTRACT

Collaborative hunting, in which predators play different and complementary roles to capture prey, has been traditionally believed to be an advanced hunting strategy requiring large brains that involve high-level cognition. However, recent findings that collaborative hunting has also been documented in smaller-brained vertebrates have placed this previous belief under strain. Here, using computational multi-agent simulations based on deep reinforcement learning, we demonstrate that decisions underlying collaborative hunts do not necessarily rely on sophisticated cognitive processes. We found that apparently elaborate coordination can be achieved through a relatively simple decision process of mapping between states and actions related to distance-dependent internal representations formed by prior experience. Furthermore, we confirmed that this decision rule of predators is robust against unknown prey controlled by humans. Our computational ecological results emphasize that collaborative hunting can emerge in various intra- and inter-specific interactions in nature, and provide insights into the evolution of sociality.


From wolves to ants, many animals are known to hunt as a team. This strategy may yield several advantages: going after bigger prey together, for example, can often result in individuals spending less energy and accessing larger food portions than when hunting alone. However, it remains unclear whether this behavior relies on complex cognitive processes, such as the ability of an animal to represent and anticipate the actions of its teammates. It is often thought that 'collaborative hunting' may require such skills, as this form of group hunting involves animals taking on distinct, tightly coordinated roles, as opposed to simply engaging in the same actions simultaneously. To better understand whether high-level cognitive skills are required for collaborative hunting, Tsutsui et al. used a type of artificial intelligence known as deep reinforcement learning. This allowed them to develop a computational model in which a small number of 'agents' had the opportunity to 'learn' whether and how to work together to catch a 'prey' under various conditions. To do so, the agents were only equipped with the ability to link distinct stimuli together, such as an event and a reward; this is similar to associative learning, a cognitive process which is widespread amongst animal species. The model showed that the challenge of capturing the prey when hunting alone, and the reward of sharing food after a successful hunt, drove the agents to learn how to work together, with previous experiences shaping decisions made during subsequent hunts. Importantly, the predators started to exhibit the ability to take on distinct, complementary roles reminiscent of those observed during collaborative hunting, such as one agent chasing the prey while another ambushes it. Overall, the work by Tsutsui et al. challenges the traditional view that only organisms equipped with high-level cognitive processes can show refined collaborative approaches to hunting, opening the possibility that these behaviors may be more widespread than originally thought, including between animals of different species.


Subject(s)
Deep Learning , Predatory Behavior , Reinforcement, Psychology , Animals , Cooperative Behavior , Humans , Computer Simulation , Decision Making
4.
Proc Natl Acad Sci U S A ; 121(20): e2316658121, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38717856

ABSTRACT

Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against these increased predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward predictions via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve as a causal mechanism that attracts agents, via RL, to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
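The recursive reward-RPE-prediction iteration described above can be illustrated with a minimal Rescorla-Wagner-style update (a didactic sketch with an arbitrary learning rate, not the paper's model): repeating the same reward shrinks the RPE toward zero, so a fresh positive RPE then requires a better reward.

```python
def rpe_update(value: float, reward: float, alpha: float = 0.5) -> tuple[float, float]:
    """One prediction-error step: delta = r - V, then V <- V + alpha * delta."""
    delta = reward - value
    return value + alpha * delta, delta

# Repeating the same reward drives the prediction up and the RPE toward zero...
v = 0.0
for _ in range(20):
    v, delta = rpe_update(v, reward=1.0)

# ...so eliciting a sizeable positive RPE now requires a better reward:
_, delta_same = rpe_update(v, reward=1.0)
_, delta_better = rpe_update(v, reward=2.0)
print(delta_same, delta_better)
```

The same arithmetic, run in reverse on dopamine inhibitions, lowers the prediction and lets smaller rewards elicit positive RPEs again.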


Subject(s)
Dopamine , Dopaminergic Neurons , Reward , Animals , Dopamine/metabolism , Dopaminergic Neurons/physiology , Dopaminergic Neurons/metabolism , Humans , Reinforcement, Psychology
5.
Am Nat ; 203(6): 695-712, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38781528

ABSTRACT

A change to a population's social network is a change to the substrate of cultural transmission, affecting behavioral diversity and adaptive cultural evolution. While features of network structure such as population size and density have been well studied, less is understood about the influence of social processes such as population turnover, or the repeated replacement of individuals by naive individuals. Experimental data have led to the hypothesis that naive learners can drive cultural evolution by better assessing the relative value of behaviors, although this hypothesis has been expressed only verbally. We conducted a formal exploration of this hypothesis using a generative model that concurrently simulated its two key ingredients: social transmission and reinforcement learning. We simulated competition between high- and low-reward behaviors while varying turnover magnitude and tempo. Variation in turnover influenced changes in the distributions of cultural behaviors, irrespective of initial knowledge-state conditions. We found optimal turnover regimes that amplified the production of higher reward behaviors through two key mechanisms: repertoire composition and enhanced valuation by agents that knew both behaviors. These effects depended on network and learning parameters. Our model provides formal theoretical support for, and predictions about, the hypothesis that naive learners can shape cultural change through their enhanced sampling ability. By moving from experimental data to theory, we illuminate an underdiscussed generative process that can lead to changes in cultural behavior, arising from an interaction between social dynamics and learning.
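The reinforcement-learning ingredient of the generative model turns on naive learners sampling both behaviors. A toy value-learning sketch (rewards, learning rate, and exploration rate invented for illustration, not taken from the paper) shows how an unbiased starting repertoire discovers the higher-reward behavior:

```python
import random

def learn(q, rewards, steps=500, eps=0.1, alpha=0.2, seed=0):
    """Epsilon-greedy value learning over two behaviours (index 1 pays more)."""
    rng = random.Random(seed)
    for _ in range(steps):
        # explore occasionally, otherwise produce the currently best-valued behaviour
        a = rng.randrange(2) if rng.random() < eps else q.index(max(q))
        q[a] += alpha * (rewards[a] - q[a])  # incremental value update
    return q

rewards = [0.2, 1.0]                # low- vs high-reward behaviour
naive = learn([0.0, 0.0], rewards)  # a naive learner samples both alternatives
print(naive)
```

Turnover, in this caricature, amounts to periodically resetting an agent's values to zero, which renews this unbiased sampling.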


Subject(s)
Cultural Evolution , Learning , Humans , Reward , Social Behavior , Models, Theoretical , Reinforcement, Psychology
6.
PLoS One ; 19(5): e0303949, 2024.
Article in English | MEDLINE | ID: mdl-38805441

ABSTRACT

Cognitive rehabilitation, STEM (science, technology, engineering, and math) skill acquisition, and coaching games such as chess often require tutoring decision-making strategies. The advancement of AI-driven tutoring systems for facilitating human learning requires an understanding of the impact of evaluative feedback on human decision-making and skill development. To this end, we conduct human experiments using Amazon Mechanical Turk to study the influence of evaluative feedback on human decision-making in sequential tasks. In these experiments, participants solve the Tower of Hanoi puzzle and receive AI-generated feedback while solving it. We examine how this feedback affects their learning and skill transfer to related tasks. Additionally, treating humans as noisy optimal agents, we employ maximum entropy inverse reinforcement learning to analyze the effect of feedback on the implicit human reward structure that guides their decision making. Lastly, we explore various computational models to understand how people incorporate evaluative feedback into their decision-making processes. Our findings underscore that humans perceive evaluative feedback as indicative of their long-term strategic success, thus aiding in skill acquisition and transfer in sequential decision-making tasks. Moreover, we demonstrate that evaluative feedback fosters a more structured and organized learning experience compared to learning without feedback. Furthermore, our results indicate that providing intermediate goals alone does not significantly enhance human learning outcomes.
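Treating humans as noisy optimal agents, as in the maximum entropy inverse RL analysis above, typically amounts to assuming Boltzmann (soft-optimal) action selection. A minimal sketch of that choice rule (the inverse-temperature beta and the action values are illustrative, not from the paper):

```python
import math

def boltzmann_policy(q_values, beta=2.0):
    """Soft-optimal action distribution: P(a) proportional to exp(beta * Q(a)).

    Higher beta means more deterministic (more 'optimal') choices.
    """
    m = max(q_values)  # subtract the max for numerical stability
    w = [math.exp(beta * (q - m)) for q in q_values]
    z = sum(w)
    return [x / z for x in w]

# Higher-valued moves are chosen more often, but not deterministically:
p = boltzmann_policy([1.0, 0.5, 0.0])
print([round(x, 3) for x in p])
```

Under this model, evaluative feedback that reshapes the implicit reward (and hence the Q-values) shifts the whole choice distribution toward strategically better moves.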


Subject(s)
Decision Making , Learning , Humans , Learning/physiology , Male , Female , Adult , Young Adult , Artificial Intelligence , Reinforcement, Psychology
7.
Sci Adv ; 10(22): eadn4203, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38809978

ABSTRACT

Learning causal relationships relies on understanding how often one event precedes another. To investigate how dopamine neuron activity and neurotransmitter release change when a retrospective relationship is degraded for a specific pair of events, we used outcome-selective Pavlovian contingency degradation in rats. Conditioned responding was attenuated for the cue-reward contingency that was degraded, as was dopamine neuron activity in the midbrain and dopamine release in the ventral striatum in response to the cue and subsequent reward. Contingency degradation also abolished the trial-by-trial history dependence of the dopamine responses at the time of trial outcome. This profile of changes in cue- and reward-evoked responding is not easily explained by a standard reinforcement learning model. An alternative model based on learning causal relationships was better able to capture dopamine responses during contingency degradation, as well as conditioned behavior following optogenetic manipulations of dopamine during noncontingent rewards. Our results suggest that mesostriatal dopamine encodes the contingencies between meaningful events during learning.


Subject(s)
Cues , Dopamine , Dopaminergic Neurons , Reward , Animals , Dopamine/metabolism , Rats , Male , Dopaminergic Neurons/metabolism , Dopaminergic Neurons/physiology , Conditioning, Classical , Ventral Striatum/metabolism , Ventral Striatum/physiology , Learning/physiology , Mesencephalon/metabolism , Mesencephalon/physiology , Reinforcement, Psychology
8.
PLoS One ; 19(5): e0301173, 2024.
Article in English | MEDLINE | ID: mdl-38771859

ABSTRACT

The following paper describes a steady-state model of concurrent choice, termed the active time model (ATM). ATM is derived from maximization principles and is characterized by a semi-Markov process. The model proposes that the controlling stimulus in concurrent variable-interval (VI) VI schedules of reinforcement is the time interval since the most recent response, termed here "the active interresponse time" or simply "active time." In the model, after a response is generated, it is categorized by a function that relates active times to switch/stay probabilities. In the paper, the output of ATM is compared with predictions made by three other models of operant conditioning: melioration, a version of scalar expectancy theory (SET), and momentary maximization. Data sets considered include preferences in multiple-concurrent VI VI schedules, molecular choice patterns, correlations between switching and perseveration, and molar choice proportions. It is shown that ATM can account for all of these data sets, while the other models produce more limited fits. However, rather than argue that ATM is the singular model for concurrent VI VI choice, a consideration of its concept space leads to the conclusion that operant choice is multiply determined, and that an adaptive viewpoint, one that considers experimental procedures both as selecting mechanisms for animal choice and as tests of the controlling variables of that choice, is warranted.
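ATM's core ingredient, a function mapping the active interresponse time to a switch/stay probability, can be caricatured with a logistic curve (the functional form and all parameters here are hypothetical, chosen only to illustrate the idea of categorizing responses by their active time; the paper's actual function may differ):

```python
import math

def p_switch(active_time_s, threshold_s=2.0, slope=1.5):
    """Hypothetical categorisation function: the longer the current
    interresponse time, the more likely the next response switches
    to the other alternative rather than staying."""
    return 1.0 / (1.0 + math.exp(-slope * (active_time_s - threshold_s)))

# Short pauses favour staying; long pauses favour switching:
print(round(p_switch(0.5), 3), round(p_switch(4.0), 3))
```

Any monotone mapping of this kind yields the semi-Markov switch/stay dynamics the abstract describes, since the next state depends on the elapsed active time rather than on a fixed transition matrix.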


Subject(s)
Choice Behavior , Conditioning, Operant , Choice Behavior/physiology , Animals , Conditioning, Operant/physiology , Reinforcement Schedule , Time Factors , Models, Psychological , Reinforcement, Psychology , Markov Chains
9.
J Environ Manage ; 359: 120968, 2024 May.
Article in English | MEDLINE | ID: mdl-38703643

ABSTRACT

Planning under complex uncertainty often calls for plans that can adapt to changing future conditions. To inform plan development during this process, exploration methods have been used to explore the performance of candidate policies under uncertainty. Nevertheless, these methods rarely enable adaptation by themselves, so extra effort is required to develop the final adaptive plans, compromising overall decision-making efficiency. This paper introduces Reinforcement Learning (RL), which employs closed-loop control, as a new exploration method that enables automated adaptive policy-making for planning under uncertainty. To investigate its performance, we compare RL with a widely used exploration method, the Multi-Objective Evolutionary Algorithm (MOEA), on two hypothetical problems via computational experiments. Our results indicate the complementarity of the two methods. RL makes better use of its exploration history, always providing higher efficiency and better policy robustness in the presence of parameter uncertainty. MOEA quantifies objective uncertainty in a more intuitive way, providing better robustness to objective uncertainty. These findings will help researchers choose appropriate methods in different applications.


Subject(s)
Algorithms , Decision Making , Uncertainty , Reinforcement, Psychology
10.
Eat Behav ; 53: 101878, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38696869

ABSTRACT

INTRODUCTION: Disordered eating behaviors are a current public health concern since their progression can lead to the development of a full-criteria eating disorder. Sensitization to repeated intake of high energy density (HED) foods is associated with excess weight gain over time, but less is known about its relationship with assessments of disordered eating. Thus, this study aims to understand how disordered eating behaviors relate to the influence of the food environment and sensitization. METHOD: 163 adolescents (50% female; mean age 13.2 years) were followed for 24 months. Sensitization was assessed by comparing the relative reinforcing value (RRV) of HED food at baseline and after two weeks of daily intake; sensitization was defined as an increased RRV of food after repeated intake. Study participants also completed the EDE-Q and the Power of Food Scale (PFS). We conducted multivariate general linear models to examine these associations. RESULTS: Sensitization status and PFS scores at baseline were positively associated with EDE-Q subscale scores cross-sectionally, but not longitudinally, at baseline and 24 months. We found that sensitization to HED food and higher susceptibility to food cues related to increased disordered eating behaviors both at baseline and at 24 months. DISCUSSION: These findings suggest that sensitization to repeated HED food intake and the food environment might be risk factors for later engagement in disordered eating behaviors. Future studies should address the temporal relationships among these factors and the role that social norms around body weight and weight stigma may play in the development of these behaviors.


Subject(s)
Feeding Behavior , Feeding and Eating Disorders , Reinforcement, Psychology , Humans , Female , Adolescent , Feeding and Eating Disorders/psychology , Male , Feeding Behavior/psychology , Food , Cross-Sectional Studies , Longitudinal Studies , Energy Intake/physiology
11.
Nature ; 630(8015): 141-148, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38778097

ABSTRACT

Fentanyl is a powerful painkiller that elicits euphoria and positive reinforcement [1]. Fentanyl also leads to dependence, defined by the aversive withdrawal syndrome, which fuels negative reinforcement [2,3] (that is, individuals retake the drug to avoid withdrawal). Positive and negative reinforcement maintain opioid consumption, which leads to addiction in one-fourth of users, the largest fraction among all addictive drugs [4]. Among the opioid receptors, µ-opioid receptors have a key role [5], yet the induction loci of the circuit adaptations that eventually lead to addiction remain unknown. Here we injected mice with fentanyl, which acutely inhibited γ-aminobutyric acid-expressing neurons in the ventral tegmental area (VTA), causing disinhibition of dopamine neurons and ultimately increasing dopamine in the nucleus accumbens. Knockdown of µ-opioid receptors in the VTA abolished dopamine transients and positive reinforcement, but withdrawal remained unchanged. We identified neurons expressing µ-opioid receptors in the central amygdala (CeA) whose activity was enhanced during withdrawal. Knockdown of µ-opioid receptors in the CeA eliminated aversive symptoms, suggesting that these neurons mediate negative reinforcement. Consistent with this, optogenetic stimulation of CeA neurons that express µ-opioid receptors caused place aversion, and mice readily learned to press a lever to pause this stimulation. Our study parses the neuronal populations that trigger positive and negative reinforcement in the VTA and CeA, respectively. We lay out the circuit organization to develop interventions for reducing fentanyl addiction and facilitating rehabilitation.


Subject(s)
Dopaminergic Neurons , Fentanyl , Nucleus Accumbens , Receptors, Opioid, mu , Reinforcement, Psychology , Substance Withdrawal Syndrome , Ventral Tegmental Area , Animals , Fentanyl/pharmacology , Receptors, Opioid, mu/metabolism , Mice , Ventral Tegmental Area/drug effects , Ventral Tegmental Area/metabolism , Ventral Tegmental Area/physiology , Male , Dopaminergic Neurons/drug effects , Dopaminergic Neurons/metabolism , Substance Withdrawal Syndrome/metabolism , Nucleus Accumbens/metabolism , Nucleus Accumbens/drug effects , Dopamine/metabolism , Optogenetics , Central Amygdaloid Nucleus/metabolism , Central Amygdaloid Nucleus/drug effects , Central Amygdaloid Nucleus/physiology , Female , Mice, Inbred C57BL , Opioid-Related Disorders/metabolism , Analgesics, Opioid/pharmacology , Analgesics, Opioid/administration & dosage
12.
Neuropharmacology ; 255: 110008, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38797243

ABSTRACT

Ketamine (KET), a non-competitive N-methyl-d-aspartate (NMDA) receptor antagonist, has a rapid onset of antidepressant effects in treatment-resistant depression patients, and repeated infusions are required to sustain its antidepressant properties. However, KET is an addictive drug, so more preclinical and clinical research is needed to assess the safety of recurring treatments in both sexes. Thus, the aim of this study was to investigate the reinforcing properties of various doses of KET (0, 0.125, 0.25, and 0.5 mg/kg/infusion) and to assess KET's cue-induced reinstatement and neuronal activation in both sexes of Long Evans rats. Neuronal activation was assessed using the protein expression of the immediate early gene cFos in the nucleus accumbens (NAc), an important brain area implicated in reward, reinforcement, and reinstatement to most drug-related cues. Our findings show that KET has reinforcing effects in both male and female rats, albeit exclusively at the two highest doses (0.25 and 0.5 mg/kg/infusion). Furthermore, we noted sex differences, particularly at the highest dose of ketamine, with female rats displaying a higher rate of self-administration. Interestingly, all groups that self-administered KET reinstated drug seeking in response to drug cues. Following the drug cue-induced reinstatement test in rats exposed to KET (0.25 mg/kg/infusion) or saline, cFos protein expression was higher in KET-treated animals than in saline controls, and higher in the core than in the shell subregion of the NAc. There were no notable sex differences in cFos expression in the NAc following reinstatement. These findings reveal sex- and dose-dependent effects in KET's reinforcing properties and show that KET at all doses induced similar reinstatement in both sexes. This study also demonstrates that cues associated with ketamine induce comparable neuronal activation in the NAc of both male and female rats. This work warrants further research into the potential addictive properties of KET, especially when administered at the lower doses now being used in the clinic to treat various psychopathologies.


Subject(s)
Cues , Dose-Response Relationship, Drug , Ketamine , Nucleus Accumbens , Rats, Long-Evans , Reinforcement, Psychology , Animals , Ketamine/pharmacology , Ketamine/administration & dosage , Male , Nucleus Accumbens/drug effects , Nucleus Accumbens/metabolism , Female , Proto-Oncogene Proteins c-fos/metabolism , Excitatory Amino Acid Antagonists/pharmacology , Excitatory Amino Acid Antagonists/administration & dosage , Rats , Sex Characteristics , Self Administration , Conditioning, Operant/drug effects
13.
J Neural Eng ; 21(3)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38718787

ABSTRACT

Objective. Vagus nerve stimulation (VNS) is being investigated as a potential therapy for cardiovascular diseases including heart failure, cardiac arrhythmia, and hypertension. The lack of a systematic approach for controlling and tuning the VNS parameters poses a significant challenge. Closed-loop VNS strategies combined with artificial intelligence (AI) approaches offer a framework for systematically learning and adapting the optimal stimulation parameters. In this study, we present an interactive AI framework using reinforcement learning (RL) for the automated data-driven design of closed-loop VNS control systems in a computational study. Approach. Multiple simulation environments with a standard application programming interface were developed to facilitate the design and evaluation of the automated data-driven closed-loop VNS control systems. These environments simulate the hemodynamic response to multi-location VNS using biophysics-based computational models of healthy and hypertensive rat cardiovascular systems in resting and exercise states. We designed and implemented the RL-based closed-loop VNS control frameworks in the context of controlling the heart rate and the mean arterial pressure for a set-point tracking task. Our experimental design included two approaches: a general policy using deep RL algorithms, and a sample-efficient adaptive policy using probabilistic inference for learning and control. Main results. Our simulation results demonstrated the capability of the closed-loop RL-based approaches to learn optimal VNS control policies and to adapt to variations in the target set points and the underlying dynamics of the cardiovascular system. Our findings highlighted the trade-off between sample efficiency and generalizability, providing insights for proper algorithm selection. Finally, we demonstrated that transfer learning improves the sample efficiency of deep RL algorithms, allowing the development of more efficient and personalized closed-loop VNS systems. Significance. We demonstrated the capability of RL-based closed-loop VNS systems. Our approach provides a systematic, adaptable framework for learning control strategies without requiring prior knowledge of the underlying dynamics.
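The set-point tracking task can be pictured with a toy closed-loop sketch. The scalar dynamics, gains, and noise below are invented for illustration; the paper uses biophysics-based cardiovascular models and learned RL policies rather than the hand-tuned proportional controller standing in here.

```python
import random

class ToySetPointEnv:
    """Stand-in for the paper's simulation environments: a scalar 'heart
    rate' lowered by the stimulation amplitude, rewarded for tracking a
    target set point. Dynamics are invented, not physiological."""
    def __init__(self, target=300.0, hr0=350.0, seed=0):
        self.target, self.hr = target, hr0
        self.rng = random.Random(seed)

    def step(self, amplitude):
        # VNS lowers heart rate; small noise mimics physiological variability
        self.hr += -5.0 * amplitude + self.rng.gauss(0.0, 0.5)
        reward = -abs(self.hr - self.target)  # closer to the set point is better
        return self.hr, reward

env = ToySetPointEnv()
for _ in range(50):
    # crude proportional policy in place of a learned RL controller
    amp = max(0.0, min(1.0, (env.hr - env.target) / 50.0))
    hr, reward = env.step(amp)
print(round(abs(hr - env.target), 1))
```

An RL agent would replace the fixed proportional rule with a policy learned from the reward signal alone, which is what lets the closed-loop system adapt when the set point or the underlying dynamics change.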


Subject(s)
Computer Simulation , Reinforcement, Psychology , Vagus Nerve Stimulation , Vagus Nerve Stimulation/methods , Animals , Rats , Heart Rate/physiology , Cardiovascular System , Algorithms , Artificial Intelligence
14.
Neuropharmacology ; 255: 110001, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38750804

ABSTRACT

Emerging evidence suggests an important role of astrocytes in mediating behavioral and molecular effects of commonly misused drugs. Passive exposure to nicotine alters molecular, morphological, and functional properties of astrocytes. However, a potential involvement of astrocytes in nicotine reinforcement remains largely unexplored. The overall hypothesis tested in the current study is that astrocytes play a critical role in nicotine reinforcement. Protein levels of the astrocyte marker glial fibrillary acidic protein (GFAP) were examined in key mesocorticolimbic regions following chronic nicotine intravenous self-administration. Fluorocitrate, a metabolic inhibitor of astrocytes, was tested for its effects on behaviors related to nicotine reinforcement and relapse. Effects of fluorocitrate on extracellular neurotransmitter levels, including glutamate, GABA, and dopamine, were determined with microdialysis. Chronic nicotine intravenous self-administration increased GFAP expression in the nucleus accumbens core (NACcr), but not other key mesocorticolimbic regions, compared to saline intravenous self-administration. Both intra-ventricular and intra-NACcr microinjection of fluorocitrate decreased nicotine self-administration. Intra-NACcr fluorocitrate microinjection also inhibited cue-induced reinstatement of nicotine seeking. Local perfusion of fluorocitrate decreased extracellular glutamate levels, elevated extracellular dopamine levels, but did not alter extracellular GABA levels in the NACcr. Fluorocitrate did not alter basal locomotor activity. These results indicate that nicotine reinforcement upregulates the astrocyte marker GFAP expression in the NACcr, metabolic inhibition of astrocytes attenuates nicotine reinforcement and relapse, and metabolic inhibition of astrocytes disrupts extracellular dopamine and glutamate transmission. 
Overall, these findings suggest that astrocytes play an important role in nicotine reinforcement and relapse, potentially through regulation of extracellular glutamate and dopamine neurotransmission.


Subject(s)
Astrocytes , Citrates , Dopamine , Glutamic Acid , Nicotine , Nucleus Accumbens , Rats, Wistar , Self Administration , Animals , Nucleus Accumbens/drug effects , Nucleus Accumbens/metabolism , Astrocytes/drug effects , Astrocytes/metabolism , Nicotine/pharmacology , Nicotine/administration & dosage , Male , Glutamic Acid/metabolism , Dopamine/metabolism , Citrates/pharmacology , Citrates/administration & dosage , Rats , Glial Fibrillary Acidic Protein/metabolism , Nicotinic Agonists/pharmacology , Nicotinic Agonists/administration & dosage , Microdialysis , Reinforcement, Psychology , gamma-Aminobutyric Acid/metabolism
15.
Neuropharmacology ; 255: 110002, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38754577

ABSTRACT

RATIONALE: Recent studies report that fentanyl analogs with relatively low pKa values produce antinociception in rodents without other mu opioid-typical side effects due to the restriction of their activity to injured tissue with relatively low pH values. However, it is unclear if and to what degree these compounds may produce mu opioid-typical side effects (respiratory depression, reinforcing effects) at doses higher than those required to produce antinociception. OBJECTIVES: The present study compared the inflammatory antinociceptive, respiratory-depressant, and reinforcing effects of fentanyl and two analogs of intermediate (FF3) and low (NFEPP) pKa values in terms of potency and efficacy in male and female Sprague-Dawley rats. METHODS: Nociception was produced by administration of Complete Freund's Adjuvant into the hind paw of subjects, and antinociception was measured using an electronic Von Frey test. Respiratory depression was measured using whole-body plethysmography. Reinforcing effects were measured in self-administration using a progressive-ratio schedule of reinforcement. The dose ranges tested for each drug encompassed no effect to maximal effects. RESULTS: All compounds produced full effects in all measures but varied in potency. FF3 and fentanyl were equipotent in antinociception and self-administration, but FF3 was less potent than fentanyl in respiratory depression. NFEPP was less potent than fentanyl in every measure. The magnitude of potency difference between antinociception and other effects was greater for FF3 than for NFEPP or fentanyl, indicating that FF3 had the widest margin of safety when relating antinociception to respiratory-depressant and reinforcing effects. 
CONCLUSIONS: Low pKa fentanyl analogs possess potential as safer analgesics, but determining the optimal degree of difference for pKa relative to fentanyl will require further study due to some differences between the current results and findings from prior work with these analogs.


Subject(s)
Analgesics, Opioid , Fentanyl , Rats, Sprague-Dawley , Animals , Fentanyl/pharmacology , Fentanyl/analogs & derivatives , Male , Female , Analgesics, Opioid/pharmacology , Rats , Reinforcement, Psychology , Dose-Response Relationship, Drug , Self Administration , Respiratory Insufficiency/chemically induced , Pain Measurement/drug effects , Pain Measurement/methods
16.
Behav Brain Sci ; 47: e118, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38770877

ABSTRACT

Curiosity and creativity are expressions of the trade-off between leveraging that with which we are familiar and seeking out novelty. Through the computational lens of reinforcement learning, we describe how formulating the value of information seeking and generation via their complementary effects on planning horizons formally captures a range of solutions to striking this balance.


Subject(s)
Creativity , Exploratory Behavior , Reinforcement, Psychology , Humans , Learning
17.
PLoS One ; 19(4): e0300842, 2024.
Article in English | MEDLINE | ID: mdl-38598429

ABSTRACT

Maze-solving is a classical mathematical task that has recently been achieved, by analogy, with various unconventional media and devices, such as living tissues, chemotaxis, and memristors. Plasma generated in a labyrinth of narrow channels can likewise act as a route finder to the exit. In this study, we experimentally observe maze-route finding in a plasma system based on a mixed discharge scheme of direct-current (DC) volume mode and alternating-current (AC) surface dielectric-barrier discharge, and computationally generalize this function in a reinforcement-learning model. In our plasma system, we install two electrodes at the entry and the exit of a square-lattice configuration of narrow channels whose cross section is 1×1 mm2, with a total length of around ten centimeters. Visible emission in low-pressure Ar gas is observed after plasma ignition, and the plasma starting from a given entry location reaches the exit as the discharge voltage increases; the degree of route convergence is quantified by Shannon entropy. A similar short-path route is reproduced in a reinforcement-learning model in which electric potentials imposed through the discharge voltage are replaced by rewards of positive or negative sign (polarity). The model is not a rigorous numerical representation of a plasma simulation, but it shares common points with the experiments along with a rough sketch of the underlying processes (charges in the experiments, rewards in the model). This finding indicates that a plasma-channel network performs an analog computing function similar to a reinforcement-learning algorithm, slightly modified in this study.
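The reward-based route finding this abstract describes can be illustrated with a minimal tabular Q-learning sketch on a toy square lattice. The grid size, rewards, and learning parameters below are illustrative assumptions, not values from the study; the analogy is only that positive/negative rewards play the role of the electrode polarities.

```python
import random

random.seed(0)

N = 4                      # toy 4x4 lattice of channel junctions
GOAL = (N - 1, N - 1)      # "exit" electrode location (entry is (0, 0))
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, a):
    r, c = state[0] + a[0], state[1] + a[1]
    if not (0 <= r < N and 0 <= c < N):
        return state, -1.0, False          # wall: negative reward (repulsive polarity)
    if (r, c) == GOAL:
        return (r, c), 10.0, True          # exit reached: positive reward
    return (r, c), -0.1, False             # small step cost favors short routes

Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):                       # training episodes
    s, done = (0, 0), False
    while not done:
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
        s = s2

# Greedy rollout from the entry converges to a shortest route (6 steps on this grid).
s, path = (0, 0), [(0, 0)]
while s != GOAL and len(path) < 50:
    s, _, _ = step(s, max(ACTIONS, key=lambda x: Q[(s, x)]))
    path.append(s)
print(len(path) - 1)
```

After training, the greedy policy traces a short entry-to-exit route, mirroring how the plasma route converges as the discharge voltage increases.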


Subject(s)
Body Fluids , Reinforcement, Psychology , Reward , Plasma , Algorithms
18.
Behav Ther ; 55(3): 513-527, 2024 May.
Article in English | MEDLINE | ID: mdl-38670665

ABSTRACT

Tic disorders are a class of neurodevelopmental disorders characterized by involuntary motor and/or vocal tics. It has been hypothesized that tics function to reduce aversive premonitory urges (i.e., negative reinforcement) and that suppression-based behavioral interventions such as habit reversal training (HRT) and exposure and response prevention (ERP) disrupt this process and facilitate urge reduction through habituation. However, previous findings regarding the negative reinforcement hypothesis and the effect of suppression on the urge-tic relationship have been inconsistent. The present study applied a dynamical systems framework and within-subject time-series autoregressive models to examine the temporal dynamics of urges and tics and assess whether their relationship changes over time. Eleven adults with tic disorders provided continuous urge ratings during separate conditions in which they were instructed to tic freely or to suppress tics. During the free-to-tic conditions, there was considerable heterogeneity across participants in whether and how the urge-tic relationship followed a pattern consistent with the automatic negative reinforcement hypothesis. Further, little evidence for within-session habituation was seen; tic suppression did not result in a reduction in premonitory urges for most participants. Analysis of broader urge change metrics did show significant disruption to the urge pattern during suppression, which has implications for the current biobehavioral model of tics.
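The within-subject lag-1 autoregressive approach mentioned above can be sketched with simulated data. The generative model below (a tic relieves the urge at the next step, i.e., automatic negative reinforcement) and all coefficients are illustrative assumptions, not the study's data or model; the sketch only shows how a lag-1 regression recovers a negative tic-to-urge coefficient.

```python
import random

random.seed(2)

# Simulate an urge series in which tics relieve the urge (negative reinforcement).
T = 500
urge, tics = [0.5], []
for t in range(T - 1):
    tic = 1 if urge[-1] > 0.8 and random.random() < 0.7 else 0
    tics.append(tic)
    # True dynamics: urge decays toward a set point, drops by 0.5 after a tic.
    urge.append(0.9 * urge[-1] - 0.5 * tic + 0.1 + random.gauss(0, 0.02))
tics.append(0)

# Lag-1 regression: urge_t ~ b0 + b1*urge_{t-1} + b2*tic_{t-1}, via normal equations.
X = [[1.0, urge[t - 1], float(tics[t - 1])] for t in range(1, T)]
y = [urge[t] for t in range(1, T)]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]

def solve3(A, b):
    # Gaussian elimination with partial pivoting on a 3x3 system.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

b0, b1, b2 = solve3(XtX, Xty)
print(b1, b2)   # b2 estimated negative: tics predict urge reduction
```

A negative b2 in such a fit is the signature of the negative reinforcement hypothesis; the study's point is that real participants were heterogeneous on exactly this coefficient.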


Subject(s)
Models, Psychological , Tic Disorders , Humans , Tic Disorders/psychology , Tic Disorders/therapy , Female , Adult , Male , Behavior Therapy/methods , Reinforcement, Psychology , Young Adult , Habits , Middle Aged
19.
Proc Natl Acad Sci U S A ; 121(15): e2317618121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38557193

ABSTRACT

Throughout evolution, bacteria and other microorganisms have learned efficient foraging strategies that exploit characteristic properties of their unknown environment. While much research has been devoted to statistical models describing the dynamics of foraging bacteria and other (micro-)organisms, little is known about how good the learned strategies actually are. This knowledge gap is largely caused by the absence of methods for systematically developing alternative foraging strategies to compare against. In the present work, we use deep reinforcement learning to show that a smart run-and-tumble agent, which strives to find nutrients for its survival, learns motion patterns that are remarkably similar to the trajectories of chemotactic bacteria. Strikingly, despite this similarity, we also find interesting differences between the learned tumble-rate distribution and the one commonly assumed for the run-and-tumble model. We find that these differences equip the agent with significant advantages regarding its foraging and survival capabilities. Our results uncover a generic route to using deep reinforcement learning to discover search and collection strategies that exploit characteristic but initially unknown features of the environment. These results can be used, e.g., to program future microswimmers, nanorobots, and smart active particles for tasks like searching for cancer cells, micro-waste collection, or environmental remediation.
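The classical run-and-tumble baseline that the learned agent is compared against can be sketched in a few lines: straight "runs" interrupted by random reorienting "tumbles", with the tumble rate suppressed when the nutrient concentration is rising. The field, speeds, and rate modulation below are illustrative assumptions, not the paper's parameters.

```python
import math
import random

random.seed(1)

def concentration(x, y):
    """Toy nutrient field peaked at the origin."""
    return math.exp(-math.hypot(x, y) / 10.0)

def simulate(steps=2000, speed=0.5, base_tumble=0.2):
    """Classical run-and-tumble: tumble less while moving up the gradient."""
    x, y = 20.0, 0.0                         # start away from the source
    theta = random.uniform(0, 2 * math.pi)
    last_c = concentration(x, y)
    for _ in range(steps):
        x += speed * math.cos(theta)
        y += speed * math.sin(theta)
        c = concentration(x, y)
        # Chemotactic modulation: long runs up-gradient, short runs down-gradient.
        p_tumble = base_tumble * (0.3 if c > last_c else 1.5)
        if random.random() < min(p_tumble, 1.0):
            theta = random.uniform(0, 2 * math.pi)   # tumble: new random heading
        last_c = c
    return math.hypot(x, y)                  # final distance from the source

runs = [simulate() for _ in range(20)]
print(sum(runs) / len(runs))                 # mean final distance, well below the start
```

The deep-RL agent in the paper is free to learn a different tumble-rate distribution than the fixed modulation hardcoded here, which is precisely where the reported differences arise.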


Subject(s)
Learning , Reinforcement, Psychology , Models, Statistical , Motion , Bacteria
20.
Sci Robot ; 9(89): eadi9579, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630806

ABSTRACT

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist the elderly at home, and colonize new planets. Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesized that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We trained our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed it to the real world zero-shot. Our controller could walk over various outdoor terrains, was robust to external disturbances, and could adapt in context.
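The core architectural ingredient described here, causal attention over an interleaved observation-action history, can be sketched minimally in pure Python with a single randomly initialized attention head. Dimensions and weights are illustrative assumptions; the real controller stacks many such layers, trains the weights with RL, and adds a learned head that maps the last context vector to the next action.

```python
import math
import random

random.seed(0)
D = 8            # token embedding size (illustrative)

def rand_mat(rows, cols):
    return [[random.gauss(0, 1 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

Wq, Wk, Wv = rand_mat(D, D), rand_mat(D, D), rand_mat(D, D)

def causal_attention(tokens):
    """Single-head self-attention where position i attends only to positions 0..i."""
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    out = []
    for i in range(len(tokens)):
        # Causal mask: scores are computed only over the prefix 0..i.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(D)
                  for j in range(i + 1)]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][d] for j in range(i + 1)) for d in range(D)])
    return out

# Interleaved history tokens: o_0, a_0, o_1, a_1, ... (random stand-ins here).
history = [[random.gauss(0, 1) for _ in range(D)] for _ in range(6)]
ctx = causal_attention(history)
# ctx[-1] summarizes the whole history; a linear head on it would predict the next action.
print(len(ctx), len(ctx[-1]))
```

The causal mask is what lets one trained model consume histories of any length at deployment: each position's output depends only on earlier tokens, so truncating the history never changes earlier outputs.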


Subject(s)
Robotics , Humans , Aged , Robotics/methods , Locomotion , Walking , Learning , Reinforcement, Psychology