Results 1 - 13 of 13
1.
Curr Opin Neurobiol; 82: 102758, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37619425

ABSTRACT

Notions of surprise and novelty have been used in various experimental and theoretical studies across multiple brain areas and species. However, 'surprise' and 'novelty' refer to different quantities in different studies, which raises concerns about whether these studies indeed relate to the same functionalities and mechanisms in the brain. Here, we address these concerns through a systematic investigation of how different aspects of surprise and novelty relate to different brain functions and physiological signals. We review recent classifications of definitions proposed for surprise and novelty along with links to experimental observations. We show that computational modeling and quantifiable definitions enable novel interpretations of previous findings and form a foundation for future theoretical and experimental studies.
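The review's emphasis on quantifiable definitions can be made concrete in a few lines. The sketch below picks just two of the many definitions classified in the literature (Shannon surprise under the agent's current model, and an inverse-count novelty signal); the function names are ours, not the review's.

```python
import math
from collections import defaultdict

def shannon_surprise(p_obs):
    """Shannon surprise of an observation the model assigns probability p_obs."""
    return -math.log(p_obs)

class NoveltyTracker:
    """Count-based novelty: rarely seen stimuli are more novel."""
    def __init__(self):
        self.counts = defaultdict(int)

    def novelty(self, stimulus):
        # 1 for a never-seen stimulus, decaying with repeated exposure.
        return 1.0 / (1 + self.counts[stimulus])

    def observe(self, stimulus):
        self.counts[stimulus] += 1

tracker = NoveltyTracker()
assert tracker.novelty("A") == 1.0   # unseen: maximally novel
tracker.observe("A")
assert tracker.novelty("A") == 0.5   # seen once: less novel
# A likely observation (p = 0.9) is less surprising than a rare one (p = 0.01).
assert shannon_surprise(0.9) < shannon_surprise(0.01)
```

Note that the two signals dissociate: a stimulus can be familiar (low novelty) yet surprising if it occurs in a context where the model assigns it low probability.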


Subject(s)
Brain , Computer Simulation
2.
Nat Commun; 14(1): 2979, 2023 May 23.
Article in English | MEDLINE | ID: mdl-37221167

ABSTRACT

Birds of the crow family adapt food-caching strategies to anticipated needs at the time of cache recovery and rely on memory of the what, where and when of previous caching events to recover their hidden food. It is unclear if this behavior can be explained by simple associative learning or if it relies on higher cognitive processes like mental time-travel. We present a computational model and propose a neural implementation of food-caching behavior. The model has hunger variables for motivational control, reward-modulated update of retrieval and caching policies and an associative neural network for remembering caching events with a memory consolidation mechanism for flexible decoding of the age of a memory. Our methodology of formalizing experimental protocols is transferable to other domains and facilitates model evaluation and experiment design. Here, we show that memory-augmented, associative reinforcement learning without mental time-travel is sufficient to explain the results of 28 behavioral experiments with food-caching birds.


Subject(s)
Birds , Crows , Animals , Conditioning, Classical , Food , Computer Simulation
3.
Neuroimage; 246: 118780, 2022 Feb 01.
Article in English | MEDLINE | ID: mdl-34875383

ABSTRACT

Learning how to reach a reward over long series of actions is a remarkable capability of humans, and potentially guided by multiple parallel learning modules. Current brain imaging of learning modules is limited by (i) simple experimental paradigms, (ii) entanglement of brain signals of different learning modules, and (iii) a limited number of computational models considered as candidates for explaining behavior. Here, we address these three limitations and (i) introduce a complex sequential decision making task with surprising events that allows us to (ii) dissociate correlates of reward prediction errors from those of surprise in functional magnetic resonance imaging (fMRI); and (iii) we test behavior against a large repertoire of model-free, model-based, and hybrid reinforcement learning algorithms, including a novel surprise-modulated actor-critic algorithm. Surprise, derived from an approximate Bayesian approach for learning the world-model, is extracted in our algorithm from a state prediction error. Surprise is then used to modulate the learning rate of a model-free actor, which itself learns via the reward prediction error from model-free value estimation by the critic. We find that action choices are well explained by pure model-free policy gradient, but reaction times and neural data are not. We identify signatures of both model-free and surprise-based learning signals in blood oxygen level dependent (BOLD) responses, supporting the existence of multiple parallel learning modules in the brain. Our results extend previous fMRI findings to a multi-step setting and emphasize the role of policy gradient and surprise signalling in human learning.
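The surprise-modulation idea can be sketched in tabular form. The transition-count world-model, the (1 + surprise) learning-rate factor, and all names below are illustrative assumptions for a minimal two-state example, not the paper's exact algorithm or task.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy world: two states; action 1 in state 0 leads to state 1, where reward waits.
n_states, n_actions = 2, 2
counts = np.ones((n_states, n_actions, n_states))  # world-model (transition counts)
logits = np.zeros((n_states, n_actions))           # model-free actor
values = np.zeros(n_states)                        # model-free critic

def step(s, a, r, s_next, base_lr=0.1, gamma=0.9):
    # Surprise extracted from a state prediction error of the learned world-model.
    p = counts[s, a] / counts[s, a].sum()
    surprise = -np.log(p[s_next])
    counts[s, a, s_next] += 1
    # Critic: reward prediction error, as in standard TD learning.
    delta = r + gamma * values[s_next] - values[s]
    values[s] += base_lr * delta
    # Actor: policy gradient, with a surprise-modulated learning rate.
    grad = -softmax(logits[s]); grad[a] += 1.0
    logits[s] += base_lr * (1.0 + surprise) * delta * grad
    return surprise

s_first = step(0, 1, 1.0, 1)       # first time this transition is experienced
for _ in range(20):
    step(0, 1, 1.0, 1)             # the transition becomes familiar
s_late = step(0, 1, 1.0, 1)
assert s_late < s_first            # familiar transitions are less surprising
assert logits[0, 1] > logits[0, 0] # the rewarded action has been reinforced
```

The key structural point is that surprise gates how quickly the model-free actor updates, while the reward prediction error itself comes from the critic.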


Subject(s)
Brain/physiology , Decision Making/physiology , Functional Neuroimaging/methods , Learning/physiology , Magnetic Resonance Imaging/methods , Adult , Brain/diagnostic imaging , Female , Humans , Male , Models, Biological , Reinforcement, Psychology , Young Adult
4.
Neural Comput; 33(2): 269-340, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33400898

ABSTRACT

Surprise-based learning allows agents to rapidly adapt to nonstationary stochastic environments characterized by sudden changes. We show that exact Bayesian inference in a hierarchical model gives rise to a surprise-modulated trade-off between forgetting old observations and integrating them with the new ones. The modulation depends on a probability ratio, which we call the Bayes Factor Surprise, that tests the prior belief against the current belief. We demonstrate that in several existing approximate algorithms, the Bayes Factor Surprise modulates the rate of adaptation to new observations. We derive three novel surprise-based algorithms, one in the family of particle filters, one in the family of variational learning, and one in the family of message passing, that have constant scaling in observation sequence length and particularly simple update dynamics for any distribution in the exponential family. Empirical results show that these surprise-based algorithms estimate parameters better than alternative approximate approaches and reach levels of performance comparable to computationally more expensive algorithms. The Bayes Factor Surprise is related to but different from the Shannon Surprise. In two hypothetical experiments, we make testable predictions for physiological indicators that dissociate the Bayes Factor Surprise from the Shannon Surprise. The theoretical insight of casting various approaches as surprise-based learning, as well as the proposed online algorithms, may be applied to the analysis of animal and human behavior and to reinforcement learning in nonstationary environments.
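The surprise-modulated trade-off can be sketched for the simplest case, tracking a Gaussian mean with known observation variance. The adaptation rate gamma = m*S / (1 + m*S) and all class and parameter names below are illustrative assumptions in the spirit of the abstract, not a verbatim reproduction of the paper's algorithms.

```python
import math

def gauss_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

class BFSurpriseEstimator:
    """Tracks a Gaussian mean; high Bayes Factor Surprise pulls the
    belief back toward the prior (i.e., forgets old observations)."""
    def __init__(self, prior_mean=0.0, prior_var=1.0, obs_var=0.25, m=0.2):
        self.prior_mean, self.prior_var = prior_mean, prior_var
        self.mean, self.var = prior_mean, prior_var
        self.obs_var, self.m = obs_var, m

    def step(self, y):
        # Bayes Factor Surprise: probability of y under the prior belief
        # divided by its probability under the current belief.
        p_prior = gauss_pdf(y, self.prior_mean, self.prior_var + self.obs_var)
        p_current = gauss_pdf(y, self.mean, self.var + self.obs_var)
        S = p_prior / p_current
        gamma = (self.m * S) / (1 + self.m * S)  # surprise-modulated adaptation

        def posterior(mean, var):  # standard Gaussian conjugate update
            k = var / (var + self.obs_var)
            return mean + k * (y - mean), (1 - k) * var

        m_int, v_int = posterior(self.mean, self.var)          # integrate
        m_res, v_res = posterior(self.prior_mean, self.prior_var)  # restart
        self.mean = (1 - gamma) * m_int + gamma * m_res
        self.var = (1 - gamma) * v_int + gamma * v_res
        return S

est = BFSurpriseEstimator()
s_stable = [est.step(0.5) for _ in range(10)][-1]  # stationary stream
s_change = est.step(3.0)                           # abrupt change in the signal
assert s_change > 1.0 and s_change > s_stable      # change-point -> high surprise
```

When the environment is stable the belief sharpens and S stays small; a sudden change makes the current belief a poor predictor, S spikes, and forgetting accelerates.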


Subject(s)
Algorithms , Behavior/physiology , Computer Simulation , Learning/physiology , Reinforcement, Psychology , Animals , Bayes Theorem , Humans
5.
Sci Rep; 11(1): 835, 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33436969

ABSTRACT

Previous research reported that corvids preferentially cache food in a location where no food will be available or cache more of a specific food in a location where this food will not be available. Here, we consider possible explanations for these prospective caching behaviours and directly compare two competing hypotheses. The Compensatory Caching Hypothesis suggests that birds learn to cache more of a particular food in places where that food was less frequently available in the past. In contrast, the Future Planning Hypothesis suggests that birds recall the 'what-when-where' features of specific past events to predict the future availability of food. We designed a protocol in which the two hypotheses predict different caching patterns across different caching locations such that the two explanations can be disambiguated. We formalised the hypotheses in a Bayesian model comparison and tested this protocol in two experiments with one of the previously tested species, namely Eurasian jays. Consistently across the two experiments, the observed caching pattern did not support either hypothesis; rather it was best explained by a uniform distribution of caches over the different caching locations. Future research is needed to gain more insight into the cognitive mechanism underpinning corvids' caching for the future.
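The model-comparison logic can be sketched as follows. The cache counts and the per-hypothesis probabilities are made-up illustrations, not the experimental data or the paper's fitted models.

```python
import math

def log_likelihood(counts, probs):
    """Multinomial log-likelihood (up to the constant combinatorial term)."""
    return sum(c * math.log(p) for c, p in zip(counts, probs))

observed = [10, 11, 9]                  # caches per location (illustrative)
hypotheses = {
    "compensatory": [0.6, 0.3, 0.1],    # cache more where food was scarce before
    "future_planning": [0.1, 0.3, 0.6], # cache where food will be unavailable
    "uniform": [1/3, 1/3, 1/3],         # no preference across locations
}
scores = {name: log_likelihood(observed, p) for name, p in hypotheses.items()}
best = max(scores, key=scores.get)
assert best == "uniform"                # a near-even pattern favors uniformity
```

With near-even counts, the uniform model wins the comparison, mirroring the abstract's conclusion that neither directional hypothesis explained the data.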

6.
PLoS Comput Biol; 16(4): e1007640, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32271761

ABSTRACT

This is a PLOS Computational Biology Education paper. The idea that the brain functions so as to minimize certain costs pervades theoretical neuroscience. Because a cost function by itself does not predict how the brain finds its minima, additional assumptions about the optimization method need to be made to predict the dynamics of physiological quantities. In this context, steepest descent (also called gradient descent) is often suggested as an algorithmic principle of optimization potentially implemented by the brain. In practice, researchers often consider the vector of partial derivatives as the gradient. However, the definition of the gradient and the notion of a steepest direction depend on the choice of a metric. Because the choice of the metric involves a large number of degrees of freedom, the predictive power of models that are based on gradient descent must be called into question, unless there are strong constraints on the choice of the metric. Here, we provide a didactic review of the mathematics of gradient descent, illustrate common pitfalls of using gradient descent as a principle of brain function with examples from the literature, and propose ways forward to constrain the metric.
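The metric-dependence of "steepest" is easy to verify numerically: under a metric M, the steepest-descent direction of a cost F is -M^{-1} grad F, so different metrics give genuinely different "gradient descent" dynamics from the same partial derivatives. The quadratic cost and the diagonal metric below are toy choices.

```python
import numpy as np

# Cost F(w) = 0.5 * w^T A w; its vector of partial derivatives is A w.
A = np.array([[4.0, 0.0],
              [0.0, 1.0]])

def grad(w):
    return A @ w

w = np.array([1.0, 1.0])
g = grad(w)                          # partial derivatives: (4, 1)

# Steepest-descent direction under metric M is -M^{-1} g.
M_euclid = np.eye(2)                 # the (often implicit) Euclidean metric
M_other = np.diag([8.0, 1.0])        # e.g., unequal per-parameter sensitivity
d_euclid = -np.linalg.solve(M_euclid, g)
d_other = -np.linalg.solve(M_other, g)

cos = d_euclid @ d_other / (np.linalg.norm(d_euclid) * np.linalg.norm(d_other))
assert cos < 0.99                            # the two directions disagree
assert g @ d_euclid < 0 and g @ d_other < 0  # yet both decrease the cost
```

Both are valid descent directions, which is precisely why observing descent-like dynamics alone cannot pin down the metric, and why unconstrained gradient-descent models of the brain have weak predictive power.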


Subject(s)
Biophysics/methods , Brain/diagnostic imaging , Brain/physiology , Computational Biology/methods , Algorithms , Computer Simulation , Humans , Image Processing, Computer-Assisted , Kinetics , Mathematics , Neural Networks, Computer , Neurosciences/methods
7.
Neural Netw; 118: 90-101, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31254771

ABSTRACT

Training deep neural networks with the error backpropagation algorithm is considered implausible from a biological perspective. Numerous recent publications suggest elaborate models for biologically plausible variants of deep learning, typically defining success as reaching around 98% test accuracy on the MNIST data set. Here, we investigate how far we can go on digit (MNIST) and object (CIFAR10) classification with biologically plausible, local learning rules in a network with one hidden layer and a single readout layer. The hidden layer weights are either fixed (random or random Gabor filters) or trained with unsupervised methods (Principal/Independent Component Analysis or Sparse Coding) that can be implemented by local learning rules. The readout layer is trained with a supervised, local learning rule. We first implement these models with rate neurons. This comparison reveals, first, that unsupervised learning does not lead to better performance than fixed random projections or Gabor filters for large hidden layers. Second, networks with localized receptive fields perform significantly better than networks with all-to-all connectivity and can reach backpropagation performance on MNIST. We then implement two of the networks - fixed, localized, random & random Gabor filters in the hidden layer - with spiking leaky integrate-and-fire neurons and spike timing dependent plasticity to train the readout layer. These spiking models achieve >98.2% test accuracy on MNIST, which is close to the performance of rate networks with one hidden layer trained with backpropagation. The performance of our shallow network models is comparable to most current biologically plausible models of deep learning. Furthermore, our results with a shallow spiking network provide an important reference and suggest the use of data sets other than MNIST for testing the performance of future models of biologically plausible deep learning.
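A toy version of the fixed-random-projection architecture with a local, supervised delta rule on the readout can be sketched as follows; Gaussian two-class data stands in for MNIST, and all sizes and learning rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two well-separated Gaussian classes in 10 dimensions.
n, d, h = 200, 10, 50
X = np.vstack([rng.normal(-1.0, 1.0, (n, d)),
               rng.normal(+1.0, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

W_hidden = rng.normal(0, 1 / np.sqrt(d), (d, h))  # fixed random projection

def hidden(x):
    return np.maximum(0.0, x @ W_hidden)          # fixed nonlinearity, never trained

w_out = np.zeros(h)
for epoch in range(20):
    for i in rng.permutation(len(X)):             # local delta rule on readout only:
        a = hidden(X[i])
        pred = 1.0 / (1.0 + np.exp(-a @ w_out))
        w_out += 0.05 * (y[i] - pred) * a         # presynaptic activity * local error

acc = np.mean(((1.0 / (1.0 + np.exp(-(hidden(X) @ w_out)))) > 0.5) == y)
assert acc > 0.9
```

Only the readout weights are updated, and only with locally available quantities (presynaptic activity and the postsynaptic error), which is the sense in which the rule is biologically plausible.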


Subject(s)
Deep Learning , Neural Networks, Computer , Algorithms , Deep Learning/trends , Neurons/physiology
8.
Front Neural Circuits; 12: 53, 2018.
Article in English | MEDLINE | ID: mdl-30108488

ABSTRACT

Most elementary behaviors such as moving the arm to grasp an object or walking into the next room to explore a museum evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two different time scales. Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years. Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules.
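The flag-plus-third-factor logic bridges the two time scales and can be sketched in a few lines; the exponential trace and the specific parameters below are illustrative assumptions, not a fitted model.

```python
def three_factor_update(pre, post, third_factor, trace, w,
                        tau_e=2.0, dt=0.1, lr=0.01):
    """One time step of a neoHebbian three-factor rule (illustrative form):
    pre/post co-activity sets a decaying eligibility trace; the weight
    changes only when the third factor (reward, surprise, ...) arrives."""
    trace += dt * (-trace / tau_e + pre * post)   # flag at the synapse
    w += lr * third_factor * trace                # gated weight change
    return trace, w

trace, w = 0.0, 0.0
# Co-activation at t = 0 sets the trace; no third factor yet.
trace, w = three_factor_update(pre=1.0, post=1.0, third_factor=0.0,
                               trace=trace, w=w)
assert trace > 0 and w == 0          # flag set, but no weight change
for _ in range(9):                   # ~1 s of silence: the trace decays slowly
    trace, w = three_factor_update(0.0, 0.0, 0.0, trace, w)
trace, w = three_factor_update(0.0, 0.0, third_factor=1.0, trace=trace, w=w)
assert w > 0                         # delayed third factor converts trace to change
```

Because tau_e is on the order of seconds while pre/post spikes are on the order of milliseconds, the trace is exactly the mechanism that lets a behaviorally delayed reward credit the right synapses.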


Subject(s)
Brain/physiology , Electrophysiological Phenomena/physiology , Learning/physiology , Models, Biological , Neuronal Plasticity/physiology , Neurons/physiology , Humans
9.
Neural Comput; 29(2): 458-484, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27870611

ABSTRACT

We show that Hopfield neural networks with synchronous dynamics and asymmetric weights admit stable orbits that form sequences of maximal length. For N units, these sequences have length T = 2^N; that is, they cover the full state space. We present a mathematical proof that maximal-length orbits exist for all N, and we provide a method to construct both the sequence and the weight matrix that allow its production. The orbit is relatively robust to dynamical noise, and perturbations of the optimal weights reveal other periodic orbits that are not maximal but typically still very long. We discuss how the resulting dynamics on slow time-scales can be used to generate desired output sequences.
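The synchronous dynamics studied here are easy to simulate. The sketch below measures the period of the orbit entered from an initial state under x(t+1) = sign(W x(t)) with a random asymmetric weight matrix; it does not reproduce the paper's construction of maximal-length weight matrices, for which the orbit length would reach the full 2^N.

```python
import numpy as np

def orbit_length(W, x0):
    """Period of the cycle reached from x0 under synchronous
    sign-threshold dynamics x(t+1) = sign(W x(t))."""
    seen = {}
    x, t = tuple(x0), 0
    while x not in seen:
        seen[x] = t
        x = tuple(np.where(W @ np.asarray(x) >= 0, 1, -1))
        t += 1
    return t - seen[x]            # length of the cycle that was entered

rng = np.random.default_rng(3)
N = 6
W = rng.normal(size=(N, N))       # asymmetric: W != W.T in general
L = orbit_length(W, np.ones(N, dtype=int))
assert 1 <= L <= 2 ** N           # bounded by the 2^N states of N binary units
```

Since the update is deterministic on a finite state space, every trajectory must eventually enter a periodic orbit; the question the paper answers is how long that orbit can be made.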


Subject(s)
Neural Networks, Computer , Animals , Brain/physiology , Humans , Learning/physiology , Models, Neurological , Neural Pathways/physiology , Neurons/physiology
10.
PLoS Comput Biol; 12(6): e1005003, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27341100

ABSTRACT

Animals learn to make predictions, such as associating the sound of a bell with upcoming feeding or predicting a movement that a motor command is eliciting. How predictions are realized on the neuronal level and what plasticity rule underlies their learning is not well understood. Here we propose a biologically plausible synaptic plasticity rule to learn predictions on a single neuron level on a timescale of seconds. The learning rule allows a spiking two-compartment neuron to match its current firing rate to its own expected future discounted firing rate. For instance, if an originally neutral event is repeatedly followed by an event that elevates the firing rate of a neuron, the originally neutral event will eventually also elevate the neuron's firing rate. The plasticity rule is a form of spike timing dependent plasticity in which a presynaptic spike followed by a postsynaptic spike leads to potentiation. Even if the plasticity window has a width of 20 milliseconds, associations on the time scale of seconds can be learned. We illustrate prospective coding with three examples: learning to predict a time varying input, learning to predict the next stimulus in a delayed paired-associate task and learning with a recurrent network to reproduce a temporally compressed version of a sequence. We discuss the potential role of the learning mechanism in classical trace conditioning. In the special case that the signal to be predicted encodes reward, the neuron learns to predict the discounted future reward and learning is closely related to the temporal difference learning algorithm TD(λ).
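The core claim, that an initially neutral event comes to predict a later rate elevation through a temporal-difference-style error, can be sketched at the level of state values. This is a TD(0) caricature of the single-neuron rule, not the two-compartment spiking model itself; states, discount, and rates are illustrative.

```python
import numpy as np

gamma = 0.8                       # per-step discount of future signal
V = np.zeros(3)                   # predicted discounted future signal per state
# States: 0 = neutral cue, 1 = rate-elevating event, 2 = rest.
signal = np.array([0.0, 1.0, 0.0])

for _ in range(200):              # the cue is repeatedly followed by the event
    episode = [0, 1, 2]
    for t in range(len(episode) - 1):
        s, s_next = episode[t], episode[t + 1]
        # TD prediction error: current signal + discounted future estimate.
        delta = signal[s] + gamma * V[s_next] - V[s]
        V[s] += 0.1 * delta

# The originally neutral cue now predicts the upcoming elevation:
assert V[0] > 0.5
```

In the special case where the predicted signal encodes reward, this is exactly the sense in which the abstract relates the plasticity rule to temporal-difference learning.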


Subject(s)
Action Potentials/physiology , Computational Biology/methods , Models, Neurological , Neurons/physiology , Animals , Dendrites/physiology , Macaca , Neuronal Plasticity/physiology
11.
Neuron; 85(4): 664-6, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25695266

ABSTRACT

In this issue of Neuron, Daie et al. (2015) show that the eye velocity-to-position neural integrator not only encodes the position, but also how it was reached. Representing content and context in the same neuronal population may form a general coding principle.


Subject(s)
Action Potentials/physiology , Memory, Short-Term/physiology , Neurons/physiology , Animals
12.
PLoS Comput Biol; 10(6): e1003640, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24901935

ABSTRACT

Recent experiments revealed that the fruit fly Drosophila melanogaster has a dedicated mechanism for forgetting: blocking the G-protein Rac leads to slower forgetting, whereas activating Rac leads to faster forgetting. This active form of forgetting lacks a satisfactory functional explanation. We investigated optimal decision making for an agent adapting to a stochastic environment where a stimulus may switch between being indicative of reward or punishment. Like Drosophila, an optimal agent shows forgetting with a rate that is linked to the time scale of changes in the environment. Moreover, to reduce the odds of missing future reward, an optimal agent may trade the risk of immediate pain for information gain and thus forget faster after aversive conditioning. A simple neuronal network reproduces these features. Our theory shows that forgetting in Drosophila appears as an optimal adaptive behavior in a changing environment. This is in line with the view that forgetting is adaptive rather than a consequence of limitations of the memory system.
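The link between forgetting rate and the environment's time scale can be checked with a leaky estimator in a switching world; the flip schedule and rates below are toy choices, not the paper's optimal-agent derivation.

```python
import numpy as np

rng = np.random.default_rng(7)

def tracking_error(forget_rate, switch_every, steps=4000):
    """Mean absolute error of a leaky (forgetful) estimator tracking a
    reward probability that flips between 0.9 and 0.1."""
    p_true, estimate, err = 0.9, 0.5, 0.0
    for t in range(steps):
        if t > 0 and t % switch_every == 0:
            p_true = 1.0 - p_true                        # the world changes
        outcome = float(rng.random() < p_true)           # binary reward sample
        estimate += forget_rate * (outcome - estimate)   # forgetting as a leak
        err += abs(estimate - p_true)
    return err / steps

# Fast-changing world: faster forgetting tracks the truth better ...
assert tracking_error(0.3, switch_every=20) < tracking_error(0.01, switch_every=20)
# ... stable world: slow forgetting (long memory) wins.
assert tracking_error(0.01, switch_every=4000) < tracking_error(0.3, switch_every=4000)
```

The optimal forgetting rate thus mirrors the volatility of the environment, which is the qualitative behavior the abstract attributes to both the optimal agent and the fly.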


Subject(s)
Drosophila melanogaster/physiology , Memory/physiology , Adaptation, Physiological , Adaptation, Psychological , Animals , Behavior, Animal/physiology , Computational Biology , Conditioning, Psychological , Decision Making/physiology , Environment , Learning/physiology , Models, Biological , Models, Psychological , Odorants , Reward , Stochastic Processes
13.
J Neurosci; 33(23): 9565-75, 2013 Jun 05.
Article in English | MEDLINE | ID: mdl-23739954

ABSTRACT

Storing and recalling spiking sequences is a general problem the brain needs to solve. It is, however, unclear what type of biologically plausible learning rule is suited to learn a wide class of spatiotemporal activity patterns in a robust way. Here we consider a recurrent network of stochastic spiking neurons composed of both visible and hidden neurons. We derive a generic learning rule that is matched to the neural dynamics by minimizing an upper bound on the Kullback-Leibler divergence from the target distribution to the model distribution. The derived learning rule is consistent with spike-timing dependent plasticity in that a presynaptic spike preceding a postsynaptic spike elicits potentiation while otherwise depression emerges. Furthermore, the learning rule for synapses that target visible neurons can be matched to the recently proposed voltage-triplet rule. The learning rule for synapses that target hidden neurons is modulated by a global factor, which shares properties with astrocytes and gives rise to testable predictions.
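The sign structure of the derived rule matches the classic pair-based STDP window, which can be sketched as follows (window shape and parameters are illustrative, not the paper's derived quantities).

```python
import math

def stdp_dw(dt_post_minus_pre, a_plus=1.0, a_minus=1.0, tau=0.02):
    """Pair-based STDP window (times in seconds): a presynaptic spike
    preceding a postsynaptic spike (dt > 0) potentiates; the reverse
    order (dt < 0) depresses."""
    if dt_post_minus_pre > 0:
        return a_plus * math.exp(-dt_post_minus_pre / tau)
    return -a_minus * math.exp(dt_post_minus_pre / tau)

assert stdp_dw(0.010) > 0                 # pre 10 ms before post -> potentiation
assert stdp_dw(-0.010) < 0                # post 10 ms before pre -> depression
assert stdp_dw(0.005) > stdp_dw(0.015)    # tighter pairings potentiate more
```

In the paper's setting, this window applies directly to synapses onto visible neurons, while synapses onto hidden neurons additionally require the global modulatory factor described above.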


Subject(s)
Action Potentials/physiology , Mental Recall/physiology , Neural Networks, Computer , Learning/physiology , Models, Neurological , Synapses/physiology