Results 1 - 20 of 25
1.
Neuron ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38733985

ABSTRACT

A key feature of cortical systems is functional organization: the arrangement of functionally distinct neurons in characteristic spatial patterns. However, the principles underlying the emergence of functional organization in the cortex are poorly understood. Here, we develop the topographic deep artificial neural network (TDANN), the first model to predict several aspects of the functional organization of multiple cortical areas in the primate visual system. We analyze the factors driving the TDANN's success and find that it balances two objectives: learning a task-general sensory representation and maximizing the spatial smoothness of responses according to a metric that scales with cortical surface area. In turn, the representations learned by the TDANN are more brain-like than in spatially unconstrained models. Finally, we provide evidence that the TDANN's functional organization balances performance with between-area connection length. Our results offer a unified principle for understanding the functional organization of the primate ventral visual system.

3.
Neural Comput ; 36(1): 151-174, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38052080

ABSTRACT

In this work, we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance traveled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate interaction among the hyperparameters of optimization, the structure in the gradient noise, and the Hessian matrix at the end of training that explains this anomalous diffusion. To build this understanding, we first derive a continuous-time model for SGD with finite learning rates and batch sizes as an underdamped Langevin equation. We study this equation in the setting of linear regression, where we can derive exact, analytic expressions for the phase-space dynamics of the parameters and their instantaneous velocities from initialization to stationarity. Using the Fokker-Planck equation, we show that the key ingredient driving these dynamics is not the original training loss but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents that cause oscillations in phase space. We identify qualitative and quantitative predictions of this theory in the dynamics of a ResNet-18 model trained on ImageNet. Through the lens of statistical physics, we uncover a mechanistic origin for the anomalous limiting dynamics of deep neural networks trained with SGD. Understanding the limiting dynamics of SGD, and its dependence on various important hyperparameters like batch size, learning rate, and momentum, can serve as a basis for future work that can turn these insights into algorithmic gains.
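The anomalous diffusion described above can be observed even in a toy setting. The sketch below is illustrative only (not the paper's code; the toy problem, hyperparameters, and variable names are assumptions): it runs minibatch SGD on a small linear regression and tracks cumulative distance traveled in parameter space, showing that the loss plateaus early while the iterates keep moving under gradient noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression: y = X @ w_true + observation noise
n, d = 512, 8
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr, batch = 0.05, 32
losses, dist = [], []
w_prev = w.copy()
traveled = 0.0

for step in range(5000):
    # Minibatch gradient of the squared loss
    idx = rng.integers(0, n, size=batch)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
    w = w - lr * grad
    traveled += np.linalg.norm(w - w_prev)  # path length in parameter space
    w_prev = w.copy()
    if step % 500 == 0:
        losses.append(np.mean((X @ w - y) ** 2) / 2)
        dist.append(traveled)

# The training loss converges early, yet the cumulative path length keeps
# growing: gradient noise drives a persistent diffusion of the parameters.
```

Plotting `dist` against step count on log-log axes would reveal the power-law growth exponent that the paper analyzes.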

4.
Behav Brain Sci ; 46: e390, 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38054303

ABSTRACT

In the target article, Bowers et al. dispute deep artificial neural network (ANN) models as the currently leading models of human vision without producing alternatives. They eschew the use of public benchmarking platforms to compare vision models with the brain and behavior, and they advocate for a fragmented, phenomenon-specific modeling approach. Both practices are unconstructive to scientific progress. We outline how the Brain-Score community is moving forward to add new model-to-human comparisons to its community-transparent suite of benchmarks.


Subject(s)
Brain; Neural Networks, Computer; Humans
5.
PLoS Comput Biol ; 19(10): e1011506, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37782673

ABSTRACT

Studies of the mouse visual system have revealed a variety of visual brain areas that are thought to support a multitude of behavioral capacities, ranging from stimulus-reward associations to goal-directed navigation and object-centric discriminations. However, an overall understanding of the mouse's visual cortex, and how it supports a range of behaviors, remains elusive. Here, we take a computational approach to help address these questions, providing a high-fidelity quantitative model of mouse visual cortex and identifying key structural and functional principles underlying that model's success. Structurally, we find that a comparatively shallow network structure with a low-resolution input is optimal for modeling mouse visual cortex. Our main finding is functional: models trained with task-agnostic, self-supervised objective functions based on the concept of contrastive embeddings are much better matches to mouse cortex than models trained on supervised objectives or alternative self-supervised methods. This result differs markedly from primates, where prior work showed the two to be roughly equivalent, and it leads us to ask why these self-supervised objectives are better matches than supervised ones in mouse. To this end, we show that the self-supervised, contrastive objective builds a general-purpose visual representation that enables the system to achieve better transfer on out-of-distribution visual scene understanding and reward-based navigation tasks. Our results suggest that mouse visual cortex is a low-resolution, shallow network that makes best use of the mouse's limited resources to create a lightweight, general-purpose visual system, in contrast to the deep, high-resolution, and more categorization-dominated visual system of primates.


Subject(s)
Learning; Visual Cortex; Animals; Mice; Brain; Brain Mapping; Primates
6.
Behav Res Methods ; 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37656342

ABSTRACT

Head-mounted cameras have been used in developmental psychology research for more than a decade to provide a rich and comprehensive view of what infants see during their everyday experiences. However, variation between these devices has limited the field's ability to compare results across studies and across labs. Further, the video data captured by these cameras to date have been relatively low-resolution, limiting how well machine learning algorithms can operate over these rich video data. Here, we provide a well-tested and easily constructed design for a head-mounted camera assembly, the BabyView, developed in collaboration with Daylight Design, LLC, a professional product design firm. The BabyView collects high-resolution video, accelerometer, and gyroscope data from children approximately 6-30 months of age via a GoPro camera custom mounted on a soft child-safety helmet. The BabyView also captures a large, portrait-oriented vertical field of view that encompasses both children's interactions with objects and with their social partners. We detail our protocols for video data management and for handling sensitive data from home environments. We also provide customizable materials for onboarding families with the BabyView. We hope that these materials will encourage the wide adoption of the BabyView, allowing the field to collect high-resolution data that can link children's everyday environments with their learning outcomes.

7.
bioRxiv ; 2023 May 18.
Article in English | MEDLINE | ID: mdl-37292946

ABSTRACT

A key feature of many cortical systems is functional organization: the arrangement of neurons with specific functional properties in characteristic spatial patterns across the cortical surface. However, the principles underlying the emergence and utility of functional organization are poorly understood. Here we develop the Topographic Deep Artificial Neural Network (TDANN), the first unified model to accurately predict the functional organization of multiple cortical areas in the primate visual system. We analyze the key factors responsible for the TDANN's success and find that it strikes a balance between two specific objectives: achieving a task-general sensory representation that is self-supervised, and maximizing the smoothness of responses across the cortical sheet according to a metric that scales relative to cortical surface area. In turn, the representations learned by the TDANN are lower dimensional and more brain-like than those in models that lack a spatial smoothness constraint. Finally, we provide evidence that the TDANN's functional organization balances performance with inter-area connection length, and use the resulting models for a proof-of-principle optimization of cortical prosthetic design. Our results thus offer a unified principle for understanding functional organization and a novel view of the functional role of the visual system in particular.

8.
Neural Comput ; 34(8): 1652-1675, 2022 07 14.
Article in English | MEDLINE | ID: mdl-35798321

ABSTRACT

The ventral visual stream enables humans and nonhuman primates to effortlessly recognize objects across a multitude of viewing conditions, yet the computational role of its abundant feedback connections remains unclear. Prior studies have augmented feedforward convolutional neural networks (CNNs) with recurrent connections to study their role in visual processing; however, these recurrent networks are often optimized directly on neural data, or the comparative metrics used are undefined for standard feedforward networks that lack such connections. In this work, we develop task-optimized convolutional recurrent (ConvRNN) network models that more closely mimic the timing and gross neuroanatomy of the ventral pathway. Properly chosen intermediate-depth ConvRNN circuit architectures, which incorporate mechanisms of feedforward bypassing and recurrent gating, can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then develop methods that allow us to compare both CNNs and ConvRNNs to fine-grained measurements of primate categorization behavior and neural response trajectories across thousands of stimuli. We find that high-performing ConvRNNs provide a better match to these data than feedforward networks of any depth, predicting the precise timings at which each stimulus is behaviorally decoded from neural activation patterns. Moreover, these ConvRNN circuits consistently produce quantitatively accurate predictions of neural dynamics from V4 and IT across the entire stimulus presentation. In fact, we find that the highest-performing ConvRNNs, which best match neural and behavioral data, also achieve a strong Pareto trade-off between task performance and overall network size. Taken together, our results suggest that the functional purpose of recurrence in the ventral pathway is to fit a high-performing network in cortex, attaining computational power through temporal rather than spatial complexity.


Subject(s)
Task Performance and Analysis; Visual Perception; Animals; Humans; Macaca mulatta/physiology; Neural Networks, Computer; Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Visual Pathways/physiology; Visual Perception/physiology
9.
Adv Neural Inf Process Syst ; 35: 22628-22642, 2022.
Article in English | MEDLINE | ID: mdl-38435074

ABSTRACT

Humans learn from visual inputs at multiple timescales, both rapidly and flexibly acquiring visual knowledge over short periods, and robustly accumulating online learning progress over longer periods. Modeling these powerful learning capabilities is an important problem for computational visual cognitive science, and models that could replicate them would be of substantial utility in real-world computer vision settings. In this work, we establish benchmarks for both real-time and life-long continual visual learning. Our real-time learning benchmark measures a model's ability to match the rapid visual behavior changes of real humans over the course of minutes and hours, given a stream of visual inputs. Our life-long learning benchmark evaluates the performance of models in a purely online learning curriculum obtained directly from child visual experience over the course of years of development. We evaluate a spectrum of recent deep self-supervised visual learning algorithms on both benchmarks, finding that none of them perfectly match human performance, though some algorithms perform substantially better than others. Interestingly, algorithms embodying recent trends in self-supervised learning - including BYOL, SwAV and MAE - are substantially worse on our benchmarks than an earlier generation of self-supervised algorithms such as SimCLR and MoCo-v2. We present analysis indicating that the failure of these newer algorithms is primarily due to their inability to handle the kind of sparse low-diversity datastreams that naturally arise in the real world, and that actively leveraging memory through negative sampling - a mechanism eschewed by these newer algorithms - appears useful for facilitating learning in such low-diversity environments. 
We also illustrate a complementarity between the short and long timescales in the two benchmarks, showing how requiring a single learning algorithm to be locally context-sensitive enough to match real-time learning changes while stable enough to avoid catastrophic forgetting over the long term induces a trade-off that human-like algorithms may have to straddle. Taken together, our benchmarks establish a quantitative way to directly compare learning between neural network models and human learners, show how choices in the mechanism by which such algorithms handle sample comparison and memory strongly impact their ability to match human learning abilities, and expose an open problem space for identifying more flexible and robust visual self-supervision algorithms.

10.
Neuron ; 109(17): 2755-2766.e6, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34265252

ABSTRACT

The medial temporal lobe (MTL) supports a constellation of memory-related behaviors. Its involvement in perceptual processing, however, has been subject to enduring debate. This debate centers on perirhinal cortex (PRC), an MTL structure at the apex of the ventral visual stream (VVS). Here we leverage a deep learning framework that approximates visual behaviors supported by the VVS (i.e., lacking PRC). We first apply this approach retroactively, modeling 30 published visual discrimination experiments: excluding non-diagnostic stimulus sets, there is a striking correspondence between VVS-modeled and PRC-lesioned behavior, while each is outperformed by PRC-intact participants. We corroborate and extend these results with a novel experiment, directly comparing PRC-intact human performance to electrophysiological recordings from the macaque VVS: PRC-intact participants outperform a linear readout of high-level visual cortex. By situating lesion, electrophysiological, and behavioral results within a shared computational framework, this work resolves decades of seemingly inconsistent findings surrounding PRC involvement in perception.


Subject(s)
Models, Neurological; Perirhinal Cortex/physiology; Visual Perception; Animals; Deep Learning; Humans; Macaca
11.
Proc Natl Acad Sci U S A ; 118(3)2021 01 19.
Article in English | MEDLINE | ID: mdl-33431673

ABSTRACT

Deep neural networks currently provide the best quantitative models of the response patterns of neurons throughout the primate ventral visual stream. However, such networks have remained implausible as a model of the development of the ventral stream, in part because they are trained with supervised methods requiring many more labels than are accessible to infants during development. Here, we report that recent rapid progress in unsupervised learning has largely closed this gap. We find that neural network models learned with deep unsupervised contrastive embedding methods achieve neural prediction accuracy in multiple ventral visual cortical areas that equals or exceeds that of models derived using today's best supervised methods and that the mapping of these neural network models' hidden layers is neuroanatomically consistent across the ventral stream. Strikingly, we find that these methods produce brain-like representations even when trained solely with real human child developmental data collected from head-mounted cameras, despite the fact that these datasets are noisy and limited. We also find that semisupervised deep contrastive embeddings can leverage small numbers of labeled examples to produce representations with substantially improved error-pattern consistency to human behavior. Taken together, these results illustrate a use of unsupervised learning to provide a quantitative model of a multiarea cortical brain system and present a strong candidate for a biologically plausible computational theory of primate sensory learning.


Subject(s)
Nerve Net/physiology; Neural Networks, Computer; Neurons/physiology; Pattern Recognition, Visual/physiology; Visual Cortex/physiology; Animals; Child; Datasets as Topic; Humans; Macaca/physiology; Nerve Net/anatomy & histology; Unsupervised Machine Learning; Visual Cortex/anatomy & histology
12.
Vision Res ; 172: 27-45, 2020 07.
Article in English | MEDLINE | ID: mdl-32388211

ABSTRACT

The ventral visual stream is known to be organized hierarchically, with early visual areas processing simple features that feed into higher visual areas processing more complex features. Hierarchical convolutional neural networks (CNNs) were largely inspired by this type of brain organization and have been successfully used to model neural responses in different areas of the visual system. In this work, we aim to understand how an instance of these models corresponds to temporal dynamics of human object processing. Using representational similarity analysis (RSA) and various similarity metrics, we compare the model representations with two electroencephalography (EEG) data sets containing responses to a shared set of 72 images. We find that there is a hierarchical relationship between the depth of a layer and the time at which peak correlation with the brain response occurs for certain similarity metrics in both data sets. However, when comparing across layers in the neural network, the correlation onset time did not appear in a strictly hierarchical fashion. We present two additional methods that improve upon the achieved correlations by optimally weighting features from the CNN and show that, depending on the similarity metric, deeper layers of the CNN provide a better correspondence than shallow layers to later time points in the EEG responses. However, we do not find that shallow layers provide better correspondences than deeper layers to early time points, an observation that violates the strict hierarchy and agrees with the finding from the onset-time analysis. This work makes a first comparison of various response features, including multiple similarity metrics and data sets, with respect to a neural network.
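Representational similarity analysis as used here compares a model layer and brain responses through their representational dissimilarity matrices (RDMs). A minimal NumPy sketch on synthetic data follows; the 72-stimulus count matches the abstract, but everything else, including the simulated "EEG-like" patterns and all names, is an illustrative assumption rather than the study's pipeline.

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix: 1 minus the Pearson
    correlation between response patterns for every pair of stimuli."""
    return 1.0 - np.corrcoef(responses)

def spearman(a, b):
    # Rank-transform both vectors, then take the Pearson correlation of ranks.
    ra = a.argsort().argsort()
    rb = b.argsort().argsort()
    return np.corrcoef(ra, rb)[0, 1]

def rsa_score(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return spearman(rdm_a[iu], rdm_b[iu])

rng = np.random.default_rng(0)
n_stim = 72  # matches the shared 72-image set in the abstract
layer_acts = rng.normal(size=(n_stim, 200))                    # model layer responses
eeg_like = layer_acts + 0.5 * rng.normal(size=(n_stim, 200))   # simulated sensor patterns
unrelated = rng.normal(size=(n_stim, 200))                     # control responses

match = rsa_score(rdm(layer_acts), rdm(eeg_like))
baseline = rsa_score(rdm(layer_acts), rdm(unrelated))
# The shared geometry yields a high RSA score; unrelated data scores near zero.
```

In the time-resolved variant used in the paper, this comparison is repeated at each EEG time point to locate the peak-correlation latency per layer.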


Subject(s)
Electroencephalography; Neural Networks, Computer; Visual Cortex/physiology; Visual Perception/physiology; Humans; Signal Processing, Computer-Assisted; Time Factors
13.
J Neurosci ; 40(8): 1710-1721, 2020 02 19.
Article in English | MEDLINE | ID: mdl-31871278

ABSTRACT

Drawing is a powerful tool that can be used to convey rich perceptual information about objects in the world. What are the neural mechanisms that enable us to produce a recognizable drawing of an object, and how does this visual production experience influence how this object is represented in the brain? Here we evaluate the hypothesis that producing and recognizing an object recruit a shared neural representation, such that repeatedly drawing the object can enhance its perceptual discriminability in the brain. We scanned human participants (N = 31; 11 male) using fMRI across three phases of a training study: during training, participants repeatedly drew two objects in an alternating sequence on an MR-compatible tablet; before and after training, they viewed these and two other control objects, allowing us to measure the neural representation of each object in visual cortex. We found that: (1) stimulus-evoked representations of objects in visual cortex are recruited during visually cued production of drawings of these objects, even throughout the period when the object cue is no longer present; (2) the object currently being drawn is prioritized in visual cortex during drawing production, while other repeatedly drawn objects are suppressed; and (3) patterns of connectivity between regions in occipital and parietal cortex supported enhanced decoding of the currently drawn object across the training phase, suggesting a potential neural substrate for learning how to transform perceptual representations into representational actions. Together, our study provides novel insight into the functional relationship between visual production and recognition in the brain.

SIGNIFICANCE STATEMENT: Humans can produce simple line drawings that capture rich information about their perceptual experiences. However, the mechanisms that support this behavior are not well understood. Here we investigate how regions in visual cortex participate in the recognition of an object and the production of a drawing of it. We find that these regions carry diagnostic information about an object in a similar format both during recognition and production, and that practice drawing an object enhances transmission of information about it to downstream regions. Together, our study provides novel insight into the functional relationship between visual production and recognition in the brain.


Subject(s)
Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Visual Cortex/diagnostic imaging; Adult; Female; Humans; Magnetic Resonance Imaging; Male; Photic Stimulation; Visual Cortex/physiology; Young Adult
14.
Nat Neurosci ; 22(11): 1761-1770, 2019 11.
Article in English | MEDLINE | ID: mdl-31659335

ABSTRACT

Systems neuroscience seeks explanations for how the brain implements a wide variety of perceptual, cognitive and motor tasks. Conversely, artificial intelligence attempts to design computational systems based on the tasks they will have to solve. In artificial neural networks, the three components specified by design are the objective functions, the learning rules and the architectures. With the growing success of deep learning, which utilizes brain-inspired architectures, these three designed components have increasingly become central to how we model, engineer and optimize complex artificial learning systems. Here we argue that a greater focus on these components would also benefit systems neuroscience. We give examples of how this optimization-based framework can drive theoretical and experimental progress in neuroscience. We contend that this principled perspective on systems neuroscience will help to generate more rapid progress.


Subject(s)
Artificial Intelligence; Deep Learning; Neural Networks, Computer; Animals; Brain/physiology; Humans
15.
Cogn Sci ; 42(8): 2670-2698, 2018 11.
Article in English | MEDLINE | ID: mdl-30125986

ABSTRACT

Production and comprehension have long been viewed as inseparable components of language. The study of vision, by contrast, has centered almost exclusively on comprehension. Here we investigate drawing, the most basic form of visual production. How do we convey concepts in visual form, and how does refining this skill, in turn, affect recognition? We developed an online platform for collecting large amounts of drawing and recognition data, and applied a deep convolutional neural network model of visual cortex trained only on natural images to explore the hypothesis that drawing recruits the same abstract feature representations that support natural visual object recognition. Consistent with this hypothesis, higher layers of this model captured the abstract features of both drawings and natural images most important for recognition, and people learning to produce more recognizable drawings of objects exhibited enhanced recognition of those objects. These findings could explain why drawing is so effective for communicating visual concepts; they suggest novel approaches for evaluating and refining conceptual knowledge; and they highlight the potential of deep networks for understanding human learning.


Subject(s)
Learning/physiology; Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Visual Perception/physiology; Humans; Models, Neurological; Photic Stimulation
16.
Neuron ; 98(3): 630-644.e16, 2018 05 02.
Article in English | MEDLINE | ID: mdl-29681533

ABSTRACT

A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds. Reasoning that a complete model of auditory cortex must solve ecologically relevant tasks, we optimized hierarchical neural networks for speech and music recognition. The best-performing network contained separate music and speech pathways following early shared processing, potentially replicating human cortical organization. The network performed both tasks as well as humans and exhibited human-like errors despite not being optimized to do so, suggesting common constraints on network and human performance. The network predicted fMRI voxel responses substantially better than traditional spectrotemporal filter models throughout auditory cortex. It also provided a quantitative signature of cortical representational hierarchy: primary and non-primary responses were best predicted by intermediate and late network layers, respectively. The results suggest that task optimization provides a powerful set of tools for modeling sensory systems.
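Predicting fMRI voxel responses from network activations, as in this abstract and several entries above, is typically done with regularized linear regression. The sketch below uses synthetic data and illustrative names throughout (it is not the study's pipeline): it fits a closed-form ridge map from "layer features" to "voxels" and scores held-out prediction by per-voxel correlation.

```python
import numpy as np

def ridge_fit(features, voxels, alpha=1.0):
    """Closed-form ridge regression mapping features to voxel responses."""
    d = features.shape[1]
    return np.linalg.solve(features.T @ features + alpha * np.eye(d),
                           features.T @ voxels)

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 40, 10
F_train = rng.normal(size=(n_train, n_feat))   # network activations, train stimuli
F_test = rng.normal(size=(n_test, n_feat))     # network activations, held-out stimuli
W_true = rng.normal(size=(n_feat, n_vox))      # synthetic ground-truth mapping
V_train = F_train @ W_true + 0.5 * rng.normal(size=(n_train, n_vox))
V_test = F_test @ W_true + 0.5 * rng.normal(size=(n_test, n_vox))

W = ridge_fit(F_train, V_train, alpha=1.0)
pred = F_test @ W

# Per-voxel accuracy: correlation between predicted and held-out responses
r = np.array([np.corrcoef(pred[:, v], V_test[:, v])[0, 1] for v in range(n_vox)])
```

In practice, alpha is chosen per voxel by cross-validation, and the layer whose features predict a voxel best serves as the "hierarchy signature" the abstract describes.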


Subject(s)
Acoustic Stimulation/methods; Auditory Cortex/diagnostic imaging; Auditory Cortex/physiology; Magnetic Resonance Imaging/methods; Nerve Net/diagnostic imaging; Nerve Net/physiology; Psychomotor Performance/physiology; Adolescent; Adult; Aged; Female; Forecasting; Humans; Male; Middle Aged; Young Adult
17.
Front Comput Neurosci ; 11: 100, 2017.
Article in English | MEDLINE | ID: mdl-29163117

ABSTRACT

Visual information in the visual cortex is processed in a hierarchical manner. Recent studies show that higher visual areas, such as V2, V3, and V4, respond more vigorously to images with naturalistic higher-order statistics than to images lacking them. This property is a functional signature of higher areas, as it is much weaker or even absent in the primary visual cortex (V1). However, the mechanism underlying this signature remains elusive. We studied this problem using computational models. In several typical hierarchical visual models including the AlexNet, VggNet, and SHMAX, this signature was found to be prominent in higher layers but much weaker in lower layers. By changing both the model structure and experimental settings, we found that the signature strongly correlated with sparse firing of units in higher layers but not with any other factors, including model structure, training algorithm (supervised or unsupervised), receptive field size, and property of training stimuli. The results suggest an important role of sparse neuronal activity underlying this special feature of higher visual areas.

18.
J Vis ; 16(7): 7, 2016 05 01.
Article in English | MEDLINE | ID: mdl-27153196

ABSTRACT

Humans can learn to recognize new objects just from observing example views. However, it is unknown what structural information enables this learning. To address this question, we manipulated the amount of structural information given to subjects during unsupervised learning by varying the format of the trained views. We then tested how format affected participants' ability to discriminate similar objects across views that were rotated 90° apart. We found that, after training, participants' performance increased and generalized to new views in the same format. Surprisingly, the improvement was similar across line drawings, shape from shading, and shape from shading + stereo even though the latter two formats provide richer depth information compared to line drawings. In contrast, participants' improvement was significantly lower when training used silhouettes, suggesting that silhouettes do not have enough information to generate a robust 3-D structure. To test whether the learned object representations were format-specific or format-invariant, we examined if learning novel objects from example views transfers across formats. We found that learning objects from example line drawings transferred to shape from shading and vice versa. These results have important implications for theories of object recognition because they suggest that (a) learning the 3-D structure of objects does not require rich structural cues during training as long as shape information of internal and external features is provided and (b) learning generates shape-based object representations independent of the training format.


Subject(s)
Cues; Depth Perception/physiology; Form Perception/physiology; Imaging, Three-Dimensional; Learning; Pattern Recognition, Visual/physiology; Adult; Female; Humans; Male
19.
Nat Neurosci ; 19(3): 356-65, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26906502

ABSTRACT

Fueled by innovation in the computer vision and artificial intelligence communities, recent developments in computational neuroscience have used goal-driven hierarchical convolutional neural networks (HCNNs) to make strides in modeling neural single-unit and population responses in higher visual cortical areas. In this Perspective, we review the recent progress in a broader modeling context and describe some of the key technical innovations that have supported it. We then outline how the goal-driven HCNN approach can be used to delve even more deeply into understanding the development and organization of sensory cortical processing.


Subject(s)
Goals; Learning/physiology; Models, Neurological; Neural Networks, Computer; Somatosensory Cortex/physiology; Visual Cortex/physiology; Animals; Humans
20.
Curr Opin Neurobiol ; 37: 114-120, 2016 04.
Article in English | MEDLINE | ID: mdl-26921828

ABSTRACT

Propelled by advances in biologically inspired computer vision and artificial intelligence, the past five years have seen significant progress in using deep neural networks to model response patterns of neurons in visual cortex. In this paper, we briefly review this progress and then discuss eight key 'open questions' that we believe will drive research in computational models of sensory systems over the next five years, both in visual cortex and beyond.


Subject(s)
Models, Neurological; Sensorimotor Cortex/physiology; Artificial Intelligence/trends; Neural Networks, Computer; Research/trends