Results 1 - 7 of 7
1.
Proc Natl Acad Sci U S A ; 118(8), 2021 02 23.
Article in English | MEDLINE | ID: mdl-33593900

ABSTRACT

Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream. We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community.
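The selection criteria described above, categories that are both frequent in linguistic usage and concrete, can be sketched as a simple filter. The category names, scores, and thresholds below are illustrative stand-ins, not ecoset's actual values:

```python
# Toy filter mirroring the described selection criteria: keep
# basic-level categories that are both frequent in linguistic usage
# and concrete (scores and cutoffs are hypothetical, not ecoset's).
candidates = {
    "dog":    {"word_freq": 120.0, "concreteness": 4.8},
    "idea":   {"word_freq": 300.0, "concreteness": 1.6},  # abstract: excluded
    "zither": {"word_freq": 0.2,   "concreteness": 4.9},  # too rare: excluded
}
selected = [name for name, s in candidates.items()
            if s["word_freq"] >= 1.0 and s["concreteness"] >= 4.0]
print(selected)  # ['dog']
```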


Subject(s)
Deep Learning , Ecology , Models, Neurological , Neural Networks, Computer , Pattern Recognition, Visual , Visual Cortex/physiology , Visual Perception/physiology , Brain Mapping , Humans
2.
Nat Commun ; 11(1): 5725, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33184286

ABSTRACT

Deep neural networks (DNNs) excel at visual recognition tasks and are increasingly used as a modeling framework for neural computations in the primate brain. Just like individual brains, each DNN has a unique connectivity and representational profile. Here, we investigate individual differences among DNN instances that arise from varying only the random initialization of the network weights. Using tools typically employed in systems neuroscience, we show that this minimal change in initial conditions prior to training leads to substantial differences in intermediate and higher-level network representations despite similar network-level classification performance. We locate the origins of the effects in an under-constrained alignment of category exemplars, rather than misaligned category centroids. These results call into question the common practice of using single networks to derive insights into neural information processing and rather suggest that computational neuroscientists working with DNNs may need to base their inferences on groups of multiple network instances.
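Comparing the representations of two network instances, as described above, is typically done with representational similarity analysis. A minimal sketch with synthetic activations (the data and layer sizes are made up, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def rdm(acts):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    # between stimulus response patterns, upper triangle flattened.
    c = np.corrcoef(acts)
    iu = np.triu_indices_from(c, k=1)
    return 1.0 - c[iu]

def spearman(u, v):
    # Spearman correlation = Pearson correlation of the ranks,
    # the usual second-level statistic in RSA.
    ru = np.argsort(np.argsort(u))
    rv = np.argsort(np.argsort(v))
    return float(np.corrcoef(ru, rv)[0, 1])

# Stand-ins for one layer's activations in two instances that differ
# only in random initialization: 20 stimuli x 50 units (synthetic).
instance_a = rng.normal(size=(20, 50))
instance_b = instance_a + 0.5 * rng.normal(size=(20, 50))

rho = spearman(rdm(instance_a), rdm(instance_b))
print(round(rho, 3))
```

A low inter-instance `rho` at intermediate layers, despite matched classification accuracy, is the kind of individual difference the abstract reports.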


Subject(s)
Cognitive Neuroscience/methods , Individuality , Neural Networks, Computer , Animals , Brain
3.
PLoS Comput Biol ; 16(10): e1008215, 2020 10.
Article in English | MEDLINE | ID: mdl-33006992

ABSTRACT

Deep feedforward neural network models of vision dominate in both computational neuroscience and engineering. The primate visual system, by contrast, contains abundant recurrent connections. Recurrent signal flow enables recycling of limited computational resources over time, and so might boost the performance of a physically finite brain or model. Here we show: (1) Recurrent convolutional neural network models outperform feedforward convolutional models matched in their number of parameters in large-scale visual recognition tasks on natural images. (2) Setting a confidence threshold, at which recurrent computations terminate and a decision is made, enables flexible trading of speed for accuracy. At a given confidence threshold, the model expends more time and energy on images that are harder to recognise, without requiring additional parameters for deeper computations. (3) The recurrent model's reaction time for an image predicts the human reaction time for the same image better than several parameter-matched and state-of-the-art feedforward models. (4) Across confidence thresholds, the recurrent model emulates the behaviour of feedforward control models in that it achieves the same accuracy at approximately the same computational cost (mean number of floating-point operations). However, the recurrent model can be run longer (higher confidence threshold) and then outperforms parameter-matched feedforward comparison models. These results suggest that recurrent connectivity, a hallmark of biological visual systems, may be essential for understanding the accuracy, flexibility, and dynamics of human visual recognition.
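The confidence-threshold mechanism in point (2) can be sketched as a loop that runs recurrent timesteps until the maximum class probability crosses a threshold. The toy dynamics below are a hypothetical stand-in for the recurrent network's readout, not the paper's model:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def run_until_confident(step_fn, state, threshold, max_steps=10):
    """Run recurrent timesteps until the max class probability reaches
    `threshold`; return (predicted class, timesteps used)."""
    for t in range(1, max_steps + 1):
        state, logits = step_fn(state)
        probs = softmax(logits)
        if max(probs) >= threshold:
            return probs.index(max(probs)), t
    return probs.index(max(probs)), max_steps

# Toy dynamics: evidence for class 2 accumulates over timesteps.
def toy_step(evidence):
    evidence = [e + d for e, d in zip(evidence, (0.1, 0.2, 0.9))]
    return evidence, evidence

cls_low, t_low = run_until_confident(toy_step, [0.0, 0.0, 0.0], 0.5)
cls_high, t_high = run_until_confident(toy_step, [0.0, 0.0, 0.0], 0.9)
print(cls_low, t_low, cls_high, t_high)
```

Raising the threshold trades speed for accuracy: the same network spends more timesteps (and hence more computation) before committing to a decision, with no additional parameters.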


Subject(s)
Models, Neurological , Neural Networks, Computer , Reaction Time/physiology , Vision, Ocular/physiology , Visual Perception/physiology , Adult , Computational Biology , Female , Humans , Male , Young Adult
4.
Proc Natl Acad Sci U S A ; 116(43): 21854-21863, 2019 10 22.
Article in English | MEDLINE | ID: mdl-31591217

ABSTRACT

The human visual system is an intricate network of brain regions that enables us to recognize the world around us. Despite its abundant lateral and feedback connections, object processing is commonly viewed and studied as a feedforward process. Here, we measure and model the rapid representational dynamics across multiple stages of the human ventral stream using time-resolved brain imaging and deep learning. We observe substantial representational transformations during the first 300 ms of processing within and across ventral-stream regions. Categorical divisions emerge in sequence, cascading forward and in reverse across regions, and Granger causality analysis suggests bidirectional information flow between regions. Finally, recurrent deep neural network models clearly outperform parameter-matched feedforward models in terms of their ability to capture the multiregion cortical dynamics. Targeted virtual cooling experiments on the recurrent deep network models further substantiate the importance of their lateral and top-down connections. These results establish that recurrent models are required to understand information processing in the human ventral stream.
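The Granger causality analysis mentioned above asks whether one region's past improves prediction of another region's present beyond that region's own past. A minimal variance-reduction sketch on simulated time series (the simulation is illustrative, not the paper's MEG data or method details):

```python
import numpy as np

def granger_gain(x, y, lag=1):
    """Log variance-reduction form of Granger causality y -> x:
    how much adding y's past improves prediction of x."""
    x_past, y_past, target = x[lag - 1:-1], y[lag - 1:-1], x[lag:]
    # Restricted model: x predicted from its own past only.
    A = np.column_stack([x_past, np.ones_like(x_past)])
    res_r = target - A @ np.linalg.lstsq(A, target, rcond=None)[0]
    # Full model: also include y's past.
    B = np.column_stack([x_past, y_past, np.ones_like(x_past)])
    res_f = target - B @ np.linalg.lstsq(B, target, rcond=None)[0]
    return float(np.log(res_r.var() / res_f.var()))

# Simulate directed flow: y drives x with one-sample delay.
rng = np.random.default_rng(1)
n = 500
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * y[t - 1] + 0.2 * rng.normal()

gain_yx = granger_gain(x, y)  # y -> x: should be large
gain_xy = granger_gain(y, x)  # x -> y: should be near zero
print(gain_yx > gain_xy)
```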


Subject(s)
Models, Neurological , Visual Perception/physiology , Adult , Deep Learning , Feedback, Sensory , Female , Humans , Magnetoencephalography , Nerve Net , Visual Pathways
5.
Front Psychol ; 9: 1695, 2018.
Article in English | MEDLINE | ID: mdl-30250446

ABSTRACT

[This corrects the article DOI: 10.3389/fpsyg.2017.01551.].

6.
Front Psychol ; 8: 1551, 2017.
Article in English | MEDLINE | ID: mdl-28955272

ABSTRACT

Feedforward neural networks provide the dominant model of how the brain performs visual object recognition. However, these networks lack the lateral and feedback connections, and the resulting recurrent neuronal dynamics, of the ventral visual pathway in the human and non-human primate brain. Here we investigate recurrent convolutional neural networks with bottom-up (B), lateral (L), and top-down (T) connections. Combining these types of connections yields four architectures (B, BT, BL, and BLT), which we systematically test and compare. We hypothesized that recurrent dynamics might improve recognition performance in the challenging scenario of partial occlusion. We introduce two novel occluded object recognition tasks to test the efficacy of the models, digit clutter (where multiple target digits occlude one another) and digit debris (where target digits are occluded by digit fragments). We find that recurrent neural networks outperform feedforward control models (approximately matched in parametric complexity) at recognizing objects, both in the absence of occlusion and in all occlusion conditions. Recurrent networks were also found to be more robust to the inclusion of additive Gaussian noise. Recurrent neural networks are better in two respects: (1) they are more neurobiologically realistic than their feedforward counterparts; (2) they are better in terms of their ability to recognize objects, especially under challenging conditions. This work shows that computer vision can benefit from using recurrent convolutional architectures and suggests that the ubiquitous recurrent connections in biological brains are essential for task performance.
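The B/L/T connection scheme can be sketched as an unrolled update in which each layer combines bottom-up, lateral, and top-down input at every timestep. Dense weights stand in for the paper's convolutions, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Toy two-layer "BLT" network (dense stand-ins for convolutions).
n_in, n_h1, n_h2, T = 8, 6, 4, 3
W_b1 = rng.normal(scale=0.1, size=(n_h1, n_in))  # bottom-up into layer 1
W_l1 = rng.normal(scale=0.1, size=(n_h1, n_h1))  # lateral within layer 1
W_t1 = rng.normal(scale=0.1, size=(n_h1, n_h2))  # top-down from layer 2
W_b2 = rng.normal(scale=0.1, size=(n_h2, n_h1))  # bottom-up into layer 2
W_l2 = rng.normal(scale=0.1, size=(n_h2, n_h2))  # lateral within layer 2

x = rng.normal(size=n_in)
h1, h2 = np.zeros(n_h1), np.zeros(n_h2)
for t in range(T):
    # Each timestep sums B, L, and T input; zeroing the W_l* terms
    # gives the BT architecture, zeroing W_t1 gives BL, both gives B.
    h1 = relu(W_b1 @ x + W_l1 @ h1 + W_t1 @ h2)
    h2 = relu(W_b2 @ h1 + W_l2 @ h2)
print(h1.shape, h2.shape)
```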

7.
Vision Res ; 119: 16-28, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26774861

ABSTRACT

In order to develop transformation invariant representations of objects, the visual system must make use of constraints placed upon object transformation by the environment. For example, objects transform continuously from one point to another in both space and time. These two constraints have been exploited separately in order to develop translation and view invariance in a hierarchical multilayer model of the primate ventral visual pathway in the form of continuous transformation learning and temporal trace learning. We show for the first time that these two learning rules can work cooperatively in the model. Using these two learning rules together can support the development of invariance in cells and help maintain object selectivity when stimuli are presented over a large number of locations or when trained separately over a large number of viewing angles.
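The temporal trace rule described above can be sketched as a Hebbian update gated by a decaying trace of postsynaptic activity, so that successive transforms of the same object reinforce the same weights. Parameter values and inputs below are illustrative, not the model's:

```python
# Temporal trace learning: the postsynaptic activity trace carries
# object selectivity across successive transforms of the same object.
def trace_update(w, inputs, eta=0.8, alpha=0.1):
    trace = 0.0
    for x_vec in inputs:
        y = sum(wi * xi for wi, xi in zip(w, x_vec))  # postsynaptic rate
        trace = eta * trace + (1.0 - eta) * y         # decaying trace
        w = [wi + alpha * trace * xi for wi, xi in zip(w, x_vec)]
    return w

# Two successive "views" of the same object share feature 0; the
# trace links them, so the shared feature's weight grows the most.
views = [[1.0, 0.0, 0.3], [1.0, 0.3, 0.0]]
w1 = trace_update([0.5, 0.5, 0.5], views)
print([round(v, 3) for v in w1])
```

Continuous transformation learning would contribute through the spatial overlap of successive inputs themselves; in this sketch, only the temporal trace is modeled.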


Subject(s)
Form Perception/physiology , Learning/physiology , Models, Neurological , Visual Cortex/physiology , Visual Pathways/physiology , Animals , Computer Simulation , Humans , Photic Stimulation , Primates