Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 113
Filter
1.
Elife ; 132024 Apr 25.
Article in English | MEDLINE | ID: mdl-38661128

ABSTRACT

Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g. left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.


Subject(s)
Neural Networks, Computer , Animals , Brain/physiology , Neurons/physiology , Macaca , Models, Neurological , Macaca mulatta
2.
PLoS One ; 19(3): e0293440, 2024.
Article in English | MEDLINE | ID: mdl-38512838

ABSTRACT

Recent work has suggested that feedforward residual neural networks (ResNets) approximate iterative recurrent computations. Iterative computations are useful in many domains, so they might provide good solutions for neural networks to learn. However, principled methods for measuring and manipulating iterative convergence in neural networks remain lacking. Here we address this gap by 1) quantifying the degree to which ResNets learn iterative solutions and 2) introducing a regularization approach that encourages the learning of iterative solutions. Iterative methods are characterized by two properties: iteration and convergence. To quantify these properties, we define three indices of iterative convergence. Consistent with previous work, we show that, even though ResNets can express iterative solutions, they do not learn them when trained conventionally on computer-vision tasks. We then introduce regularizations to encourage iterative convergent computation and test whether this provides a useful inductive bias. To make the networks more iterative, we manipulate the degree of weight sharing across layers using soft gradient coupling. This new method provides a form of recurrence regularization and can interpolate smoothly between an ordinary ResNet and a "recurrent" ResNet (i.e., one that uses identical weights across layers and thus could be physically implemented with a recurrent network computing the successive stages iteratively across time). To make the networks more convergent we impose a Lipschitz constraint on the residual functions using spectral normalization. The three indices of iterative convergence reveal that the gradient coupling and the Lipschitz constraint succeed at making the networks iterative and convergent, respectively. To showcase the practicality of our approach, we study how iterative convergence impacts generalization on standard visual recognition tasks (MNIST, CIFAR-10, CIFAR-100) or challenging recognition tasks with partial occlusions (Digitclutter). We find that iterative convergent computation, in these tasks, does not provide a useful inductive bias for ResNets. Importantly, our approach may be useful for investigating other network architectures and tasks as well and we hope that our study provides a useful starting point for investigating the broader question of whether iterative convergence can help neural networks in their generalization.


Subject(s)
Learning , Neural Networks, Computer , Generalization, Psychological
3.
ArXiv ; 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38259351

ABSTRACT

Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remove irrelevant variation and represent behaviorally relevant information in a format suitable for downstream functions of cognition and behavioral control. In this conception, vision is driven by the sensory data, and perception is direct because the processing proceeds from the data to the latent variables of interest. The notion of "inference" in this conception is that of the engineering literature on neural networks, where feedforward convolutional neural networks processing images are said to perform inference. The alternative conception is that of vision as an inference process in Helmholtz's sense, where the sensory evidence is evaluated in the context of a generative model of the causal processes that give rise to it. In this conception, vision inverts a generative model through an interrogation of the sensory evidence in a process often thought to involve top-down predictions of sensory data to evaluate the likelihood of alternative hypotheses. The authors include scientists rooted in roughly equal numbers in each of the conceptions and motivated to overcome what might be a false dichotomy between them and engage the other perspective in the realm of theory and experiment. The primate brain employs an unknown algorithm that may combine the advantages of both conceptions. We explain and clarify the terminology, review the key empirical evidence, and propose an empirical research program that transcends the dichotomy and sets the stage for revealing the mysterious hybrid algorithm of primate vision.

4.
Behav Brain Sci ; 46: e392, 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38054329

ABSTRACT

An ideal vision model accounts for behavior and neurophysiology in both naturalistic conditions and designed lab experiments. Unlike psychological theories, artificial neural networks (ANNs) actually perform visual tasks and generate testable predictions for arbitrary inputs. These advantages enable ANNs to engage the entire spectrum of the evidence. Failures of particular models drive progress in a vibrant ANN research program of human vision.


Subject(s)
Language , Neural Networks, Computer , Humans
5.
Sci Rep ; 13(1): 14375, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37658079

ABSTRACT

Deep neural network models (DNNs) are essential to modern AI and provide powerful models of information processing in biological neural networks. Researchers in both neuroscience and engineering are pursuing a better understanding of the internal representations and operations that undergird the successes and failures of DNNs. Neuroscientists additionally evaluate DNNs as models of brain computation by comparing their internal representations to those found in brains. It is therefore essential to have a method to easily and exhaustively extract and characterize the results of the internal operations of any DNN. Many models are implemented in PyTorch, the leading framework for building DNN models. Here we introduce TorchLens, a new open-source Python package for extracting and characterizing hidden-layer activations in PyTorch models. Uniquely among existing approaches to this problem, TorchLens has the following features: (1) it exhaustively extracts the results of all intermediate operations, not just those associated with PyTorch module objects, yielding a full record of every step in the model's computational graph, (2) it provides an intuitive visualization of the model's complete computational graph along with metadata about each computational step in a model's forward pass for further analysis, (3) it contains a built-in validation procedure to algorithmically verify the accuracy of all saved hidden-layer activations, and (4) the approach it uses can be automatically applied to any PyTorch model with no modifications, including models with conditional (if-then) logic in their forward pass, recurrent models, branching models where layer outputs are fed into multiple subsequent layers in parallel, and models with internally generated tensors (e.g., injections of noise). Furthermore, using TorchLens requires minimal additional code, making it easy to incorporate into existing pipelines for model development and analysis, and useful as a pedagogical aid when teaching deep learning concepts. We hope this contribution will help researchers in AI and neuroscience understand the internal representations of DNNs.


Subject(s)
Brain , Cognition , Engineering , Metadata , Neural Networks, Computer
6.
Elife ; 122023 08 23.
Article in English | MEDLINE | ID: mdl-37610302

ABSTRACT

Neuroscience has recently made much progress, expanding the complexity of both neural activity measurements and brain-computational models. However, we lack robust methods for connecting theory and experiment by evaluating our new big models with our new big data. Here, we introduce new inference methods enabling researchers to evaluate and compare models based on the accuracy of their predictions of representational geometries: A good model should accurately predict the distances among the neural population representations (e.g. of a set of stimuli). Our inference methods combine novel 2-factor extensions of crossvalidation (to prevent overfitting to either subjects or conditions from inflating our estimates of model accuracy) and bootstrapping (to enable inferential model comparison with simultaneous generalization to both new subjects and new conditions). We validate the inference methods on data where the ground-truth model is known, by simulating data with deep neural networks and by resampling of calcium-imaging and functional MRI data. Results demonstrate that the methods are valid and conclusions generalize correctly. These data analysis methods are available in an open-source Python toolbox (rsatoolbox.readthedocs.io).


Subject(s)
Big Data , Neurosciences , Humans , Calcium, Dietary , Generalization, Psychological , Neural Networks, Computer
7.
Nat Rev Neurosci ; 24(7): 431-450, 2023 07.
Article in English | MEDLINE | ID: mdl-37253949

ABSTRACT

Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.


Subject(s)
Brain , Neural Networks, Computer , Humans , Brain/physiology
8.
bioRxiv ; 2023 Mar 18.
Article in English | MEDLINE | ID: mdl-36993311

ABSTRACT

Deep neural network models (DNNs) are essential to modern AI and provide powerful models of information processing in biological neural networks. Researchers in both neuroscience and engineering are pursuing a better understanding of the internal representations and operations that undergird the successes and failures of DNNs. Neuroscientists additionally evaluate DNNs as models of brain computation by comparing their internal representations to those found in brains. It is therefore essential to have a method to easily and exhaustively extract and characterize the results of the internal operations of any DNN. Many models are implemented in PyTorch, the leading framework for building DNN models. Here we introduce TorchLens , a new open-source Python package for extracting and characterizing hidden-layer activations in PyTorch models. Uniquely among existing approaches to this problem, TorchLens has the following features: (1) it exhaustively extracts the results of all intermediate operations, not just those associated with PyTorch module objects, yielding a full record of every step in the model's computational graph, (2) it provides an intuitive visualization of the model's complete computational graph along with metadata about each computational step in a model's forward pass for further analysis, (3) it contains a built-in validation procedure to algorithmically verify the accuracy of all saved hidden-layer activations, and (4) the approach it uses can be automatically applied to any PyTorch model with no modifications, including models with conditional (if-then) logic in their forward pass, recurrent models, branching models where layer outputs are fed into multiple subsequent layers in parallel, and models with internally generated tensors (e.g., injections of noise). Furthermore, using TorchLens requires minimal additional code, making it easy to incorporate into existing pipelines for model development and analysis, and useful as a pedagogical aid when teaching deep learning concepts. We hope this contribution will help researchers in AI and neuroscience understand the internal representations of DNNs.

9.
J Neurosci ; 43(10): 1731-1741, 2023 03 08.
Article in English | MEDLINE | ID: mdl-36759190

ABSTRACT

Deep neural networks (DNNs) are promising models of the cortical computations supporting human object recognition. However, despite their ability to explain a significant portion of variance in neural data, the agreement between models and brain representational dynamics is far from perfect. We address this issue by asking which representational features are currently unaccounted for in neural time series data, estimated for multiple areas of the ventral stream via source-reconstructed magnetoencephalography data acquired in human participants (nine females, six males) during object viewing. We focus on the ability of visuo-semantic models, consisting of human-generated labels of object features and categories, to explain variance beyond the explanatory power of DNNs alone. We report a gradual reversal in the relative importance of DNN versus visuo-semantic features as ventral-stream object representations unfold over space and time. Although lower-level visual areas are better explained by DNN features starting early in time (at 66 ms after stimulus onset), higher-level cortical dynamics are best accounted for by visuo-semantic features starting later in time (at 146 ms after stimulus onset). Among the visuo-semantic features, object parts and basic categories drive the advantage over DNNs. These results show that a significant component of the variance unexplained by DNNs in higher-level cortical dynamics is structured and can be explained by readily nameable aspects of the objects. We conclude that current DNNs fail to fully capture dynamic representations in higher-level human visual cortex and suggest a path toward more accurate models of ventral-stream computations.SIGNIFICANCE STATEMENT When we view objects such as faces and cars in our visual environment, their neural representations dynamically unfold over time at a millisecond scale. These dynamics reflect the cortical computations that support fast and robust object recognition. DNNs have emerged as a promising framework for modeling these computations but cannot yet fully account for the neural dynamics. Using magnetoencephalography data acquired in human observers during object viewing, we show that readily nameable aspects of objects, such as 'eye', 'wheel', and 'face', can account for variance in the neural dynamics over and above DNNs. These findings suggest that DNNs and humans may in part rely on different object features for visual recognition and provide guidelines for model improvement.


Subject(s)
Pattern Recognition, Visual , Semantics , Male , Female , Humans , Neural Networks, Computer , Visual Perception , Brain , Brain Mapping/methods , Magnetic Resonance Imaging/methods
10.
bioRxiv ; 2023 Jul 06.
Article in English | MEDLINE | ID: mdl-36711779

ABSTRACT

Primates can recognize objects despite 3D geometric variations such as in-depth rotations. The computational mechanisms that give rise to such invariances are yet to be fully understood. A curious case of partial invariance occurs in the macaque face-patch AL and in fully connected layers of deep convolutional networks in which neurons respond similarly to mirror-symmetric views (e.g., left and right profiles). Why does this tuning develop? Here, we propose a simple learning-driven explanation for mirror-symmetric viewpoint tuning. We show that mirror-symmetric viewpoint tuning for faces emerges in the fully connected layers of convolutional deep neural networks trained on object recognition tasks, even when the training dataset does not include faces. First, using 3D objects rendered from multiple views as test stimuli, we demonstrate that mirror-symmetric viewpoint tuning in convolutional neural network models is not unique to faces: it emerges for multiple object categories with bilateral symmetry. Second, we show why this invariance emerges in the models. Learning to discriminate among bilaterally symmetric object categories induces reflection-equivariant intermediate representations. AL-like mirror-symmetric tuning is achieved when such equivariant responses are spatially pooled by downstream units with sufficiently large receptive fields. These results explain how mirror-symmetric viewpoint tuning can emerge in neural networks, providing a theory of how they might emerge in the primate brain. Our theory predicts that mirror-symmetric viewpoint tuning can emerge as a consequence of exposure to bilaterally symmetric objects beyond the category of faces, and that it can generalize beyond previously experienced object categories.

11.
Commun Biol ; 5(1): 1247, 2022 11 14.
Article in English | MEDLINE | ID: mdl-36376446

ABSTRACT

Distinguishing animate from inanimate things is of great behavioural importance. Despite distinct brain and behavioural responses to animate and inanimate things, it remains unclear which object properties drive these responses. Here, we investigate the importance of five object dimensions related to animacy ("being alive", "looking like an animal", "having agency", "having mobility", and "being unpredictable") in brain (fMRI, EEG) and behaviour (property and similarity judgements) of 19 participants. We used a stimulus set of 128 images, optimized by a genetic algorithm to disentangle these five dimensions. The five dimensions explained much variance in the similarity judgments. Each dimension explained significant variance in the brain representations (except, surprisingly, "being alive"), however, to a lesser extent than in behaviour. Different brain regions sensitive to animacy may represent distinct dimensions, either as accessible perceptual stepping stones toward detecting whether something is alive or because they are of behavioural importance in their own right.


Subject(s)
Brain , Pattern Recognition, Visual , Humans , Pattern Recognition, Visual/physiology , Brain/diagnostic imaging , Brain/physiology , Brain Mapping , Magnetic Resonance Imaging/methods , Judgment/physiology
12.
Proc Natl Acad Sci U S A ; 119(27): e2115047119, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35767642

ABSTRACT

Human vision is attuned to the subtle differences between individual faces. Yet we lack a quantitative way of predicting how similar two face images look and whether they appear to show the same person. Principal component-based three-dimensional (3D) morphable models are widely used to generate stimuli in face perception research. These models capture the distribution of real human faces in terms of dimensions of physical shape and texture. How well does a "face space" based on these dimensions capture the similarity relationships humans perceive among faces? To answer this, we designed a behavioral task to collect dissimilarity and same/different identity judgments for 232 pairs of realistic faces. Stimuli sampled geometric relationships in a face space derived from principal components of 3D shape and texture (Basel face model [BFM]). We then compared a wide range of models in their ability to predict the data, including the BFM from which faces were generated, an active appearance model derived from face photographs, and image-computable models of visual perception. Euclidean distance in the BFM explained both dissimilarity and identity judgments surprisingly well. In a comparison against 16 diverse models, BFM distance was competitive with representational distances in state-of-the-art deep neural networks (DNNs), including novel DNNs trained on BFM synthetic identities or BFM latents. Models capturing the distribution of face shape and texture across individuals are not only useful tools for stimulus generation. They also capture important information about how faces are perceived, suggesting that human face representations are tuned to the statistical distribution of faces.


Subject(s)
Facial Recognition , Judgment , Visual Perception , Humans , Neural Networks, Computer
13.
Nat Hum Behav ; 5(9): 1127-1144, 2021 09.
Article in English | MEDLINE | ID: mdl-34545237

ABSTRACT

Human visual perception carves a scene at its physical joints, decomposing the world into objects, which are selectively attended, tracked and predicted as we engage our surroundings. Object representations emancipate perception from the sensory input, enabling us to keep in mind that which is out of sight and to use perceptual content as a basis for action and symbolic cognition. Human behavioural studies have documented how object representations emerge through grouping, amodal completion, proto-objects and object files. By contrast, deep neural network models of visual object recognition remain largely tethered to sensory input, despite achieving human-level performance at labelling objects. Here, we review related work in both fields and examine how these fields can help each other. The cognitive literature provides a starting point for the development of new experimental tasks that reveal mechanisms of human object perception and serve as benchmarks driving the development of deep neural network models that will put the object into object recognition.


Subject(s)
Pattern Recognition, Visual/physiology , Recognition, Psychology/physiology , Visual Perception/physiology , Humans , Neural Networks, Computer , Visual Pathways/physiology
14.
Nat Rev Neurosci ; 22(11): 703-718, 2021 11.
Article in English | MEDLINE | ID: mdl-34522043

ABSTRACT

A central goal of neuroscience is to understand the representations formed by brain activity patterns and their connection to behaviour. The classic approach is to investigate how individual neurons encode stimuli and how their tuning determines the fidelity of the neural representation. Tuning analyses often use the Fisher information to characterize the sensitivity of neural responses to small changes of the stimulus. In recent decades, measurements of large populations of neurons have motivated a complementary approach, which focuses on the information available to linear decoders. The decodable information is captured by the geometry of the representational patterns in the multivariate response space. Here we review neural tuning and representational geometry with the goal of clarifying the relationship between them. The tuning induces the geometry, but different sets of tuned neurons can induce the same geometry. The geometry determines the Fisher information, the mutual information and the behavioural performance of an ideal observer in a range of psychophysical tasks. We argue that future studies can benefit from considering both tuning and geometry to understand neural codes and reveal the connections between stimuli, brain activity and behaviour.


Subject(s)
Brain/physiology , Models, Neurological , Models, Theoretical , Neurons/physiology , Animals , Brain/cytology , Humans
15.
J Cogn Neurosci ; 33(10): 2044-2064, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34272948

ABSTRACT

Deep neural networks (DNNs) trained on object recognition provide the best current models of high-level visual cortex. What remains unclear is how strongly experimental choices, such as network architecture, training, and fitting to brain data, contribute to the observed similarities. Here, we compare a diverse set of nine DNN architectures on their ability to explain the representational geometry of 62 object images in human inferior temporal cortex (hIT), as measured with fMRI. We compare untrained networks to their task-trained counterparts and assess the effect of cross-validated fitting to hIT, by taking a weighted combination of the principal components of features within each layer and, subsequently, a weighted combination of layers. For each combination of training and fitting, we test all models for their correlation with the hIT representational dissimilarity matrix, using independent images and subjects. Trained models outperform untrained models (accounting for 57% more of the explainable variance), suggesting that structured visual features are important for explaining hIT. Model fitting further improves the alignment of DNN and hIT representations (by 124%), suggesting that the relative prevalence of different features in hIT does not readily emerge from the Imagenet object-recognition task used to train the networks. The same models can also explain the disparate representations in primary visual cortex (V1), where stronger weights are given to earlier layers. In each region, all architectures achieved equivalently high performance once trained and fitted. The models' shared properties-deep feedforward hierarchies of spatially restricted nonlinear filters-seem more important than their differences, when modeling human visual representations.


Subject(s)
Neural Networks, Computer , Visual Cortex , Humans , Magnetic Resonance Imaging , Temporal Lobe/diagnostic imaging , Visual Cortex/diagnostic imaging , Visual Perception
16.
J Neurosci ; 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-34099508

ABSTRACT

Social behaviour is coordinated by a network of brain regions, including those involved in the perception of social stimuli and those involved in complex functions like inferring perceptual and mental states and controlling social interactions. The properties and function of many of these regions in isolation is relatively well-understood, but less is known about how these regions interact whilst processing dynamic social interactions. To investigate whether the functional connectivity between brain regions is modulated by social context, we collected functional MRI (fMRI) data from male monkeys (Macaca mulatta) viewing videos of social interactions labelled as "affiliative", "aggressive", or "ambiguous". We show activation related to the perception of social interactions along both banks of the superior temporal sulcus, parietal cortex, medial and lateral frontal cortex, and the caudate nucleus. Within this network, we show that fronto-temporal functional connectivity is significantly modulated by social context. Crucially, we link the observation of specific behaviours to changes in functional connectivity within our network. Viewing aggressive behaviour was associated with a limited increase in temporo-temporal and a weak increase in cingulate-temporal connectivity. By contrast, viewing interactions where the outcome was uncertain was associated with a pronounced increase in temporo-temporal, and cingulate-temporal functional connectivity. We hypothesise that this widespread network synchronisation occurs when cingulate and temporal areas coordinate their activity when more difficult social inferences are being made.SIGNIFICANCE STATEMENT:Processing social information from our environment requires the activation of several brain regions, which are concentrated within the frontal and temporal lobes. However, little is known about how these areas interact to facilitate the processing of different social interactions. Here we show that functional connectivity within and between the frontal and temporal lobes is modulated by social context. Specifically, we demonstrate that viewing social interactions where the outcome was unclear is associated with increased synchrony within and between the cingulate cortex and temporal cortices. These findings suggest that the coordination between the cingulate and temporal cortices is enhanced when more difficult social inferences are being made.

17.
Neuron ; 109(14): 2224-2238, 2021 07 21.
Article in English | MEDLINE | ID: mdl-34143951

ABSTRACT

The movements an organism makes provide insights into its internal states and motives. This principle is the foundation of the new field of computational ethology, which links rich automatic measurements of natural behaviors to motivational states and neural activity. Computational ethology has proven transformative for animal behavioral neuroscience. This success raises the question of whether rich automatic measurements of behavior can similarly drive progress in human neuroscience and psychology. New technologies for capturing and analyzing complex behaviors in real and virtual environments enable us to probe the human brain during naturalistic dynamic interactions with the environment that so far were beyond experimental investigation. Inspired by nonhuman computational ethology, we explore how these new tools can be used to test important questions in human neuroscience. We argue that application of this methodology will help human neuroscience and psychology extend limited behavioral measurements such as reaction time and accuracy, permit novel insights into how the human brain produces behavior, and ultimately reduce the growing measurement gap between human and animal neuroscience.


Subject(s)
Brain , Cognition , Ethology/methods , Neurosciences/methods , Humans
18.
PLoS One ; 16(4): e0250474, 2021.
Article in English | MEDLINE | ID: mdl-33872341

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0232551.].

19.
Nat Hum Behav ; 5(9): 1203-1213, 2021 09.
Article in English | MEDLINE | ID: mdl-33707658

ABSTRACT

Long-standing affective science theories conceive the perception of emotional stimuli either as discrete categories (for example, an angry voice) or continuous dimensional attributes (for example, an intense and negative vocal emotion). Which position provides a better account is still widely debated. Here we contrast the positions to account for acoustics-independent perceptual and cerebral representational geometry of perceived voice emotions. We combined multimodal imaging of the cerebral response to heard vocal stimuli (using functional magnetic resonance imaging and magneto-encephalography) with post-scanning behavioural assessment of voice emotion perception. By using representational similarity analysis, we find that categories prevail in perceptual and early (less than 200 ms) frontotemporal cerebral representational geometries and that dimensions impinge predominantly on a later limbic-temporal network (at 240 ms and after 500 ms). These results reconcile the two opposing views by reframing the perception of emotions as the interplay of cerebral networks with different representational dynamics that emphasize either categories or dimensions.


Subject(s)
Arousal/physiology , Emotions/physiology , Speech Perception/physiology , Acoustic Stimulation/methods , Anger , Humans , Voice/physiology
20.
Proc Natl Acad Sci U S A ; 118(8)2021 02 23.
Article in English | MEDLINE | ID: mdl-33593900

ABSTRACT

Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream. We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community.


Subject(s)
Deep Learning , Ecology , Models, Neurological , Neural Networks, Computer , Pattern Recognition, Visual , Visual Cortex/physiology , Visual Perception/physiology , Brain Mapping , Humans
SELECTION OF CITATIONS
SEARCH DETAIL
...