Results 1 - 20 of 24
1.
Curr Biol ; 34(5): 1098-1106.e5, 2024 03 11.
Article in English | MEDLINE | ID: mdl-38218184

ABSTRACT

Visual shape perception is central to many everyday tasks, from object recognition to grasping and handling tools [1-10]. Yet how shape is encoded in the visual system remains poorly understood. Here, we probed shape representations using visual aftereffects: perceptual distortions that occur following extended exposure to a stimulus [11-17]. Such effects are thought to be caused by adaptation in neural populations that encode both simple, low-level stimulus characteristics [17-20] and more abstract, high-level object features [21-23]. To tease these two contributions apart, we used machine-learning methods to synthesize novel shapes in a multidimensional shape space, derived from a large database of natural shapes [24]. Stimuli were carefully selected such that low-level and high-level adaptation models made distinct predictions about the shapes that observers would perceive following adaptation. We found that adaptation along vector trajectories in the high-level shape space predicted shape aftereffects better than simple low-level processes. Our findings reveal the central role of high-level statistical features in the visual representation of shape. The findings also hint that human vision is attuned to the distribution of shapes experienced in the natural environment.
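As a rough illustration of the high-level account described above, the sketch below assumes that shapes live as points in a learned multidimensional shape space and that adaptation pushes the perceived position of a test shape away from the adaptor along the adaptor-to-test direction. All names and values (the 10-dimensional coordinates, the adaptation strength) are hypothetical placeholders, not the authors' model or code.

```python
import numpy as np

def predict_aftereffect(test_vec, adaptor_vec, strength=0.3):
    """High-level adaptation sketch: the perceived shape is displaced away from
    the adaptor along the adaptor-to-test direction in the shape space."""
    direction = test_vec - adaptor_vec
    norm = np.linalg.norm(direction)
    if norm == 0:
        return test_vec.copy()          # the adaptor itself is predicted to look unchanged
    return test_vec + strength * direction / norm

# Hypothetical shape-space coordinates for an adaptor and a test shape.
rng = np.random.default_rng(0)
adaptor = rng.normal(size=10)
test = rng.normal(size=10)
perceived = predict_aftereffect(test, adaptor)
print(np.linalg.norm(perceived - test))  # size of the predicted perceptual shift
```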


Subject(s)
Vision, Ocular , Visual Perception , Humans , Perceptual Distortion , Environment , Pattern Recognition, Visual , Photic Stimulation
2.
Behav Brain Sci ; 46: e386, 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38054335

ABSTRACT

Everyone agrees that testing hypotheses is important, but Bowers et al. provide scant details about where hypotheses about perception and brain function should come from. We suggest that the answer lies in considering how information about the outside world could be acquired - that is, learned - over the course of evolution and development. Deep neural networks (DNNs) provide one tool to address this question.


Subject(s)
Brain , Neural Networks, Computer , Humans , Learning
3.
J Vis ; 23(7): 8, 2023 07 03.
Article in English | MEDLINE | ID: mdl-37432844

ABSTRACT

When we look at an object, we simultaneously see how glossy or matte it is, how light or dark, and what color. Yet, at each point on the object's surface, both diffuse and specular reflections are mixed in different proportions, resulting in substantial spatial chromatic and luminance variations. To further complicate matters, this pattern changes radically when the object is viewed under different lighting conditions. The purpose of this study was to simultaneously measure our ability to judge color and gloss using an image set capturing diverse object and illuminant properties. Participants adjusted the hue, lightness, chroma, and specular reflectance of a reference object so that it appeared to be made of the same material as a test object. Critically, the two objects were presented under different lighting environments. We found that hue matches were highly accurate, except under a chromatically atypical illuminant. Chroma and lightness constancy were generally poor, but these failures correlated well with simple image statistics. Gloss constancy was particularly poor, and these failures were only partially explained by reflection contrast. Importantly, across all measures, participants were highly consistent with one another in their deviations from constancy. Although color and gloss constancy hold well in simple conditions, the variety of lighting and shape in the real world presents significant challenges to our visual system's ability to judge intrinsic material properties.


Subject(s)
Lighting , Humans
4.
Vision Res ; 206: 108195, 2023 05.
Article in English | MEDLINE | ID: mdl-36801664

ABSTRACT

Why do we perceive illusory motion in some static images? Several accounts point to eye movements, response latencies to different image elements, or interactions between image patterns and motion energy detectors. Recently, PredNet, a recurrent deep neural network (DNN) based on predictive coding principles, was reported to reproduce the "Rotating Snakes" illusion, suggesting a role for predictive coding. We begin by replicating this finding, then use a series of "in silico" psychophysics and electrophysiology experiments to examine whether PredNet behaves consistently with human observers and non-human primate neural data. A pretrained PredNet predicted illusory motion for all subcomponents of the Rotating Snakes pattern, consistent with human observers. However, unlike electrophysiological evidence, we found no simple response delays in its internal units. PredNet's detection of motion in gradients depended on contrast, whereas the human illusion depends predominantly on luminance. Finally, we examined the robustness of the illusion across ten PredNets of identical architecture, retrained on the same video data. There was large variation across network instances in whether they reproduced the Rotating Snakes illusion, and what motion, if any, they predicted for simplified variants. Unlike human observers, no network predicted motion for greyscale variants of the Rotating Snakes pattern. Our results sound a cautionary note: even when a DNN successfully reproduces some idiosyncrasy of human vision, more detailed investigation can reveal inconsistencies between humans and the network, and between different instances of the same network. These inconsistencies suggest that predictive coding does not reliably give rise to human-like illusory motion.
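One generic way to run this kind of "in silico psychophysics" is to present a static illusion image, take a model's predicted next frame, and ask whether the prediction contains systematic motion, for example by measuring the rotational component of dense optical flow between input and prediction. The sketch below assumes you already have the two frames as greyscale NumPy arrays; it uses OpenCV's Farneback flow purely as an illustration and is not the authors' actual analysis pipeline or PredNet code.

```python
import numpy as np
import cv2

def mean_rotational_flow(frame_a, frame_b):
    """Estimate dense optical flow between two greyscale frames and summarise
    the component of flow that circulates around the image centre."""
    flow = cv2.calcOpticalFlowFarneback(frame_a, frame_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = frame_a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - w / 2.0, ys - h / 2.0
    r = np.sqrt(dx ** 2 + dy ** 2) + 1e-6
    # Project the flow onto the tangential (rotational) direction at each pixel.
    tangential = (-dy / r) * flow[..., 0] + (dx / r) * flow[..., 1]
    return tangential.mean()

# frame_a: the static illusion image; frame_b: the model's predicted next frame.
frame_a = np.random.randint(0, 255, (128, 128), dtype=np.uint8)  # placeholder
frame_b = np.random.randint(0, 255, (128, 128), dtype=np.uint8)  # placeholder
print(mean_rotational_flow(frame_a, frame_b))  # systematically nonzero => predicted rotation
```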


Subject(s)
Illusions , Motion Perception , Animals , Humans , Illusions/physiology , Motion Perception/physiology , Vision, Ocular , Eye Movements , Neural Networks, Computer
5.
Curr Biol ; 32(21): R1224-R1225, 2022 11 07.
Article in English | MEDLINE | ID: mdl-36347228

ABSTRACT

The discovery of mental rotation was one of the most significant landmarks in experimental psychology, leading to the ongoing assumption that to visually compare objects from different three-dimensional viewpoints, we use explicit internal simulations of object rotations, to 'mentally adjust' one object until it matches the other [1]. These rotations are thought to be performed on three-dimensional representations of the object, by literal analogy to physical rotations. In particular, it is thought that an imagined object is continuously adjusted at a constant three-dimensional angular rotation rate from its initial orientation to the final orientation through all intervening viewpoints [2]. While qualitative theories have tried to account for this phenomenon [3], to date there has been no explicit, image-computable model of the underlying processes. As a result, there is no quantitative account of why some object viewpoints appear more similar to one another than others when the three-dimensional angular difference between them is the same [4,5]. We reasoned that the specific pattern of non-uniformities in the perception of viewpoints can reveal the visual computations underlying mental rotation. We therefore compared human viewpoint perception with a model based on the kind of two-dimensional 'optical flow' computations that are thought to underlie motion perception in biological vision [6], finding that the model reproduces the specific errors that participants make. This suggests that mental rotation involves simulating the two-dimensional retinal image change that would occur when rotating objects. When we compare objects, we do not do so in a distal three-dimensional representation as previously assumed, but by measuring how much the proximal stimulus would change if we watched the object rotate, capturing perspectival appearance changes [7].
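The core idea, that viewpoint dissimilarity tracks how much the 2D image would change under rotation rather than the 3D angle itself, can be approximated by summing dense optical flow between rendered views. Below is a minimal sketch under that assumption; the renderings are placeholders and the authors' actual model is more elaborate than this.

```python
import numpy as np
import cv2

def image_change(view_a, view_b):
    """Proxy for perceived viewpoint difference: mean magnitude of the 2D
    optical flow needed to morph one greyscale view of an object into another."""
    flow = cv2.calcOpticalFlowFarneback(view_a, view_b, None,
                                        0.5, 3, 21, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=-1).mean()

# Placeholder renderings of the same object at 0, 30 and 60 degrees.
views = {deg: np.random.randint(0, 255, (128, 128), dtype=np.uint8)
         for deg in (0, 30, 60)}
# Equal 30-degree steps need not produce equal image change; such
# non-uniformities are what a flow-based account is meant to capture.
print(image_change(views[0], views[30]), image_change(views[30], views[60]))
```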


Subject(s)
Motion Perception , Optic Flow , Humans , Pattern Recognition, Visual , Visual Perception
6.
Proc Natl Acad Sci U S A ; 119(27): e2115047119, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35767642

ABSTRACT

Human vision is attuned to the subtle differences between individual faces. Yet we lack a quantitative way of predicting how similar two face images look and whether they appear to show the same person. Principal component-based three-dimensional (3D) morphable models are widely used to generate stimuli in face perception research. These models capture the distribution of real human faces in terms of dimensions of physical shape and texture. How well does a "face space" based on these dimensions capture the similarity relationships humans perceive among faces? To answer this, we designed a behavioral task to collect dissimilarity and same/different identity judgments for 232 pairs of realistic faces. Stimuli sampled geometric relationships in a face space derived from principal components of 3D shape and texture (Basel face model [BFM]). We then compared a wide range of models in their ability to predict the data, including the BFM from which faces were generated, an active appearance model derived from face photographs, and image-computable models of visual perception. Euclidean distance in the BFM explained both dissimilarity and identity judgments surprisingly well. In a comparison against 16 diverse models, BFM distance was competitive with representational distances in state-of-the-art deep neural networks (DNNs), including novel DNNs trained on BFM synthetic identities or BFM latents. Models capturing the distribution of face shape and texture across individuals are not only useful tools for stimulus generation. They also capture important information about how faces are perceived, suggesting that human face representations are tuned to the statistical distribution of faces.
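The headline analysis, testing whether Euclidean distance in the morphable-model space predicts perceived dissimilarity, reduces to a few lines once the data are in hand. The sketch below assumes you have the model's latent coordinates for both faces in each pair and the corresponding mean human dissimilarity ratings; all arrays and dimensionalities here are illustrative placeholders, not the published data.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_pairs, n_dims = 232, 199                      # 232 face pairs; illustrative latent size
latents_a = rng.normal(size=(n_pairs, n_dims))  # model coordinates of the first face in each pair
latents_b = rng.normal(size=(n_pairs, n_dims))  # model coordinates of the second face
human_dissim = rng.uniform(size=n_pairs)        # placeholder for mean human ratings

# Model prediction: Euclidean distance between the two faces' latent vectors.
bfm_distance = np.linalg.norm(latents_a - latents_b, axis=1)

# Rank correlation between model distances and human dissimilarity judgements.
rho, p = spearmanr(bfm_distance, human_dissim)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```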


Subject(s)
Facial Recognition , Judgment , Visual Perception , Humans , Neural Networks, Computer
7.
Proc Natl Acad Sci U S A ; 118(32)2021 08 10.
Article in English | MEDLINE | ID: mdl-34349023

ABSTRACT

Sitting in a static railway carriage can produce illusory self-motion if the train on an adjoining track moves off. While our visual system registers motion, vestibular signals indicate that we are stationary. The brain is faced with a difficult challenge: is there a single cause of sensations (I am moving) or two causes (I am static, another train is moving)? If a single cause, integrating signals produces a more precise estimate of self-motion, but if not, one cue should be ignored. In many cases, this process of causal inference works without error, but how does the brain achieve it? Electrophysiological recordings show that the macaque medial superior temporal area contains many neurons that encode combinations of vestibular and visual motion cues. Some respond best to vestibular and visual motion in the same direction ("congruent" neurons), while others prefer opposing directions ("opposite" neurons). Congruent neurons could underlie cue integration, but the function of opposite neurons remains a puzzle. Here, we seek to explain this computational arrangement by training a neural network model to solve causal inference for motion estimation. Like biological systems, the model develops congruent and opposite units and recapitulates known behavioral and neurophysiological observations. We show that all units (both congruent and opposite) contribute to motion estimation. Importantly, however, it is the balance between their activity that distinguishes whether visual and vestibular cues should be integrated or separated. This explains the computational purpose of puzzling neural representations and shows how a relatively simple feedforward network can solve causal inference.
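For readers unfamiliar with causal inference in cue combination, the computation such a network is trained to approximate can be sketched in its generic Bayesian form: weigh the probability that the visual and vestibular measurements share a common cause, then average the "integrate" and "separate" estimates accordingly. This is a textbook-style formulation with arbitrary parameters, not the paper's network or training procedure.

```python
import numpy as np

def heading_estimate(x_vis, x_ves, sigma_vis, sigma_ves,
                     sigma_prior=30.0, p_common=0.5):
    """Generic Bayesian causal inference for heading (degrees): infer whether
    two noisy cues share one cause, then model-average the estimates."""
    sv2, sa2, sp2 = sigma_vis ** 2, sigma_ves ** 2, sigma_prior ** 2

    # Likelihood of the two measurements given a single common heading.
    var_c1 = sv2 * sa2 + sv2 * sp2 + sa2 * sp2
    like_c1 = np.exp(-0.5 * ((x_vis - x_ves) ** 2 * sp2
                             + x_vis ** 2 * sa2 + x_ves ** 2 * sv2) / var_c1) \
              / (2 * np.pi * np.sqrt(var_c1))

    # Likelihood given two independent causes (e.g. object motion vs self-motion).
    like_c2 = np.exp(-0.5 * (x_vis ** 2 / (sv2 + sp2) + x_ves ** 2 / (sa2 + sp2))) \
              / (2 * np.pi * np.sqrt((sv2 + sp2) * (sa2 + sp2)))

    post_c1 = p_common * like_c1 / (p_common * like_c1 + (1 - p_common) * like_c2)

    # Optimal heading estimates under each causal structure (zero-mean prior).
    est_c1 = (x_vis / sv2 + x_ves / sa2) / (1 / sv2 + 1 / sa2 + 1 / sp2)
    est_c2 = (x_ves / sa2) / (1 / sa2 + 1 / sp2)   # self-motion from the vestibular cue only

    return post_c1 * est_c1 + (1 - post_c1) * est_c2   # model averaging

# Small cue conflict: cues are largely integrated. Large conflict: the visual cue is discounted.
print(heading_estimate(x_vis=4.0, x_ves=2.0, sigma_vis=3.0, sigma_ves=5.0))
print(heading_estimate(x_vis=40.0, x_ves=2.0, sigma_vis=3.0, sigma_ves=5.0))
```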


Subject(s)
Motion Perception/physiology , Neural Networks, Computer , Sensory Receptor Cells/physiology , Animals , Cues , Macaca mulatta , Photic Stimulation , Temporal Lobe/physiology
8.
J Cogn Neurosci ; 33(10): 2044-2064, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34272948

ABSTRACT

Deep neural networks (DNNs) trained on object recognition provide the best current models of high-level visual cortex. What remains unclear is how strongly experimental choices, such as network architecture, training, and fitting to brain data, contribute to the observed similarities. Here, we compare a diverse set of nine DNN architectures on their ability to explain the representational geometry of 62 object images in human inferior temporal cortex (hIT), as measured with fMRI. We compare untrained networks to their task-trained counterparts and assess the effect of cross-validated fitting to hIT, by taking a weighted combination of the principal components of features within each layer and, subsequently, a weighted combination of layers. For each combination of training and fitting, we test all models for their correlation with the hIT representational dissimilarity matrix, using independent images and subjects. Trained models outperform untrained models (accounting for 57% more of the explainable variance), suggesting that structured visual features are important for explaining hIT. Model fitting further improves the alignment of DNN and hIT representations (by 124%), suggesting that the relative prevalence of different features in hIT does not readily emerge from the ImageNet object-recognition task used to train the networks. The same models can also explain the disparate representations in primary visual cortex (V1), where stronger weights are given to earlier layers. In each region, all architectures achieved equivalently high performance once trained and fitted. The models' shared properties, deep feedforward hierarchies of spatially restricted nonlinear filters, seem more important than their differences when modeling human visual representations.
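The core comparison here is representational similarity analysis: build a representational dissimilarity matrix (RDM) from a network layer's responses to the images, build another from the fMRI patterns, and correlate their entries. A minimal sketch of that step, with placeholder data, is shown below; the cross-validated PCA and layer-weighting procedures described in the abstract are omitted, and the array sizes are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_images = 62

# Placeholder responses: one row per image.
layer_features = rng.normal(size=(n_images, 4096))   # e.g. a DNN layer's activations
hit_patterns = rng.normal(size=(n_images, 500))      # e.g. hIT voxel patterns

# Representational dissimilarity matrices (condensed upper triangles).
model_rdm = pdist(layer_features, metric="correlation")
brain_rdm = pdist(hit_patterns, metric="correlation")

# How well does the model's representational geometry match hIT's?
rho, _ = spearmanr(model_rdm, brain_rdm)
print(f"model-brain RDM correlation: {rho:.2f}")
```

The same RDM-correlation logic underlies the fMRI and behavioural comparisons in several of the studies listed below.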


Subject(s)
Neural Networks, Computer , Visual Cortex , Humans , Magnetic Resonance Imaging , Temporal Lobe/diagnostic imaging , Visual Cortex/diagnostic imaging , Visual Perception
9.
Nat Hum Behav ; 5(10): 1402-1417, 2021 10.
Article in English | MEDLINE | ID: mdl-33958744

ABSTRACT

Reflectance, lighting and geometry combine in complex ways to create images. How do we disentangle these to perceive individual properties, such as surface glossiness? We suggest that brains disentangle properties by learning to model statistical structure in proximal images. To test this hypothesis, we trained unsupervised generative neural networks on renderings of glossy surfaces and compared their representations with human gloss judgements. The networks spontaneously cluster images according to distal properties such as reflectance and illumination, despite receiving no explicit information about these properties. Intriguingly, the resulting representations also predict the specific patterns of 'successes' and 'errors' in human perception. Linearly decoding specular reflectance from the model's internal code predicts human gloss perception better than ground truth, supervised networks or control models, and it predicts, on an image-by-image basis, illusions of gloss perception caused by interactions between material, shape and lighting. Unsupervised learning may underlie many perceptual dimensions in vision and beyond.
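The decoding analysis described above, reading specular reflectance out of the unsupervised model's internal code with a linear map and comparing the read-out against human gloss judgements, can be sketched as follows. The latent codes, ground-truth gloss values, and human ratings are random placeholders, and the scikit-learn pipeline is an illustration rather than the authors' networks or datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_images, latent_dim = 300, 10
latent_codes = rng.normal(size=(n_images, latent_dim))     # unsupervised model's code per image
true_gloss = rng.uniform(size=n_images)                    # ground-truth specular reflectance
human_gloss = true_gloss + rng.normal(scale=0.1, size=n_images)  # placeholder human ratings

# Linearly decode specular reflectance from the latent code (cross-validated).
decoded = cross_val_predict(LinearRegression(), latent_codes, true_gloss, cv=5)

# Key comparison: does the decoded value track human judgements better than
# the physical ground truth does?
print("decoded vs human:", pearsonr(decoded, human_gloss)[0])
print("ground truth vs human:", pearsonr(true_gloss, human_gloss)[0])
```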


Subject(s)
Light , Surface Properties , Visual Perception/physiology , Computer Graphics , Contrast Sensitivity , Form Perception , Humans , Lighting/methods , Materials Science , Photic Stimulation , Psychophysics/instrumentation , Psychophysics/methods , Task Performance and Analysis
10.
J Neurosci ; 41(9): 1952-1969, 2021 03 03.
Article in English | MEDLINE | ID: mdl-33452225

ABSTRACT

Faces of different people elicit distinct fMRI patterns in several face-selective regions of the human brain. Here we used representational similarity analysis to investigate what type of identity-distinguishing information is encoded in three face-selective regions: fusiform face area (FFA), occipital face area (OFA), and posterior superior temporal sulcus (pSTS). In a sample of 30 human participants (22 females, 8 males), we used fMRI to measure brain activity patterns elicited by naturalistic videos of famous face identities, and compared their representational distances in each region with models of the differences between identities. We built diverse candidate models, ranging from low-level image-computable properties (pixel-wise, GIST, and Gabor-Jet dissimilarities), through higher-level image-computable descriptions (OpenFace deep neural network, trained to cluster faces by identity), to complex human-rated properties (perceived similarity, social traits, and gender). We found marked differences in the information represented by the FFA and OFA. Dissimilarities between face identities in FFA were accounted for by differences in perceived similarity, social traits, and gender, and by the OpenFace network. In contrast, representational distances in OFA were mainly driven by differences in low-level image-based properties (pixel-wise and Gabor-Jet dissimilarities). Our results suggest that, although FFA and OFA can both discriminate between identities, the FFA representation is further removed from the image, encoding higher-level perceptual and social face information.
SIGNIFICANCE STATEMENT: Recent studies using fMRI have shown that several face-responsive brain regions can distinguish between different face identities. It is however unclear whether these different face-responsive regions distinguish between identities in similar or different ways. We used representational similarity analysis to investigate the computations within three brain regions in response to naturalistically varying videos of face identities. Our results revealed that two regions, the fusiform face area and the occipital face area, encode distinct identity information about faces. Although identity can be decoded from both regions, identity representations in fusiform face area primarily contained information about social traits, gender, and high-level visual features, whereas occipital face area primarily represented lower-level image features.


Subject(s)
Brain/physiology , Facial Recognition/physiology , Models, Neurological , Brain Mapping/methods , Female , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Male
11.
J Neurosci ; 40(37): 7010-7012, 2020 09 09.
Article in English | MEDLINE | ID: mdl-32907932
12.
Curr Opin Behav Sci ; 30: 100-108, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31886321

ABSTRACT

Materials with complex appearances, like textiles and foodstuffs, pose challenges for conventional theories of vision. But recent advances in unsupervised deep learning provide a framework for explaining how we learn to see them. We suggest that perception does not involve estimating physical quantities like reflectance or lighting. Instead, representations emerge from learning to encode and predict the visual input as efficiently and accurately as possible. Neural networks can be trained to compress natural images or to predict frames in movies without 'ground truth' data about the outside world. Yet, to succeed, such systems may automatically discover how to disentangle distal causal factors. Such 'statistical appearance models' potentially provide a coherent explanation of both failures and successes in perception.
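To make the proposal concrete: "learning to encode and predict the visual input" without ground truth can be as simple as training an autoencoder to reconstruct images. The toy PyTorch sketch below illustrates only the self-supervised objective; the architecture, data, and hyperparameters are arbitrary placeholders and are not the models discussed in the paper.

```python
import torch
from torch import nn

# A tiny convolutional autoencoder: compress images, then reconstruct them.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

images = torch.rand(8, 3, 64, 64)          # placeholder batch of natural images
for step in range(10):                      # a few illustrative training steps
    recon = decoder(encoder(images))
    loss = nn.functional.mse_loss(recon, images)  # no labels: the image is its own target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(float(loss))
```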

13.
Front Psychol ; 8: 1726, 2017.
Article in English | MEDLINE | ID: mdl-29062291

ABSTRACT

Recent advances in deep convolutional neural networks (DNNs) have enabled unprecedentedly accurate computational models of brain representations, and present an exciting opportunity to model diverse cognitive functions. State-of-the-art DNNs achieve human-level performance on object categorisation, but it is unclear how well they capture human behavior on complex cognitive tasks. Recent reports suggest that DNNs can explain significant variance in one such task, judging object similarity. Here, we extend these findings by replicating them for a rich set of object images, comparing performance across layers within two DNNs of different depths, and examining how the DNNs' performance compares to that of non-computational "conceptual" models. Human observers performed similarity judgments for a set of 92 images of real-world objects. Representations of the same images were obtained in each of the layers of two DNNs of different depths (8-layer AlexNet and 16-layer VGG-16). To create conceptual models, other human observers generated visual-feature labels (e.g., "eye") and category labels (e.g., "animal") for the same image set. Feature labels were divided into parts, colors, textures and contours, while category labels were divided into subordinate, basic, and superordinate categories. We fitted models derived from the features, categories, and from each layer of each DNN to the similarity judgments, using representational similarity analysis to evaluate model performance. In both DNNs, similarity within the last layer explains most of the explainable variance in human similarity judgments. The last layer outperforms almost all feature-based models. Late and mid-level layers outperform some but not all feature-based models. Importantly, categorical models predict similarity judgments significantly better than any DNN layer. Our results provide further evidence for commonalities between DNNs and brain representations. Models derived from visual features other than object parts perform relatively poorly, perhaps because DNNs more comprehensively capture the colors, textures and contours which matter to human object perception. However, categorical models outperform DNNs, suggesting that further work may be needed to bring high-level semantic representations in DNNs closer to those extracted by humans. Modern DNNs explain similarity judgments remarkably well considering they were not trained on this task, and are promising models for many aspects of human cognition.

14.
J Exp Psychol Hum Percept Perform ; 43(1): 181-191, 2017 01.
Article in English | MEDLINE | ID: mdl-27808549

ABSTRACT

Adaptation to different visual properties can produce distinct patterns of perceptual aftereffect. Some, such as those following adaptation to color, seem to arise from recalibrative processes. These are associated with a reappraisal of which physical input constitutes a normative value in the environment-in this case, what appears "colorless," and what "colorful." Recalibrative aftereffects can arise from coding schemes in which inputs are referenced against malleable norm values. Other aftereffects seem to arise from contrastive processes. These exaggerate differences between the adaptor and other inputs without changing the adaptor's appearance. There has been conjecture over which process best describes adaptation-induced distortions of spatial vision, such as of apparent shape or facial identity. In 3 experiments, we determined whether recalibrative or contrastive processes underlie the shape aspect ratio aftereffect. We found that adapting to a moderately elongated shape compressed the appearance of narrower shapes and further elongated the appearance of more-elongated shapes (Experiment 1). Adaptation did not change the perceived aspect ratio of the adaptor itself (Experiment 2), and adapting to a circle induced similar bidirectional aftereffects on shapes narrower or wider than circular (Experiment 3). Results could not be explained by adaptation to retinotopically local edge orientation or single linear dimensions of shapes. We conclude that aspect ratio aftereffects are determined by contrastive processes that can exaggerate differences between successive inputs, inconsistent with a norm-referenced representation of aspect ratio. Adaptation might enhance the salience of novel stimuli rather than recalibrate one's sense of what constitutes a "normal" shape.
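The recalibrative (norm-shifting) versus contrastive (locally repulsive) distinction, which also arises in entries 17 and 20 below, amounts to two simple predicted aftereffect curves. The sketch below expresses only the qualitative predictions, with arbitrary parameter values and an aspect-ratio axis expressed as deviation from a neutral circular shape; it is not the authors' fitted model.

```python
import numpy as np

def renormalization_aftereffect(test, adaptor, k=0.2):
    """Norm-based account: the norm shifts toward the adaptor, so every test
    value (the adaptor included) appears displaced by the same amount."""
    return np.full_like(test, -k * adaptor, dtype=float)

def repulsion_aftereffect(test, adaptor, k=0.5, width=0.4):
    """Contrastive account: tests are pushed away from the adaptor, with no
    change at the adaptor itself and effects fading for distant tests."""
    d = test - adaptor
    return k * d * np.exp(-d ** 2 / (2 * width ** 2))

# Aspect ratios expressed as deviations from a neutral (circular) shape.
tests = np.linspace(-1.0, 1.0, 9)
adaptor = 0.5                       # a moderately elongated adaptor
print(renormalization_aftereffect(tests, adaptor))  # uniform shift
print(repulsion_aftereffect(tests, adaptor))        # bidirectional, zero at the adaptor
```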


Subject(s)
Adaptation, Physiological/physiology , Figural Aftereffect/physiology , Form Perception/physiology , Adult , Humans
15.
Neuron ; 92(2): 280-284, 2016 Oct 19.
Article in English | MEDLINE | ID: mdl-27764662

ABSTRACT

"Grid cells" encode an animal's location and direction of movement in 2D physical environments via regularly repeating receptive fields. Constantinescu et al. (2016) report the first evidence of grid cells for 2D conceptual spaces. The work has exciting implications for mental representation and shows how detailed neural-coding hypotheses can be tested with bulk population-activity measures.


Subject(s)
Neurons , Orientation , Animals , Grid Cells , Movement
17.
J Vis ; 15(8): 1, 2015.
Article in English | MEDLINE | ID: mdl-26030371

ABSTRACT

After looking at a photograph of someone for a protracted period (adaptation), a previously neutral-looking face can take on an opposite appearance in terms of gender, identity, and other attributes. But what happens to the appearance of other faces? Face aftereffects have repeatedly been ascribed to perceptual renormalization. Renormalization predicts that the adapting face and more extreme versions of it should appear more neutral after adaptation (e.g., if the adaptor was male, it and hyper-masculine faces should look more feminine). Other aftereffects, such as tilt and spatial frequency, are locally repulsive, exaggerating differences between adapting and test stimuli. This predicts that the adapting face should be little changed in appearance after adaptation, while more extreme versions of it should look even more extreme (e.g., if the adaptor was male, it should look unchanged, while hyper-masculine faces should look even more masculine). Existing reports do not provide clear evidence for either pattern. We overcame this by using a spatial comparison task to measure the appearance of stimuli presented in differently adapted retinal locations. In behaviorally matched experiments we compared aftereffect patterns after adapting to tilt, facial identity, and facial gender. In all three experiments data matched the predictions of a locally repulsive, but not a renormalizing, aftereffect. These data are consistent with the existence of similar encoding strategies for tilt, facial identity, and facial gender.


Subject(s)
Facial Recognition/physiology , Figural Aftereffect/physiology , Adaptation, Ocular/physiology , Choice Behavior , Female , Humans , Male
18.
Front Psychol ; 6: 157, 2015.
Article in English | MEDLINE | ID: mdl-25745407
19.
J Vis ; 15(1): 15.1.26, 2015 Jan 26.
Article in English | MEDLINE | ID: mdl-25624465

ABSTRACT

Some data have been taken as evidence that after prolonged viewing, near-vertical orientations "normalize" to appear more vertical than they did previously. After almost a century of research, the existence of tilt normalization remains controversial. The most recent evidence for tilt normalization comes from data suggesting a measurable "perceptual drift" of near-vertical adaptors toward vertical, which can be nulled by a slight physical rotation away from vertical (Müller, Schillinger, Do, & Leopold, 2009). We argue that biases in estimates of perceptual stasis could, however, result from the anisotropic organization of orientation-selective neurons in V1, with vertically-selective cells being more narrowly tuned than obliquely-selective cells. We describe a neurophysiologically plausible model that predicts greater sensitivity to orientation displacements toward than away from vertical. We demonstrate the predicted asymmetric pattern of sensitivity in human observers by determining threshold speeds for detecting rotation direction (Experiment 1), and by determining orientation discrimination thresholds for brief static stimuli (Experiment 2). Results imply that data suggesting a perceptual drift toward vertical instead result from greater discrimination sensitivity around cardinal than oblique orientations (the oblique effect), and thus do not constitute evidence for tilt normalization.
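The model's key claim, that narrower tuning near vertical yields better discrimination for displacements toward vertical than away from it, can be illustrated with a small labelled-line population code. The tuning widths, unit counts, and Fisher-information read-out below are arbitrary illustrative choices, not the authors' fitted model.

```python
import numpy as np

def tuning_width(pref_deg):
    """Narrower tuning for units preferring cardinal orientations (0 = vertical)."""
    return 15.0 + 15.0 * np.abs(np.sin(np.deg2rad(2 * pref_deg)))   # 15-30 degrees

def population_response(theta_deg, prefs_deg):
    w = tuning_width(prefs_deg)
    d = (theta_deg - prefs_deg + 90) % 180 - 90      # wrapped orientation difference
    return np.exp(-d ** 2 / (2 * w ** 2))

def discriminability(theta_a, theta_b, prefs_deg, n_steps=50):
    """Approximate d' by accumulating the root Fisher information of a
    Poisson-like population code along the path from theta_a to theta_b."""
    thetas = np.linspace(theta_a, theta_b, n_steps)
    step = abs(thetas[1] - thetas[0])
    total = 0.0
    for t in thetas:
        r = population_response(t, prefs_deg)
        dr = (population_response(t + 0.01, prefs_deg) - r) / 0.01
        fisher = np.sum(dr ** 2 / (r + 1e-9))
        total += np.sqrt(fisher) * step
    return total

prefs = np.linspace(-90, 90, 181)                    # preferred orientations, degrees
start = 10.0                                         # a near-vertical test orientation
print("toward vertical:", discriminability(start, start - 5, prefs))
print("away from vertical:", discriminability(start, start + 5, prefs))
```

With these assumptions, the same 5-degree displacement is easier to detect when it moves toward vertical, reproducing the asymmetry the abstract describes without any perceptual drift.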


Subject(s)
Optical Illusions/physiology , Orientation/physiology , Rotation , Visual Perception/physiology , Anisotropy , Humans , Models, Neurological , Psychophysics , Sensory Thresholds
20.
Iperception ; 6(2): 100-103, 2015 Apr.
Article in English | MEDLINE | ID: mdl-28299168

ABSTRACT

Face aftereffects can help adjudicate between theories of how facial attributes are encoded. O'Neil and colleagues (2014) compared age estimates for faces before and after adapting to young, middle-aged or old faces. They concluded that age aftereffects are best described as a simple re-normalisation: for example, after adapting to old faces, all faces look younger than they did initially. Here I argue that this conclusion is not substantiated by the reported data. The authors fit only a linear regression model, which captures the predictions of re-normalisation, but not alternative hypotheses such as local repulsion away from the adapted age. A second concern is that the authors analysed absolute age estimates after adaptation, as a function of baseline estimates, so goodness-of-fit measures primarily reflect the physical ages of test faces, rather than the impact of adaptation. When data are re-expressed as aftereffects and fit with a nonlinear "locally repulsive" model, this model performs as well as or better than a linear model in all adaptation conditions. Data in O'Neil et al. do not provide strong evidence for either re-normalisation or local repulsion in facial age aftereffects, but are more consistent with local repulsion (and exemplar-based encoding of facial age), contrary to the original report.
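The re-analysis described here amounts to fitting two candidate aftereffect functions, a constant shift (re-normalisation) versus a locally repulsive curve, to aftereffects expressed relative to baseline, and comparing fit quality. Below is a generic sketch of that comparison using synthetic numbers; it is not the author's analysis code, and the adaptor age, data values, and starting parameters are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def renormalisation(test_age, shift):
    """Re-normalisation predicts a uniform aftereffect across test ages."""
    return np.full_like(test_age, shift, dtype=float)

def local_repulsion(test_age, k, width, adaptor_age=70.0):
    """Local repulsion predicts aftereffects that push test ages away from the
    adapted age and fade with distance from it."""
    d = test_age - adaptor_age
    return k * d * np.exp(-d ** 2 / (2 * width ** 2))

test_ages = np.array([20, 30, 40, 50, 60, 70, 80], dtype=float)
aftereffects = np.array([-0.5, -1.5, -3.0, -4.0, -2.5, 0.0, 2.0])  # synthetic, in years

popt_lin, _ = curve_fit(renormalisation, test_ages, aftereffects, p0=[-2.0])
popt_rep, _ = curve_fit(local_repulsion, test_ages, aftereffects, p0=[0.2, 20.0])

def rss(pred):
    """Residual sum of squares of a model's predictions."""
    return float(np.sum((aftereffects - pred) ** 2))

print("renormalisation RSS:", rss(renormalisation(test_ages, *popt_lin)))
print("local repulsion RSS:", rss(local_repulsion(test_ages, *popt_rep)))
```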
