1.
Proc Natl Acad Sci U S A ; 120(40): e2211179120, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37769256

ABSTRACT

In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work has focused on training models to extract the full graph-like structure of a scene. In contrast with scene graphs, humans' scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, Trends Cogn. Sci. 20, 843-856 (2016)]. Guidance is crucial throughout scene interpretation, since extracting a full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using iterative bottom-up, top-down processing in a "counterstream" structure motivated by cortical circuitry. The process proceeds by the sequential application of top-down instructions that guide the interpretation process. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization: generalizing broadly to unseen scene configurations, which is limited in current network models [B. Lake, M. Baroni, 35th International Conference on Machine Learning, ICML 2018 (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, which is a key aspect for modeling human perception as well as advancing AI vision systems.


Subject(s)
Motivation , Visual Perception , Humans , Photic Stimulation/methods , Pattern Recognition, Visual
2.
Proc Natl Acad Sci U S A ; 119(20): e2117184119, 2022 05 17.
Article in English | MEDLINE | ID: mdl-35549552

ABSTRACT

Gaze understanding, a suggested precursor for understanding others' intentions, requires recovery of gaze direction from the observed person's head and eye position. This challenging computation is naturally acquired in infancy without explicit external guidance, but can it be learned later if vision is extremely poor throughout early childhood? We addressed this question by studying gaze following in Ethiopian patients with early bilateral congenital cataracts whom we diagnosed and treated only in late childhood. This sight restoration provided a unique opportunity to directly address basic issues on the roles of "nature" and "nurture" in development, as it caused a selective perturbation to the natural process, eliminating some gaze-direction cues while leaving others still available. Following surgery, the patients' visual acuity typically improved substantially, allowing discrimination of pupil position in the eye. Yet, the patients failed to show eye gaze-following effects and fixated less than controls on the eyes, two spontaneous behaviors typically seen in controls. Our model for unsupervised learning of gaze direction explains how head-based gaze following can develop under severe image blur, resembling preoperative conditions. It also suggests why, despite acquiring sufficient resolution to extract eye position, automatic eye gaze following is not established after surgery, due to the lack of detailed early visual experience. We suggest that visual skills acquired in infancy in an unsupervised manner will be difficult or impossible to acquire when internal guidance is no longer available, even when sufficient image resolution for the task is restored. This creates fundamental barriers to spontaneous vision recovery following prolonged deprivation at an early age.


Subject(s)
Fixation, Ocular , Vision, Ocular , Attention , Blindness , Child , Humans , Visual Acuity
4.
Proc Natl Acad Sci U S A ; 118(34)2021 08 24.
Article in English | MEDLINE | ID: mdl-34417308

ABSTRACT

Natural vision is a dynamic and continuous process. Under natural conditions, visual object recognition typically involves continuous interactions between ocular motion and visual contrasts, resulting in dynamic retinal activations. In order to identify the dynamic variables that participate in this process and are relevant for image recognition, we used a set of images that are just above and below the human recognition threshold and whose recognition typically requires >2 s of viewing. We recorded participants' eye movements while they attempted to recognize these images within trials lasting 3 s. We then assessed the activation dynamics of retinal ganglion cells resulting from ocular dynamics using a computational model. We found that while the saccadic rate was similar between recognized and unrecognized trials, the fixational ocular speed was significantly larger for unrecognized trials. Interestingly, however, retinal activation level was significantly lower during these unrecognized trials. We used retinal activation patterns and oculomotor parameters of each fixation to train a binary classifier, classifying recognized from unrecognized trials. Only retinal activation patterns could predict recognition, reaching 80% correct classifications on the fourth fixation (on average, ∼2.5 s from trial onset). We thus conclude that the information that is relevant for visual perception is embedded in the dynamic interactions between the oculomotor sequence and the image. Hence, our results suggest that ocular dynamics play an important role in recognition and that understanding the dynamics of retinal activation is crucial for understanding natural vision.


Subject(s)
Fixation, Ocular , Retina/physiology , Visual Perception/physiology , Adult , Female , Humans , Male , Pilot Projects , Saccades , Young Adult
5.
Sci Rep ; 11(1): 7827, 2021 04 09.
Article in English | MEDLINE | ID: mdl-33837223

ABSTRACT

Humans recognize individual faces regardless of variation in the facial view. The view-tuned face neurons in the inferior temporal (IT) cortex are regarded as the neural substrate for view-invariant face recognition. This study approximated the visual features encoded by these neurons as combinations of local orientations and colors, originating from natural image fragments. The resultant features reproduced the preference of these neurons for particular facial views. We also found that faces of one identity were separable from faces of other identities in a space where each axis represented one of these features. These results suggested that view-invariant face representation is established by combining view-sensitive visual features. The face representation with these features suggested that, with respect to view-invariant face representation, the seemingly complex and deeply layered ventral visual pathway can be approximated by a shallow network comprising layers of low-level processing for local orientations and colors (V1/V2 level) and layers that detect particular sets of low-level elements derived from natural image fragments (IT level).


Subject(s)
Facial Recognition/physiology , Recognition, Psychology/physiology , Temporal Lobe/physiology , Visual Cortex/physiology , Visual Pathways/physiology , Animals , Brain Mapping , Face , Macaca fuscata , Nerve Net/physiology , Neurons/physiology
6.
Cognition ; 201: 104263, 2020 08.
Article in English | MEDLINE | ID: mdl-32325309

ABSTRACT

Objects and their parts can be visually recognized from purely spatial or purely temporal information but the mechanisms integrating space and time are poorly understood. Here we show that visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by identifying minimal videos: these are short and tiny video clips in which objects, parts, and actions can be reliably recognized, but any reduction in either space or time makes them unrecognizable. Human recognition in minimal videos is invariably accompanied by full interpretation of the internal components of the video. State-of-the-art deep convolutional networks for dynamic recognition cannot replicate human behavior in these configurations. The gap between human and machine vision demonstrated here is due to critical mechanisms for full spatiotemporal interpretation that are lacking in current computational models.


Subject(s)
Recognition, Psychology , Vision, Ocular , Humans
7.
J Cogn Neurosci ; 31(9): 1354-1367, 2019 09.
Article in English | MEDLINE | ID: mdl-31059350

ABSTRACT

Visual object recognition is performed effortlessly by humans notwithstanding the fact that it requires a series of complex computations, which are, as yet, not well understood. Here, we tested a novel account of the representations used for visual recognition and their neural correlates using fMRI. The rationale is based on previous research showing that a set of representations, termed "minimal recognizable configurations" (MIRCs), which are computationally derived and have unique psychophysical characteristics, serve as the building blocks of object recognition. We contrasted the BOLD responses elicited by MIRC images, derived from different categories (faces, objects, and places), sub-MIRCs, which are visually similar to MIRCs, but, instead, result in poor recognition and scrambled, unrecognizable images. Stimuli were presented in blocks, and participants indicated yes/no recognition for each image. We confirmed that MIRCs elicited higher recognition performance compared to sub-MIRCs for all three categories. Whereas fMRI activation in early visual cortex for both MIRCs and sub-MIRCs of each category did not differ from that elicited by scrambled images, high-level visual regions exhibited overall greater activation for MIRCs compared to sub-MIRCs or scrambled images. Moreover, MIRCs and sub-MIRCs from each category elicited enhanced activation in corresponding category-selective regions including fusiform face area and occipital face area (faces), lateral occipital cortex (objects), and parahippocampal place area and transverse occipital sulcus (places). These findings reveal the psychological and neural relevance of MIRCs and enable us to make progress in developing a more complete account of object recognition.


Subject(s)
Pattern Recognition, Visual/physiology , Recognition, Psychology/physiology , Visual Cortex/physiology , Adult , Brain/physiology , Brain Mapping , Female , Humans , Magnetic Resonance Imaging , Male , Photic Stimulation , Young Adult
8.
Sci Adv ; 5(3): eaav1598, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30944855

ABSTRACT

Patterns are broad phenomena that relate to biology, chemistry, and physics. The dendritic growth of crystals is the most well-known ice pattern formation process. Tyndall figures are water-melting patterns that occur when ice absorbs light and becomes superheated. Here, we report a previously undescribed ice and water pattern formation process induced by near-infrared irradiation that heats one phase more than the other in a two-phase system. The pattern formed during the irradiation of ice crystals tens of micrometers thick in solution near equilibrium. Dynamic holes and a microchannel labyrinth then formed in specific regions and were characterized by a typical distance between melted points. We concluded that the differential absorption of water and ice was the driving force for the pattern formation. Heating ice by laser absorption might be useful in applications such as the cryopreservation of biological samples.

10.
Cognition ; 183: 67-81, 2019 02.
Article in English | MEDLINE | ID: mdl-30419508

ABSTRACT

Rapid developments in the fields of learning and object recognition have been obtained by successfully developing and using methods for learning from a large number of labeled image examples. However, such current methods cannot explain infants' learning of new concepts based on their visual experience, in particular the ability to learn complex concepts without external guidance, as well as the natural order in which related concepts are acquired. A remarkable example of early visual learning is the category of 'containers' and the notion of 'containment'. Surprisingly, this is one of the earliest spatial relations to be learned, starting already around 3 months of age and preceding other common relations (e.g., 'support', 'in-between'). In this work we present a model that explains infants' capacity for learning 'containment' and related concepts by 'just looking', together with their empirical developmental trajectory. Learning in the model is fast and requires no external guidance, relying only on perceptual processes that are present in the first months of life. Instead of labeled training examples, the system provides its own internal supervision to guide the learning process. We show how the detection of so-called 'paradoxical occlusion' provides natural internal supervision, which guides the system to gradually acquire a range of useful containment-related concepts. Similar mechanisms of implicit internal supervision can have broad application in other cognitive domains as well as in artificial intelligence systems, because they alleviate the need for supplying extensive external supervision, and because they can guide the learning process to extract concepts that are meaningful to the observer even if they are not by themselves obvious or salient in the input.


Subject(s)
Child Development/physiology , Learning/physiology , Models, Theoretical , Space Perception/physiology , Visual Perception/physiology , Humans , Infant
11.
PLoS One ; 13(9): e0201192, 2018.
Article in English | MEDLINE | ID: mdl-30235218

ABSTRACT

Despite a large body of research on the response properties of neurons in the inferior temporal (IT) cortex, studies to date have not yet produced quantitative feature descriptions that can predict responses to arbitrary objects. This gap prevents a thorough understanding of object representation in the IT cortex. Here we propose a fragment-based approach for finding quantitative feature descriptions of face neurons in the IT cortex. The development of the proposed method was driven by the assumption that it is possible to recover features from a set of natural image fragments if the set is sufficiently large. To find the feature in the set, we compared the object responses predicted from each fragment with the neurons' responses to those objects, and searched for the fragment that showed the highest correlation with the neural object responses. The object responses of each fragment were predicted by normalizing the Euclidean distance between the fragment and each object to the range 0 to 1, such that smaller distances give higher values. The distance was calculated in a space where images were transformed to a local orientation space by a Gabor filter and a local max operation. The method allowed us to find features with a correlation coefficient between predicted and neural responses of 0.68 on average (number of object stimuli, 104) from among 560,000 feature candidates, reliably explaining differential responses among faces as well as a general preference for faces over non-face objects. Furthermore, predicted responses of the resulting features to novel object images were significantly correlated with neural responses to these images. Identification of features comprising specific, moderately complex combinations of local orientations and colors enabled us to predict responses to upright and inverted faces, which suggests a possible mechanism for face inversion effects.
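The fragment-selection procedure described in this abstract can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the function names are invented, and it assumes the fragment and the object images have already been projected into a common feature space (e.g., the Gabor/local-max orientation space the abstract mentions).

```python
import numpy as np

def predicted_responses(fragment_vec, object_vecs):
    """Predict a candidate feature's responses to a set of objects:
    Euclidean distance in feature space, normalized to [0, 1] and
    inverted so that smaller distances give higher predicted values."""
    d = np.linalg.norm(object_vecs - fragment_vec, axis=1)
    d_norm = (d - d.min()) / (d.max() - d.min())
    return 1.0 - d_norm

def best_fragment(fragments, object_vecs, neural_responses):
    """Among all candidate fragments, return the index (and correlation)
    of the one whose predicted responses correlate best with the
    recorded neural responses across the object set."""
    best_idx, best_r = -1, -np.inf
    for i, frag in enumerate(fragments):
        pred = predicted_responses(frag, object_vecs)
        r = np.corrcoef(pred, neural_responses)[0, 1]
        if r > best_r:
            best_idx, best_r = i, r
    return best_idx, best_r
```

In the study this search ran over roughly 560,000 candidates; the sketch above simply loops, which is sufficient to show the selection criterion.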


Subject(s)
Neurons/cytology , Neurons/physiology , Temporal Lobe/cytology , Temporal Lobe/physiology , Visual Perception/physiology , Animals , Macaca mulatta , Male
12.
Interface Focus ; 8(4): 20180020, 2018 Aug 06.
Article in English | MEDLINE | ID: mdl-29951197

ABSTRACT

Computational models of vision have advanced at a rapid rate in recent years, rivalling human-level performance in some areas. Much of the progress to date has focused on analysing the visual scene at the object level: the recognition and localization of objects in the scene. Human understanding of images is richer and deeper, reaching both 'below' the object level, such as identifying and localizing object parts and sub-parts, and 'above' the object level, such as identifying object relations, and agents with their actions and interactions. In both cases, understanding depends on recovering meaningful structures in the image, and their components, properties and inter-relations, a process referred to here as 'image interpretation'. In this paper, we describe recent directions, based on human and computer vision studies, towards human-like image interpretation beyond the reach of current schemes, both below the object level and at the level of meaningful configurations beyond the recognition of individual objects, in particular interactions between two people in close contact. In both cases the recognition process depends on the detailed interpretation of so-called 'minimal images', and at both levels recognition depends on combining 'bottom-up' processing, proceeding from low to higher levels of a processing hierarchy, with 'top-down' processing, proceeding from higher to lower stages of visual analysis.

13.
Cognition ; 171: 65-84, 2018 02.
Article in English | MEDLINE | ID: mdl-29107889

ABSTRACT

The goal of this work is to model the process of 'full interpretation' of object images, which is the ability to identify and localize all semantic features and parts that are recognized by human observers. The task is approached by dividing the interpretation of the complete object into the interpretation of multiple reduced but interpretable local regions. In such reduced regions, interpretation is simpler, since the number of semantic components is small and the variability of possible configurations is low. We model the interpretation process by identifying primitive components and relations that play a useful role in local interpretation by humans. To identify these useful components and relations, we consider the interpretation of 'minimal configurations': reduced local regions that are minimal in the sense that further reduction renders them unrecognizable and uninterpretable. We show that such minimal interpretable images have useful properties, which we use to identify informative features and relations used for full interpretation. We describe our interpretation model and show results of detailed interpretations of minimal configurations, produced automatically by the model. Finally, we discuss possible extensions and implications of full interpretation for difficult visual tasks, such as recognizing social interactions, which are beyond the scope of current models of visual recognition.


Subject(s)
Models, Theoretical , Pattern Recognition, Automated , Pattern Recognition, Visual/physiology , Humans
14.
Proc Natl Acad Sci U S A ; 113(10): 2744-9, 2016 Mar 08.
Article in English | MEDLINE | ID: mdl-26884200

ABSTRACT

Discovering the visual features and representations used by the brain to recognize objects is a central problem in the study of vision. Recently, neural network models of visual object recognition, including biological and deep network models, have shown remarkable progress and have begun to rival human performance in some challenging tasks. These models are trained on image examples and learn to extract features and representations and to use them for categorization. It remains unclear, however, whether the representations and learning processes discovered by current models are similar to those used by the human visual system. Here we show, by introducing and using minimal recognizable images, that the human visual system uses features and processes that are not used by current models and that are critical for recognition. We found by psychophysical studies that at the level of minimal recognizable images a minute change in the image can have a drastic effect on recognition, thus identifying features that are critical for the task. Simulations then showed that current models cannot explain this sensitivity to precise feature configurations and, more generally, do not learn to recognize minimal images at a human level. The role of the features shown here is revealed uniquely at the minimal level, where the contribution of each feature is essential. A full understanding of the learning and use of such features will extend our understanding of visual recognition and its cortical mechanisms and will enhance the capacity of computational models to learn from visual experience and to deal with recognition and detailed image interpretation.


Subject(s)
Neural Networks, Computer , Pattern Recognition, Visual/physiology , Vision, Ocular/physiology , Visual Perception/physiology , Brain/physiology , Humans , Models, Neurological , Nerve Net/physiology , Photic Stimulation , Psychophysics/methods , Visual Cortex/physiology , Visual Pathways/physiology
15.
J Comp Neurol ; 522(1): 225-59, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-23983048

ABSTRACT

The laminar location of the cell bodies and terminals of interareal connections determines the hierarchical structural organization of the cortex and has been intensively studied. However, we still have only a rudimentary understanding of the connectional principles of feedforward (FF) and feedback (FB) pathways. Quantitative analysis of retrograde tracers was used to extend the notion that the laminar distribution of neurons interconnecting visual areas provides an index of hierarchical distance (percentage of supragranular labeled neurons [SLN]). We show that: 1) SLN values constrain models of cortical hierarchy, revealing previously unsuspected areal relations; 2) SLN reflects the operation of a combinatorial distance rule acting differentially on sets of connections between areas; 3) Supragranular layers contain highly segregated bottom-up and top-down streams, both of which exhibit point-to-point connectivity. This contrasts with the infragranular layers, which contain diffuse bottom-up and top-down streams; 4) Cell filling of the parent neurons of FF and FB pathways provides further evidence of compartmentalization; 5) FF pathways have higher weights, cross fewer hierarchical levels, and are less numerous than FB pathways. Taken together, the present results suggest that cortical hierarchies are built from supra- and infragranular counterstreams. This compartmentalized dual counterstream organization allows point-to-point connectivity in both bottom-up and top-down directions.
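The SLN index used above is a simple ratio over retrograde labeling counts. A minimal sketch (the function name is ours) of how it is computed and read:

```python
def sln(supragranular, infragranular):
    """Percentage of supragranular labeled neurons (SLN) for one
    connection: the share of retrogradely labeled neurons found in
    supragranular layers. High SLN indicates a predominantly
    feedforward origin, low SLN a predominantly feedback origin, and
    intermediate values a short hierarchical distance."""
    total = supragranular + infragranular
    if total == 0:
        raise ValueError("no labeled neurons counted")
    return 100.0 * supragranular / total
```

For example, 80 supragranular and 20 infragranular labeled neurons give an SLN of 80%, suggesting a strongly feedforward projection.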


Subject(s)
Neurons/cytology , Visual Cortex/anatomy & histology , Visual Pathways/anatomy & histology , Animals , Feedback, Sensory , Female , Macaca fascicularis , Macaca mulatta , Male , Neuroanatomical Tract-Tracing Techniques , Visual Cortex/cytology , Visual Pathways/cytology
16.
Ann N Y Acad Sci ; 1305: 72-82, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23773126

ABSTRACT

Object recognition has been a central yet elusive goal of computational vision. For many years, computer performance seemed highly deficient and unable to emulate the basic capabilities of the human recognition system. Over the past decade or so, computer scientists and neuroscientists have developed algorithms and systems, as well as models of visual cortex, that have come much closer to human performance in visual identification and categorization. In this personal perspective, we discuss the ongoing struggle of visual models to catch up with the visual cortex, identify key reasons for the relatively rapid improvement of artificial systems and models, and identify open problems for computational vision in this domain.


Subject(s)
Vision, Ocular/physiology , Visual Perception/physiology , Computer Simulation , Humans , Models, Neurological , Visual Cortex/physiology
17.
Proc Natl Acad Sci U S A ; 109(44): 18215-20, 2012 Oct 30.
Article in English | MEDLINE | ID: mdl-23012418

ABSTRACT

Early in development, infants learn to solve visual problems that are highly challenging for current computational methods. We present a model that deals with two fundamental problems in which the gap between computational difficulty and infant learning is particularly striking: learning to recognize hands and learning to recognize gaze direction. The model is shown a stream of natural videos and learns without any supervision to detect human hands by appearance and by context, as well as direction of gaze, in complex natural scenes. The algorithm is guided by an empirically motivated innate mechanism: the detection of "mover" events in dynamic images, which are the events of a moving image region causing a stationary region to move or change after contact. Mover events provide an internal teaching signal, which is shown to be more effective than alternative cues and sufficient for the efficient acquisition of hand and gaze representations. The implications go beyond the specific tasks, by showing how domain-specific "proto concepts" can guide the system to acquire meaningful concepts, which are significant to the observer but statistically inconspicuous in the sensory input.


Subject(s)
Visual Perception , Hand , Humans , Task Performance and Analysis
18.
Perception ; 41(9): 1013-6, 2012.
Article in English | MEDLINE | ID: mdl-23409365
19.
J Vis ; 11(8): 18, 2011 Jul 28.
Article in English | MEDLINE | ID: mdl-21799022

ABSTRACT

Visual expertise is usually defined as the superior ability to distinguish between exemplars of a homogeneous category. Here, we ask how real-world expertise manifests at basic-level categorization and assess the contribution of stimulus-driven and top-down knowledge-based factors to this manifestation. Car experts and novices categorized computer-selected image fragments of cars, airplanes, and faces. Within each category, the fragments varied in their mutual information (MI), an objective quantifiable measure of feature diagnosticity. Categorization of face and airplane fragments was similar within and between groups, showing better performance with increasing MI levels. Novices categorized car fragments more slowly than face and airplane fragments, while experts categorized car fragments as fast as face and airplane fragments. The experts' advantage with car fragments was similar across MI levels, with similar functions relating RT with MI level for both groups. Accuracy was equal between groups for cars as well as faces and airplanes, but experts' response criteria were biased toward cars. These findings suggest that expertise does not entail only specific perceptual strategies. Rather, at the basic level, expertise manifests as a general processing advantage arguably involving application of top-down mechanisms, such as knowledge and attention, which helps experts to distinguish between object categories.
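The mutual information (MI) used here to quantify feature diagnosticity can be computed from a 2x2 contingency table of fragment presence versus category membership. A minimal sketch: the function name is ours, and this is the generic MI formula, not necessarily the exact estimator used in the study.

```python
from math import log2

def mutual_information(counts):
    """MI in bits between fragment presence F and category C, given a
    2x2 table counts[f][c] of joint occurrence counts (f, c in {0, 1}).
    0 bits means the fragment is uninformative about the category;
    1 bit means it identifies the category perfectly."""
    total = sum(sum(row) for row in counts)
    p_f = [sum(row) / total for row in counts]
    p_c = [sum(counts[f][c] for f in range(2)) / total for c in range(2)]
    mi = 0.0
    for f in range(2):
        for c in range(2):
            p = counts[f][c] / total
            if p > 0:
                mi += p * log2(p / (p_f[f] * p_c[c]))
    return mi
```

Fragments can then be binned by MI level, as in the experiment, with higher-MI fragments expected to support faster and more accurate categorization.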


Subject(s)
Choice Behavior , Discrimination, Psychological/physiology , Learning/physiology , Recognition, Psychology/physiology , Visual Perception/physiology , Face , Humans , Photic Stimulation/methods
20.
Neural Comput ; 21(11): 3010-56, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19686065

ABSTRACT

In this letter, we develop and simulate a large-scale network of spiking neurons that approximates the inference computations performed by graphical models. Unlike previous related schemes, which used sum and product operations in either the log or linear domains, the current model uses an inference scheme based on the sum and maximization operations in the log domain. Simulations show that using these operations, a large-scale circuit, which combines populations of spiking neurons as basic building blocks, is capable of finding close approximations to the full mathematical computations performed by graphical models within a few hundred milliseconds. The circuit is general in the sense that it can be wired for any graph structure, it supports multistate variables, and it uses standard leaky integrate-and-fire neuronal units. Following previous work, which proposed relations between graphical models and the large-scale cortical anatomy, we focus on the cortical microcircuitry and propose how anatomical and physiological aspects of the local circuitry may map onto elements of the graphical model implementation. We discuss in particular the roles of three major types of inhibitory neurons (small fast-spiking basket cells, large layer 2/3 basket cells, and double-bouquet neurons), subpopulations of strongly interconnected neurons with their unique connectivity patterns in different cortical layers, and the possible role of minicolumns in the realization of the population-based maximum operation.
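The sum and maximization operations in the log domain correspond to standard max-sum (Viterbi-style) message passing. A minimal sketch on a chain-structured model, purely to illustrate the mathematical computation the spiking circuit approximates; this is not the spiking implementation itself, and the function name is ours.

```python
import numpy as np

def max_sum_chain(log_unary, log_pair):
    """Max-sum inference on a chain MRF in the log domain: messages
    combine by summation and marginalize by maximization, the two
    operations the circuit is proposed to implement.
    log_unary: (T, K) log potentials per step; log_pair: (K, K)
    pairwise log potentials. Returns the MAP state sequence."""
    T, K = log_unary.shape
    msg = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    msg[0] = log_unary[0]
    for t in range(1, T):
        # scores[i, j]: best log score ending in state i then moving to j
        scores = msg[t - 1][:, None] + log_pair   # sum operation
        back[t] = np.argmax(scores, axis=0)
        msg[t] = log_unary[t] + np.max(scores, axis=0)  # max operation
    states = np.zeros(T, dtype=int)
    states[-1] = np.argmax(msg[-1])
    for t in range(T - 1, 0, -1):               # backtrack the MAP path
        states[t - 1] = back[t, states[t]]
    return states
```

With uniform pairwise potentials the MAP path simply follows the per-step unary maxima; with strong coupling, the pairwise term can override weak unary evidence, which is the kind of global trade-off the full computation resolves.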


Subject(s)
Cerebral Cortex/physiology , Models, Neurological , Neural Networks, Computer , Algorithms , Brain Mapping , Computer Graphics , Data Interpretation, Statistical , Markov Chains , Models, Statistical , Neural Pathways