Search | Portal Regional da BVS

Five points to check when comparing visual perception in humans and machines.

Funke, Christina M; Borowski, Judy; Stosio, Karolina; Brendel, Wieland; Wallis, Thomas S A; Bethge, Matthias.

J Vis ; 21(3): 16, 2021 03 01.

Article in English | MEDLINE | ID: mdl-33724362

ABSTRACT

With the rise of machines to human-level performance in complex recognition tasks, a growing amount of work is directed toward comparing information processing in humans and machines. These studies are an exciting chance to learn about one system by studying the other. Here, we propose ideas on how to design, conduct, and interpret experiments such that they adequately support the investigation of mechanisms when comparing human and machine perception. We demonstrate and apply these ideas through three case studies. The first case study shows how human bias can affect the interpretation of results and that several analytic tools can help to overcome this human reference point. In the second case study, we highlight the difference between necessary and sufficient mechanisms in visual reasoning tasks. Thereby, we show that contrary to previous suggestions, feedback mechanisms might not be necessary for the tasks in question. The third case study highlights the importance of aligning experimental conditions. We find that a previously observed difference in object recognition does not hold when adapting the experiment to make conditions more equitable between humans and machines. In presenting a checklist for comparative studies of visual reasoning in humans and machines, we hope to highlight how to overcome potential pitfalls in design and inference.

Subjects

Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Pattern Recognition, Visual/physiology; Visual Perception/physiology; Artificial Intelligence; Humans; Learning/physiology; Problem Solving; Recognition, Psychology

Image content is more important than Bouma's Law for scene metamers.

Wallis, Thomas S A; Funke, Christina M; Ecker, Alexander S; Gatys, Leon A; Wichmann, Felix A; Bethge, Matthias.

Elife ; 8, 2019 04 30.

Article in English | MEDLINE | ID: mdl-31038458

ABSTRACT

We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated 'Bouma's Law' of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.

Subjects

Pattern Recognition, Visual/physiology; Visual Fields/physiology; Visual Perception/physiology; Crowding/psychology; Discrimination, Psychological; Fixation, Ocular/physiology; Humans; Perceptual Masking; Photic Stimulation; Space Perception/physiology

A parametric texture model based on deep convolutional features closely matches texture appearance for humans.

Wallis, Thomas S A; Funke, Christina M; Ecker, Alexander S; Gatys, Leon A; Wichmann, Felix A; Bethge, Matthias.

J Vis ; 17(12): 5, 2017 10 01.

Article in English | MEDLINE | ID: mdl-28983571

ABSTRACT

Our visual environment is full of texture ("stuff" like cloth, bark, or gravel, as distinct from "things" like dresses, trees, or paths), and humans are adept at perceiving subtle variations in material properties. To investigate image features important for texture perception, we psychophysically compare a recent parametric model of texture appearance (convolutional neural network [CNN] model) that uses the features encoded by a deep CNN (VGG-19) with two other models: the venerable Portilla and Simoncelli model and an extension of the CNN model in which the power spectrum is additionally matched. Observers discriminated model-generated textures from original natural textures in a spatial three-alternative oddity paradigm under two viewing conditions: when test patches were briefly presented to the near-periphery ("parafoveal") and when observers were able to make eye movements to all three patches ("inspection"). Under parafoveal viewing, observers were unable to discriminate 10 of 12 original images from CNN model images, and remarkably, the simpler Portilla and Simoncelli model performed slightly better than the CNN model (11 textures). Under foveal inspection, matching CNN features captured appearance substantially better than the Portilla and Simoncelli model (nine compared to four textures), and including the power spectrum improved appearance matching for two of the three remaining textures. None of the models we test here could produce indiscriminable images for one of the 12 textures under the inspection condition. While deep CNN (VGG-19) features can often be used to synthesize textures that humans cannot discriminate from natural textures, there is currently no uniformly best model for all textures and viewing conditions.

Subjects

Eye Movements/physiology; Neural Networks, Computer; Pattern Recognition, Visual/physiology; Visual Perception/physiology; Fovea Centralis/physiology; Humans; Photic Stimulation