Results 1 - 20 of 35
1.
Front Neurosci ; 17: 1222815, 2023.
Article in English | MEDLINE | ID: mdl-37559700

ABSTRACT

The development of automatic methods for image and video quality assessment that correlate well with the perception of human observers is a very challenging open problem in vision science, with numerous practical applications in disciplines such as image processing and computer vision, as well as in the media industry. In the past two decades, the goal of image quality research has been to improve upon classical metrics by developing models that emulate some aspects of the visual system, and while the progress has been considerable, state-of-the-art quality assessment methods still share a number of shortcomings, like their performance dropping considerably when they are tested on a database that is quite different from the one used to train them, or their significant limitations in predicting observer scores for high framerate videos. In this work we propose a novel objective method for image and video quality assessment that is based on the recently introduced Intrinsically Non-linear Receptive Field (INRF) formulation, a neural summation model that has been shown to be better at predicting neural activity and visual perception phenomena than the classical linear receptive field. Here we start by optimizing, on a classic image quality database, the four parameters of a very simple INRF-based metric, and proceed to test this metric on three other databases, showing that its performance equals or surpasses that of the state-of-the-art methods, some of them having millions of parameters. Next, we extend to the temporal domain this INRF image quality metric, and test it on several popular video quality datasets; again, the results of our proposed INRF-based video quality metric are shown to be very competitive.

2.
IEEE Trans Image Process ; 31: 5163-5177, 2022.
Article in English | MEDLINE | ID: mdl-35853056

ABSTRACT

In the quality evaluation of high dynamic range and wide color gamut (HDR/WCG) images, a number of works have concluded that native HDR metrics, such as HDR visual difference predictor (HDR-VDP), HDR video quality metric (HDR-VQM), or convolutional neural network (CNN)-based visibility metrics for HDR content, provide the best results. These metrics consider only the luminance component, but several color difference metrics have been specifically developed for, and validated with, HDR/WCG images. In this paper, we perform subjective evaluation experiments in a professional HDR/WCG production setting, under a real use case scenario. The results are quite relevant in that they show, firstly, that the performance of HDR metrics is worse than that of a classic, simple standard dynamic range (SDR) metric applied directly to the HDR content; and secondly, that the chrominance metrics specifically developed for HDR/WCG imaging have poor correlation with observer scores and are also outperformed by an SDR metric. Based on these findings, we show how a very simple framework for creating color HDR metrics, that uses only luminance SDR metrics, transfer functions, and classic color spaces, is able to consistently outperform, by a considerable margin, state-of-the-art HDR metrics on a varied set of HDR content, for both perceptual quantization (PQ) and Hybrid Log-Gamma (HLG) encoding, luminance and chroma distortions, and on different color spaces of common use.
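The framework described, applying a luminance SDR metric after a transfer function, can be sketched as follows. This is a minimal illustration that assumes the SMPTE ST 2084 (PQ) inverse EOTF as the transfer function and plain PSNR as a stand-in SDR metric; the paper's actual choice of SDR metrics and color spaces may differ.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(luminance_cd_m2):
    """Map absolute luminance (cd/m^2) to PQ code values in [0, 1]."""
    y = np.clip(luminance_cd_m2 / 10000.0, 0.0, 1.0)
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2

def psnr(a, b, peak=1.0):
    """Stand-in SDR metric computed on the encoded signal."""
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def hdr_quality_score(ref_lum, dist_lum):
    """Score HDR content with an SDR metric applied after a
    perceptual transfer function, as in the framework above."""
    return psnr(pq_encode(ref_lum), pq_encode(dist_lum))
```

Any luminance SDR metric can be slotted in place of `psnr`, and the HLG OETF can replace `pq_encode` for HLG-encoded content.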

3.
J Vis ; 22(8): 2, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35833884

ABSTRACT

Visual illusions expand our understanding of the visual system by imposing constraints on models in two different ways: i) visual illusions for humans should induce equivalent illusions in the model, and ii) illusions synthesized from the model should be compelling for human viewers too. These constraints are alternative strategies for finding good vision models. Following the first strategy, recent studies have shown that artificial neural network architectures also have human-like illusory percepts when stimulated with classical hand-crafted stimuli designed to fool humans. In this work we focus on the second (less explored) strategy: we propose a framework to synthesize new visual illusions using the optimization abilities of current automatic differentiation techniques. The proposed framework can be used with classical vision models as well as with more recent artificial neural network architectures. This framework, validated by psychophysical experiments, can be used to study the difference between a vision model and actual human perception and to optimize the vision model to decrease this difference.


Subject(s)
Illusions , Hand , Humans , Vision, Ocular , Visual Perception
4.
J Vis ; 22(6): 8, 2022 05 03.
Article in English | MEDLINE | ID: mdl-35587354

ABSTRACT

Three decades ago, Atick et al. suggested that human frequency sensitivity may emerge from the enhancement required for a more efficient analysis of retinal images. Here we reassess the relevance of low-level vision tasks in the explanation of the contrast sensitivity functions (CSFs) in light of 1) the current trend of using artificial neural networks for studying vision, and 2) the current knowledge of retinal image representations. As a first contribution, we show that a very popular type of convolutional neural network (CNN), the autoencoder, may develop human-like CSFs in the spatiotemporal and chromatic dimensions when trained to perform some basic low-level vision tasks (like retinal noise and optical blur removal), but not others (like chromatic adaptation or pure reconstruction after simple bottlenecks). As an illustrative example, the best CNN (in the considered set of simple architectures for enhancement of the retinal signal) reproduces the CSFs with a root mean square error of 11% of the maximum sensitivity. As a second contribution, we provide experimental evidence of the fact that, for some functional goals (at low abstraction level), deeper CNNs that are better at reaching the quantitative goal are actually worse at replicating human-like phenomena (such as the CSFs). This low-level result (for the explored networks) is not necessarily in contradiction with other works that report advantages of deeper nets in modeling higher-level vision goals. However, in line with a growing body of literature, our results suggest another word of caution about CNNs in vision science, because the use of simplified units or unrealistic architectures in goal optimization may be a limitation for the modeling and understanding of human vision.


Subject(s)
Contrast Sensitivity , Neural Networks, Computer , Humans , Retina , Vision, Ocular
5.
J Vis ; 21(6): 10, 2021 06 07.
Article in English | MEDLINE | ID: mdl-34144607

ABSTRACT

In the film industry, the same movie is expected to be watched on displays of vastly different sizes, from cinema screens to mobile phones. But visual induction, the perceptual phenomenon by which the appearance of a scene region is affected by its surroundings, will be different for the same image shown on two displays of different dimensions. This phenomenon presents a practical challenge for the preservation of the artistic intentions of filmmakers, because it can lead to shifts in image appearance between viewing destinations. In this work, we show that a neural field model based on the efficient representation principle is able to predict induction effects and how, by regularizing its associated energy functional, the model is still able to represent induction but is now invertible. From this finding, we propose a method to preprocess an image in a screen-size dependent way so that its perception, in terms of visual induction, may remain constant across displays of different size. The potential of the method is demonstrated through psychophysical experiments on synthetic images and qualitative examples on natural images.


Subject(s)
Motion Pictures , Visual Perception , Humans
6.
IEEE Trans Pattern Anal Mach Intell ; 43(5): 1777-1790, 2021 May.
Article in English | MEDLINE | ID: mdl-31725369

ABSTRACT

Gamut mapping is the problem of transforming the colors of image or video content so as to fully exploit the color palette of the display device where the content will be shown, while preserving the artistic intent of the original content's creator. In particular, in the cinema industry, the rapid advancement in display technologies has created a pressing need to develop automatic and fast gamut mapping algorithms. In this article, we propose a novel framework that is based on vision science models, performs both gamut reduction and gamut extension, is of low computational complexity, produces results that are free from artifacts and outperforms state-of-the-art methods according to psychophysical tests. Our experiments also highlight the limitations of existing objective metrics for the gamut mapping problem.

7.
Sci Rep ; 10(1): 16277, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33004868

ABSTRACT

The responses of visual neurons, as well as visual perception phenomena in general, are highly nonlinear functions of the visual input, while most vision models are grounded on the notion of a linear receptive field (RF). The linear RF has a number of inherent problems: it changes with the input, it presupposes a set of basis functions for the visual system, and it conflicts with recent studies on dendritic computations. Here we propose to model the RF in a nonlinear manner, introducing the intrinsically nonlinear receptive field (INRF). Apart from being more physiologically plausible and embodying the efficient representation principle, the INRF has a key property of wide-ranging implications: for several vision science phenomena where a linear RF must vary with the input in order to predict responses, the INRF can remain constant under different stimuli. We also prove that Artificial Neural Networks with INRF modules instead of linear filters have a remarkably improved performance and better emulate basic human perception. Our results suggest a change of paradigm for vision science as well as for artificial intelligence.
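As a rough one-dimensional illustration, an INRF-style response combines a linear summation term with a nonlinearly processed surround term. The kernel shapes, sizes, weights, and the exact placement of the nonlinearity below are assumptions for illustration only, not the fitted formulation from the paper:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 1-D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    k = np.exp(-ax**2 / (2.0 * sigma**2))
    return k / k.sum()

def conv1d(signal, kernel):
    return np.convolve(signal, kernel, mode="same")

def inrf_response(signal, lam=2.0, sigma_fn=np.tanh):
    """Schematic INRF: linear term minus a weighted, nonlinearly
    transformed surround term. All kernels here are assumed Gaussians."""
    m = gaussian_kernel(9, 1.5)    # linear summation kernel (assumed)
    g = gaussian_kernel(9, 1.5)    # kernel inside the nonlinearity (assumed)
    w = gaussian_kernel(21, 4.0)   # wide surround kernel (assumed)
    linear = conv1d(signal, m)
    nonlin = conv1d(sigma_fn(signal - conv1d(signal, g)), w)
    return linear - lam * nonlin
```

Unlike a linear RF, doubling the input does not double this response, which is the key departure from linear-filter models.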


Subject(s)
Visual Perception , Animals , Artificial Intelligence , Humans , Models, Biological , Neural Networks, Computer , Nonlinear Dynamics , Retinal Neurons/physiology , Vision, Ocular/physiology , Visual Perception/physiology
8.
Opt Express ; 28(7): 9327-9339, 2020 Mar 30.
Article in English | MEDLINE | ID: mdl-32225542

ABSTRACT

Images captured under hazy conditions (e.g., fog, air pollution) usually present faded colors and loss of contrast. To improve their visibility, a process called image dehazing can be applied. Some of the most successful image dehazing algorithms are based on image processing methods but do not follow any physical image formation model, which limits their performance. In this paper, we propose a post-processing technique that alleviates this handicap by enforcing the output of the original method to be consistent with a popular physical model for image formation under haze. Our results improve upon those of the original methods both qualitatively and according to several metrics, and they have also been validated via psychophysical experiments. These results are particularly striking in terms of avoiding over-saturation and reducing color artifacts, which are the most common shortcomings faced by image dehazing methods.
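The physical model in question is commonly Koschmieder's haze equation, I = J·t + A·(1 − t), with J the haze-free scene, t the per-pixel transmission, and A the airlight. A toy sketch of projecting a dehazed estimate back onto this model; the airlight and transmission estimators here are crude placeholders, not the paper's procedure:

```python
import numpy as np

def enforce_haze_model(I, J0, eps=1e-3):
    """Make a dehazed estimate J0 consistent with Koschmieder's model
    I = J * t + A * (1 - t), for images normalized to [0, 1]."""
    A = I.max()                       # brightest pixel as airlight (crude)
    t = (A - I) / (A - J0 + eps)      # transmission implied by J0
    t = np.clip(t, eps, 1.0)
    J = (I - A) / t + A               # invert the model with the cleaned t
    return np.clip(J, 0.0, 1.0), t
```

In practice the transmission map would also be smoothed before inverting the model, since haze varies slowly across the scene.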

9.
J Neurophysiol ; 123(6): 2249-2268, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32159407

ABSTRACT

In this paper, we study the communication efficiency of a psychophysically tuned cascade of Wilson-Cowan and divisive normalization layers that simulate the retina-V1 pathway. This is the first analysis of Wilson-Cowan networks in terms of multivariate total correlation. The parameters of the cortical model have been derived through the relation between the steady state of the Wilson-Cowan model and the divisive normalization model. The communication efficiency has been analyzed in two ways. First, we provide an analytical expression for the reduction of the total correlation among the responses of a V1-like population after the application of the Wilson-Cowan interaction. Second, we empirically study the efficiency with visual stimuli and statistical tools that were not available before: 1) we use a recent, radiometrically calibrated set of natural scenes, and 2) we use a recent technique to estimate the multivariate total correlation in bits from sets of visual responses, which only involves univariate operations, thus giving better estimates of the redundancy. The theoretical and empirical results show that, although this cascade of layers was not optimized for statistical independence in any way, the redundancy between the responses gets substantially reduced along the neural pathway. Specifically, we show that 1) the efficiency of a Wilson-Cowan network is similar to that of its equivalent divisive normalization model; 2) while initial layers (Von Kries adaptation and Weber-like brightness) contribute to univariate equalization, the bigger contributions to the reduction in total correlation come from the computation of nonlinear local contrast and the application of local oriented filters; and 3) psychophysically tuned models are more efficient (reduce more total correlation) in the more populated regions of the luminance-contrast plane. These results are an alternative confirmation of the efficient coding hypothesis for Wilson-Cowan systems, and, from an applied perspective, they suggest that neural field models could be an option for image compression.

NEW & NOTEWORTHY The Wilson-Cowan interaction is analyzed in total correlation terms for the first time. Theoretical and empirical results show that this psychophysically tuned interaction achieves the greatest efficiency in the most frequent regions of the image space. This is an original confirmation of the efficient coding hypothesis and suggests that neural field models can be an alternative to divisive normalization in image compression.


Subject(s)
Models, Biological , Nerve Net/physiology , Neural Networks, Computer , Retina/physiology , Visual Cortex/physiology , Visual Pathways/physiology , Visual Perception/physiology , Humans
10.
J Neurophysiol ; 123(5): 1606-1618, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32159409

ABSTRACT

We reproduce suprathreshold perception phenomena, specifically visual illusions, with Wilson-Cowan (WC)-type models of neuronal dynamics. Our findings show that the ability to replicate the illusions considered is related to how well the neural activity equations comply with the efficient representation principle. Our first contribution consists in showing that the WC equations can reproduce a number of brightness and orientation-dependent illusions. Then we formally prove that there cannot be an energy functional that the WC dynamics are minimizing. This leads us to consider an alternative variational model, which has been previously employed for local histogram equalization (LHE) tasks. To adapt this model to the architecture of V1, we perform an extension that has an explicit dependence on local image orientation. Finally, we report several numerical experiments showing that LHE provides a better reproduction of visual illusions than the original WC formulation, and that its cortical extension is also capable of reproducing complex orientation-dependent illusions.

NEW & NOTEWORTHY We show that the Wilson-Cowan equations can reproduce a number of brightness and orientation-dependent illusions. Then we formally prove that there cannot be an energy functional that the Wilson-Cowan equations are minimizing, making them suboptimal with respect to the efficient representation principle. We thus propose a slight modification that is consistent with that principle and show that it provides a better reproduction of visual illusions than the original Wilson-Cowan formulation. We also consider the cortical extension of both models to deal with more complex orientation-dependent illusions.


Subject(s)
Illusions/physiology , Models, Theoretical , Visual Cortex/physiology , Visual Perception/physiology , Humans
11.
Article in English | MEDLINE | ID: mdl-32011253

ABSTRACT

We present a color matching method that deals with different non-linear encodings. In particular, given two different views of the same scene taken by two cameras with unknown settings and internal parameters, and encoded with unknown non-linear curves, our method is able to correct the colors of one of the images, making it look as if it had been captured under the other camera's settings. Our method is based on treating the in-camera color processing pipeline as a matrix multiplication on the linear image followed by a non-linearity. This allows us to model a color stabilization transformation between the two shots by estimating a single matrix (which contains information from both of the original images) and an extra parameter that accounts for the non-linearity. The method is fast and the results have no spurious colors. It outperforms the state of the art both visually and according to several metrics, and can handle HDR encodings and very challenging real-life examples.
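The pipeline model described, a 3×3 linear matrix followed by a non-linearity, can be sketched as a fit by grid search over a power-law exponent plus least squares for the matrix. The candidate exponent range and this fitting strategy are illustrative assumptions, not the paper's estimation method:

```python
import numpy as np

def fit_color_transform(src, dst, gammas=np.linspace(0.3, 3.0, 28)):
    """Fit dst ~= (src @ M.T) ** gamma: a 3x3 matrix followed by a
    power-law non-linearity. Grid search over gamma; least squares for M."""
    best = None
    for g in gammas:
        lin_dst = np.clip(dst, 1e-6, None) ** (1.0 / g)    # undo the curve
        W, *_ = np.linalg.lstsq(src, lin_dst, rcond=None)  # src @ W ~= lin_dst
        pred = np.clip(src @ W, 1e-6, None) ** g
        err = np.mean((pred - dst) ** 2)
        if best is None or err < best[0]:
            best = (err, W.T, g)
    return best[1], best[2]  # 3x3 matrix M and the exponent
```

With the model matched, applying `(pixels @ M.T) ** gamma` maps one camera's linear colors into the other camera's rendered colors.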

12.
Front Neurosci ; 13: 8, 2019.
Article in English | MEDLINE | ID: mdl-30894796

ABSTRACT

Subjective image quality databases are a major source of raw data on how the visual system works in naturalistic environments. These databases describe the sensitivity of many observers to a wide range of distortions of different nature and intensity seen on top of a variety of natural images. Data of this kind seems to open a number of possibilities for the vision scientist to check the models in realistic scenarios. However, while these natural databases are great benchmarks for models developed in some other way (e.g., by using the well-controlled artificial stimuli of traditional psychophysics), they should be carefully used when trying to fit vision models. Given the high dimensionality of the image space, it is very likely that some basic phenomena are under-represented in the database. Therefore, a model fitted on these large-scale natural databases will not reproduce these under-represented basic phenomena that could otherwise be easily illustrated with well selected artificial stimuli. In this work we study a specific example of the above statement. A standard cortical model using wavelets and divisive normalization tuned to reproduce subjective opinion on a large image quality dataset fails to reproduce basic cross-masking. Here we outline a solution for this problem by using artificial stimuli and by proposing a modification that makes the model easier to tune. Then, we show that the modified model is still competitive in the large-scale database. Our simulations with these artificial stimuli show that when using steerable wavelets, the conventional unit norm Gaussian kernels in divisive normalization should be multiplied by high-pass filters to reproduce basic trends in masking. Basic visual phenomena may be misrepresented in large natural image datasets but this can be solved with model-interpretable stimuli. This is an additional argument in praise of artifice in line with Rust and Movshon (2005).

13.
J Vis ; 19(2): 13, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30802279

ABSTRACT

The statistics of real-world images have been extensively investigated, but in virtually all cases using only low dynamic range image databases. The few studies that have considered high dynamic range (HDR) images have performed statistical analyses categorizing images as HDR according to their creation technique, and not to the actual dynamic range of the underlying scene. In this study we demonstrate, using a recent HDR dataset of natural images, that the statistics of the image as received at the camera sensor change dramatically with dynamic range: particularly strong correlations with dynamic range are observed for the median, standard deviation, skewness, and kurtosis, while the 1/f relationship for the power spectrum breaks down for images with a very high dynamic range, in practice making HDR images not scale invariant. Effects are also noted in the derivative statistics, the single-pixel histograms, and the Haar wavelet analysis. However, we also show that after some basic early transforms occurring within the eye (light scatter, nonlinear photoreceptor response, center-surround modulation) the statistics of the resulting images become virtually independent of the dynamic range, which would allow them to be processed more efficiently by the human visual system.
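The statistics discussed can be computed directly from pixel data. A minimal sketch; the radially averaged power-spectrum fit below is a simplified stand-in for the paper's analysis:

```python
import numpy as np

def image_stats(img):
    """Summary statistics the study correlates with dynamic range."""
    v = img.ravel()
    mean, std = v.mean(), v.std()
    skew = np.mean(((v - mean) / std) ** 3)
    kurt = np.mean(((v - mean) / std) ** 4) - 3.0  # excess kurtosis
    return {"median": np.median(v), "std": std,
            "skewness": skew, "kurtosis": kurt}

def spectrum_slope(img):
    """Slope of log radial power vs. log frequency; scale-invariant
    (1/f amplitude) images give a slope near -2, white noise near 0."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    counts = np.bincount(r.ravel())
    radial = np.bincount(r.ravel(), power.ravel()) / counts
    freqs = np.arange(1, min(h, w) // 2)   # skip DC, stay below Nyquist
    slope, _ = np.polyfit(np.log(freqs), np.log(radial[freqs]), 1)
    return slope
```

Computing these on sensor-domain luminance versus post-retinal (e.g., log or Naka-Rushton transformed) values is what reveals the normalization effect the abstract describes.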


Subject(s)
Models, Statistical , Psychomotor Performance/physiology , Visual Perception/physiology , Contrast Sensitivity/physiology , Humans , Signal Processing, Computer-Assisted
14.
J Vis ; 19(1): 16, 2019 01 02.
Article in English | MEDLINE | ID: mdl-30694300

ABSTRACT

In 1986, Paul Whittle investigated the ability to discriminate between the luminance of two small patches viewed upon a uniform background. In 1992, he asked subjects to manipulate the luminance of a number of patches on a uniform background until their brightness appeared to vary from black to white in even steps. The data from the discrimination experiment almost perfectly predicted the gradient of the function obtained in the brightness experiment, indicating that the two experimental methodologies were probing the same underlying mechanism. Whittle introduced a model that was able to capture the pattern of discrimination thresholds and, in turn, the brightness data; however, there were a number of features in the data set that the model could not capture. In this paper, we demonstrate that the models of Kane and Bertalmío (2017) and Kingdom and Moulden (1991) may be adapted to predict all the data, but only by incorporating an accurate model of detection thresholds. Additionally, we show that a divisive gain model may also capture the data, but only by considering polarity-dependent, nonlinear inputs following the underlying pattern of detection thresholds. In summary, we conclude that these models provide a simple link between detection thresholds, discrimination thresholds, and brightness perception.


Subject(s)
Contrast Sensitivity/physiology , Discrimination, Psychological/physiology , Pattern Recognition, Visual/physiology , Sensory Thresholds/physiology , Adult , Humans , Lighting , Models, Theoretical
15.
IEEE Trans Image Process ; 26(4): 1595-1606, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28186888

ABSTRACT

Emerging display technologies are able to produce images with a much wider color gamut than those of conventional distribution gamuts for cinema and TV, creating an opportunity for the development of gamut extension algorithms (GEAs) that exploit the full color potential of these new systems. In this paper, we present a novel GEA, implemented as a PDE-based optimization procedure related to visual perception models, that performs gamut extension (GE) by taking into account the analysis of distortions in hue, chroma, and saturation. User studies performed using a digital cinema projector under cinematic (low ambient light, large screen) conditions show that the proposed algorithm outperforms the state of the art, producing gamut extended images that are perceptually more faithful to the wide-gamut ground truth, as well as free of color artifacts and hue shifts. We also show how currently available image quality metrics, when applied to the GE problem, provide results that do not correlate with users' choices.

16.
PLoS One ; 11(12): e0168963, 2016.
Article in English | MEDLINE | ID: mdl-28030651

ABSTRACT

Retinal lateral inhibition is one of the conventional efficient coding mechanisms in the visual system. It is produced by interneurons that pool signals over a neighborhood of presynaptic feedforward cells and send inhibitory signals back to them. Thus, the receptive field (RF) of a retinal ganglion cell has a center-surround profile that is classically represented as a difference of Gaussians (DOG), adequate for efficient spatial contrast coding. The DOG RF profile has been proposed to produce the psychophysical phenomenon of brightness induction, in which the perceived brightness of an object is affected by that of its vicinity, either shifting away from it (brightness contrast) or becoming more similar to it (brightness assimilation) depending on the size of the surfaces surrounding the object. While brightness contrast can be modeled using a DOG with a narrow surround, brightness assimilation requires a wide suppressive surround. Early retinal studies determined that the suppressive surround of a retinal ganglion cell is narrow (< 100-300 µm; 'classic RF'), which led researchers to postulate that brightness assimilation must originate at some post-retinal, possibly cortical, stage where long-range interactions are feasible. However, more recent studies have reported that retinal interneurons also exhibit a spatially wide RF component (> 500-1000 µm). In the current study, we examine the effect of this wide interneuron RF component in two biophysical retinal models and show that, for both models, it explains the long-range effect evidenced in simultaneous brightness induction phenomena, with the spatial extent of this long-range effect in the model responses matching that of perceptual data. These results suggest that the retinal lateral inhibition mechanism alone can regulate local as well as long-range spatial induction through the narrow and wide RF components of retinal interneurons, arguing against the existing view that spatial induction is mediated by two separate mechanisms, one local and one long-range.
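The narrow versus wide surround distinction can be visualized with difference-of-Gaussians kernels. The spatial scales below are arbitrary sample units for illustration, not the µm values quoted above:

```python
import numpy as np

def dog_kernel(size, sigma_center, sigma_surround, w_surround=0.9):
    """1-D difference-of-Gaussians RF: excitatory center minus inhibitory
    surround; each lobe is normalized to unit area before weighting."""
    x = np.arange(size) - size // 2
    center = np.exp(-x**2 / (2.0 * sigma_center**2))
    surround = np.exp(-x**2 / (2.0 * sigma_surround**2))
    return center / center.sum() - w_surround * surround / surround.sum()

narrow = dog_kernel(101, 2.0, 6.0)   # 'classic RF': narrow suppressive surround
wide = dog_kernel(101, 2.0, 30.0)    # wide interneuron component
```

Far from the center, only the wide kernel still carries meaningful inhibition, which is what lets it mediate long-range induction effects such as assimilation.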


Subject(s)
Models, Biological , Retina/physiology , Visual Perception/physiology , Animals , Dogs , Humans , Photic Stimulation , Psychophysics , Visual Pathways
17.
J Vis ; 16(6): 4, 2016.
Article in English | MEDLINE | ID: mdl-27271806

ABSTRACT

System gamma is the end-to-end exponent that describes the relationship between the relative luminance values at capture and the reproduced image. The system gamma preferred by subjects is known to vary with the background luminance condition and the image in question. We confirm the previous two findings using an image database with both high and low dynamic range images (from 10² to 10⁷), but also find that the preferred system gamma varies with the dynamic range of the monitor (CRT, LCD, or OLED). We find that the preferred system gamma can be predicted in all conditions and for all images by a simple model that searches for the value that best flattens the lightness distribution, where lightness is modeled as a power law of onscreen luminance. To account for the data, the exponent must vary with the viewing conditions. The method presented allows the inference of lightness perception in natural scenes without direct measurement and makes testable predictions for how lightness perception varies with the viewing condition and the distribution of luminance values in a scene. The data from this paper has been made available online.
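The model described, choosing the system gamma that best flattens the lightness distribution, can be sketched as follows. The lightness exponent and the histogram-entropy flatness measure are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np

def histogram_flatness(values, bins=64):
    """Shannon entropy of the histogram; maximal when the distribution is flat."""
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def preferred_system_gamma(luminance, lightness_exp=0.4,
                           candidates=np.linspace(0.5, 3.0, 51)):
    """Pick the system gamma whose rendered lightness histogram is flattest,
    modeling lightness as a power law of on-screen luminance."""
    lum = luminance / luminance.max()
    scores = [histogram_flatness((lum ** g) ** lightness_exp)
              for g in candidates]
    return candidates[int(np.argmax(scores))]
```

In the paper's model the lightness exponent varies with the viewing condition, which is how the prediction adapts to background luminance and display dynamic range.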


Subject(s)
Computer Terminals , Contrast Sensitivity/physiology , Light , Visual Perception/physiology , Humans , Models, Theoretical
18.
IEEE Trans Image Process ; 25(1): 388-99, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26561436

ABSTRACT

In this paper, we consider an image decomposition model that provides a novel framework for image denoising. The model computes the components of the image to be processed in a moving frame that encodes its local geometry (directions of gradients and level lines). The strategy we develop is then to denoise the components of the image in the moving frame in order to preserve its local geometry, which would have been more affected had the image been processed directly. Experiments on a whole image database, tested with several denoising methods, show that this framework can provide better results than denoising the image directly, in terms of both peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) metrics.

19.
Sensors (Basel) ; 14(12): 23205-29, 2014 Dec 05.
Article in English | MEDLINE | ID: mdl-25490586

ABSTRACT

Color camera characterization, mapping outputs from the camera sensors to an independent color space, such as XYZ, is an important step in the camera processing pipeline. Until now, this problem has primarily been solved using a 3 × 3 matrix obtained via a least-squares optimization. In this paper, we propose to use the spherical sampling method, recently published by Finlayson et al., to perform a perceptual color characterization. In particular, we search for the 3 × 3 matrix that minimizes three different perceptual errors, one pixel based and two spatially based. For the pixel-based case, we minimize the CIE ΔE error, while for the spatial-based case, we minimize both the S-CIELAB error and the CID error measure. Our results demonstrate an improvement of approximately 3% for the ΔE error, 7% for the S-CIELAB error and 13% for the CID error measure.
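The classical least-squares baseline that the paper improves upon is only a few lines: fit a 3 × 3 matrix mapping camera RGB to XYZ over a set of training patches.

```python
import numpy as np

def characterize_camera(rgb, xyz):
    """Least-squares 3x3 characterization matrix: xyz ~= rgb @ M.T.
    rgb and xyz are (N, 3) arrays of corresponding patch measurements."""
    M_t, *_ = np.linalg.lstsq(rgb, xyz, rcond=None)
    return M_t.T

def apply_matrix(M, rgb):
    """Map camera RGB to the independent color space."""
    return rgb @ M.T
```

The perceptual approach described above keeps this 3 × 3 form but replaces the RGB-space least-squares objective with ΔE, S-CIELAB, or CID error, searched via spherical sampling.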


Subject(s)
Color , Colorimetry/instrumentation , Colorimetry/methods , Image Interpretation, Computer-Assisted/methods , Photography/instrumentation , Algorithms , Equipment Design , Equipment Failure Analysis , Image Interpretation, Computer-Assisted/instrumentation , Reproducibility of Results , Sensitivity and Specificity
20.
Article in English | MEDLINE | ID: mdl-25100983

ABSTRACT

There are many ways in which the human visual system works to reduce the inherent redundancy of the visual information in natural scenes, coding it in an efficient way. The non-linear response curves of photoreceptors and the spatial organization of the receptive fields of visual neurons both work toward this goal of efficient coding. A related, very important aspect is the existence of post-retinal mechanisms for contrast enhancement that compensate for the blurring produced in early stages of the visual process. Alongside mechanisms for coding and wiring efficiency, there is neural activity in the human visual cortex that correlates with the perceptual phenomenon of lightness induction. In this paper we propose a neural model, derived from an image processing technique for histogram equalization, that is able to deal with all the aspects just mentioned: the new model is able to predict lightness induction phenomena, and it improves the efficiency of the representation by flattening both the histogram and the power spectrum of the image signal.
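The underlying image-processing operation, global histogram equalization, can be sketched as mapping each value through the empirical CDF. This is a simplification for illustration; the paper's neural model is a spatially local, dynamical version of this idea:

```python
import numpy as np

def equalize(values, bins=256):
    """Global histogram equalization for values in [0, 1]: map each value
    through the empirical CDF so the output histogram is roughly flat."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist) / hist.sum()
    idx = np.clip(np.digitize(values, edges[1:-1]), 0, bins - 1)
    return cdf[idx]
```

A flat output histogram means every output level is used equally often, which is the redundancy-reduction property the abstract connects to efficient coding.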
