1.
IEEE Trans Vis Comput Graph ; 29(8): 3642-3655, 2023 Aug.
Article in English | MEDLINE | ID: mdl-35417349

ABSTRACT

3D point clouds have found a wide variety of applications in multimedia processing, remote sensing, and scientific computing. Although most point cloud processing systems are developed to improve viewer experiences, little work has been dedicated to perceptual quality assessment of 3D point clouds. In this work, we build a new 3D point cloud database, namely the Waterloo Point Cloud (WPC) database. In contrast to existing datasets, which consist of small-scale, low-quality source content with constrained viewing angles, the WPC database contains 20 high-quality, realistic, and omni-directional source point clouds and 740 diversely distorted point clouds. We carry out a subjective quality assessment experiment over the database in a controlled lab environment. Our statistical analysis suggests that existing objective point cloud quality assessment (PCQA) models achieve only limited success in predicting subjective quality ratings. We propose a novel objective PCQA model based on an attention mechanism and a variant of information content-weighted structural similarity, which significantly outperforms existing PCQA models. The database has been made publicly available at https://github.com/qdushl/Waterloo-Point-Cloud-Database.
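The information content-weighted pooling underlying the proposed model can be illustrated with a minimal sketch. This is not the published model: the attention mechanism is omitted, the weighting is reduced to its simplest per-block form over 2D maps (e.g., projected views of a point cloud), and the function name and constants are our own.

```python
import numpy as np

def iw_pool(similarity, reference, win=8, sigma_n2=0.1):
    """Information content-weighted pooling of a local similarity map.

    similarity: 2D map of local similarity scores (e.g., SSIM values)
    reference:  2D luminance map of the reference, same shape
    Each block's score is weighted by log2(1 + sigma^2 / sigma_n^2),
    so structured, high-variance regions dominate the pooled score.
    """
    h, w = similarity.shape
    score = total = 0.0
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            var = reference[i:i + win, j:j + win].var()
            info = np.log2(1.0 + var / sigma_n2)   # local information content
            score += info * similarity[i:i + win, j:j + win].mean()
            total += info
    return score / max(total, 1e-8)
```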


Subject(s)
Computer Graphics , Multimedia , Databases, Factual , Research Design
2.
Annu Rev Vis Sci ; 7: 437-464, 2021 09 15.
Article in English | MEDLINE | ID: mdl-34348034

ABSTRACT

Image quality assessment (IQA) models aim to establish a quantitative relationship between visual images and their quality as perceived by human observers. IQA modeling plays a special bridging role between vision science and engineering practice, both as a test-bed for vision theories and computational models of biological vision, and as a powerful tool that could have a profound impact on a broad range of image processing, computer vision, and computer graphics applications for design, optimization, and evaluation purposes. The growth of IQA research has accelerated over the past two decades. In this review, we present an overview of IQA methods from a Bayesian perspective, with the goals of unifying a wide spectrum of IQA approaches under a common framework and providing useful references to fundamental concepts accessible to vision scientists and image processing practitioners. We discuss the implications of the successes and limitations of modern IQA methods for biological vision, and the prospects for vision science to inform the design of future artificial vision systems. (The detailed model taxonomy can be found at http://ivc.uwaterloo.ca/research/bayesianIQA/.)
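To make the Bayesian framing concrete, here is a sketch of one classical fidelity-style formulation, not a summary of the review's taxonomy: the reference image is treated as a draw from a natural-image prior, and distortion as a noisy channel whose effect an ideal observer inverts.

```latex
% Reference x is a sample from a natural-image prior; distortion acts
% as a noisy channel, so an ideal observer inverts it via Bayes' rule:
\[
  p(x \mid y) \;\propto\; p(y \mid x)\, p(x).
\]
% Fidelity-style measures (e.g., VIF, stated here without its
% conditioning on the local mixing field) score quality as the
% fraction of source information C that survives in the distorted
% signal F, relative to what survives in the reference signal E:
\[
  \mathrm{VIF} \;=\; \frac{I(C;\,F)}{I(C;\,E)}.
\]
```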


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Bayes Theorem , Humans , Image Processing, Computer-Assisted/methods , Vision, Ocular
3.
Article in English | MEDLINE | ID: mdl-32356747

ABSTRACT

Rate-distortion (RD) theory is at the heart of lossy data compression. Here we aim to model the generalized RD (GRD) trade-off between the visual quality of a compressed video and its encoding profiles (e.g., bitrate and spatial resolution). We first define the theoretical functional space W of the GRD function by analyzing its mathematical properties. We show that W is a convex set in a Hilbert space, inspiring a computational model of the GRD function and a method of estimating model parameters from sparse measurements. To demonstrate the feasibility of our idea, we collect a large-scale database of real-world GRD functions, which turn out to live in a low-dimensional subspace of W. Combining the GRD reconstruction framework and the learned low-dimensional space, we create a low-parameter eigen GRD method to accurately estimate the GRD function of a source video content from only a few queries. Experimental results on the database show that the learned GRD method significantly outperforms state-of-the-art empirical RD estimation methods in both accuracy and efficiency. Finally, we demonstrate the promise of the proposed model in video codec comparison.
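A minimal sketch of the reconstruction idea follows, assuming a precomputed mean function and eigenbasis; the convexity and monotonicity properties analyzed in the paper are not enforced here, and all names are our own.

```python
import numpy as np

def reconstruct_grd(mean_f, basis, query_idx, measured_q):
    """Estimate a full GRD surface from a few actual encodes.

    mean_f:     (M,) mean GRD function sampled on a fixed grid of
                encoding profiles (bitrate x spatial resolution)
    basis:      (M, K) top-K eigenfunctions learned from a GRD database
    query_idx:  grid indices where quality was actually measured
    measured_q: measured quality values at those profiles
    """
    A = basis[query_idx]                      # basis rows at measured profiles
    b = np.asarray(measured_q) - mean_f[query_idx]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mean_f + basis @ coef              # reconstructed GRD surface
```

With K far smaller than the grid size M, a handful of encodes suffices to pin down the coefficients.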

4.
IEEE Trans Pattern Anal Mach Intell ; 42(4): 851-864, 2020 04.
Article in English | MEDLINE | ID: mdl-30596570

ABSTRACT

In many science and engineering fields that require computational models to predict physical quantities, we are often faced with selecting the best model under the constraint that only a small sample set can be physically measured. One such example is the prediction of human perception of visual quality, where sample images live in a high-dimensional space with enormous content variations. We propose a new methodology for model comparison named group maximum differentiation (gMAD) competition. Given multiple computational models, gMAD maximizes the chances of falsifying a "defender" model using the remaining models as "attackers". It exploits the sample space to find sample pairs that maximally differentiate the attackers while holding the defender fixed. Based on the results of the attacking-defending game, we introduce two measures, aggressiveness and resistance, to summarize how well each model attacks other models and defends against attacks from them, respectively. We demonstrate the gMAD competition using three examples: image quality, image aesthetics, and streaming video quality-of-experience. Although these examples focus on visually discriminable quantities, the gMAD methodology can be extended to many other fields, and it is especially useful when the sample space is large, physical measurements are expensive, and computational predictions are cheap.
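The core search can be sketched over a finite image set as below. This is a toy version: the paper explores a much larger sample space and iterates over quality levels and model pairs; the names and tolerance are ours.

```python
import numpy as np

def gmad_query(images, defender, attacker, level, tol=0.05):
    """Select one gMAD image pair for a given defender quality level.

    `defender` and `attacker` each map an image to a scalar quality
    score. Among images the defender rates approximately equal to
    `level`, return the pair the attacker rates most differently.
    """
    d = np.array([defender(im) for im in images])
    idx = [i for i in range(len(images)) if abs(d[i] - level) < tol]
    scores = {i: attacker(images[i]) for i in idx}   # defender held fixed
    lo = min(scores, key=scores.get)
    hi = max(scores, key=scores.get)
    return images[hi], images[lo]   # pair maximally differentiating the attacker
```

Human judgments on the returned pair then either falsify the defender (if the images differ visibly in quality) or count against the attacker.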

5.
Article in English | MEDLINE | ID: mdl-31751238

ABSTRACT

We propose a fast multi-exposure image fusion (MEF) method, namely MEF-Net, for static image sequences of arbitrary spatial resolution and exposure number. We first feed a low-resolution version of the input sequence to a fully convolutional network for weight map prediction. We then jointly upsample the weight maps using a guided filter. The final image is computed by a weighted fusion. Unlike conventional MEF methods, MEF-Net is trained end-to-end by optimizing the perceptually calibrated MEF structural similarity (MEF-SSIM) index over a database of training sequences at full resolution. Across an independent set of test sequences, we find that the optimized MEF-Net achieves consistent improvement in visual quality for most sequences, and runs 10 to 1000 times faster than state-of-the-art methods. The code is made publicly available at.
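The final stage reduces to a per-pixel weighted average. A minimal NumPy sketch of only that last step (not MEF-Net's weight prediction network or guided-filter upsampling; the function name is ours):

```python
import numpy as np

def fuse_exposures(exposures, weights, eps=1e-8):
    """Final weighted-fusion stage of a multi-exposure fusion pipeline.

    exposures: (K, H, W, 3) aligned input exposures, float in [0, 1]
    weights:   (K, H, W) per-pixel weight maps, e.g., predicted at low
               resolution and upsampled with a guided filter
    """
    w = weights / (weights.sum(axis=0, keepdims=True) + eps)  # normalize across exposures
    return (w[..., None] * exposures).sum(axis=0)             # per-pixel weighted average
```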

6.
Article in English | MEDLINE | ID: mdl-30010561

ABSTRACT

Dynamic adaptive streaming over HTTP (DASH) provides an interoperable solution to overcome volatile network conditions, but how human visual quality-of-experience (QoE) changes with time-varying video quality is not well understood. Here, we build a large-scale video database of time-varying quality and design a series of subjective experiments to investigate how humans respond to compression level, spatial resolution, and temporal resolution adaptations. Our path-analytic results show that quality adaptations influence the QoE by modifying the perceived quality of subsequent video segments. Specifically, the quality deviation introduced by quality adaptations is asymmetric with respect to the adaptation direction, and is further influenced by factors such as compression level and content. Furthermore, we propose an objective QoE model by integrating the empirical findings from our subjective experiments with expectation confirmation theory (ECT). Experimental results show that the proposed ECT-QoE model is in close agreement with subjective opinions and significantly outperforms existing QoE models. The video database and the code are available online at https://ece.uwaterloo.ca/~zduanmu/tip2018ectqoe/.
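A toy sketch of the expectation-confirmation idea follows. This is not the authors' ECT-QoE model: the update rule, constants, and asymmetry weights are illustrative assumptions that merely mimic the reported asymmetry.

```python
def toy_ect_qoe(segment_quality, alpha=0.8, beta_pos=0.2, beta_neg=0.5):
    """Toy expectation-confirmation aggregation of per-segment quality.

    Expectation tracks past quality via an exponential moving average;
    disconfirmation (actual minus expected quality) shifts perceived
    quality, with negative surprises weighted more heavily
    (beta_neg > beta_pos), so quality drops hurt more than gains help.
    """
    expectation = segment_quality[0]
    perceived = [float(segment_quality[0])]
    for q in segment_quality[1:]:
        disconfirmation = q - expectation
        gain = beta_pos if disconfirmation >= 0 else beta_neg
        perceived.append(q + gain * disconfirmation)
        expectation = alpha * expectation + (1 - alpha) * q
    return sum(perceived) / len(perceived)

# A quality drop is penalized beyond its face value:
# toy_ect_qoe([80, 80, 50, 50]) == 58.25 < 65.0, the plain mean
```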

7.
IEEE Trans Image Process ; 27(3): 1202-1213, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29220321

ABSTRACT

We propose a multi-task end-to-end optimized deep neural network (MEON) for blind image quality assessment (BIQA). MEON consists of two sub-networks, a distortion identification network and a quality prediction network, which share their early layers. Unlike traditional methods for training multi-task networks, our training process is performed in two steps. In the first step, we train the distortion type identification sub-network, for which large-scale training samples are readily available. In the second step, starting from the pre-trained early layers and the outputs of the first sub-network, we train the quality prediction sub-network using a variant of the stochastic gradient descent method. Unlike most deep neural networks, we choose the biologically inspired generalized divisive normalization (GDN) instead of the rectified linear unit as the activation function. We empirically demonstrate that GDN is effective at reducing model parameters and layers while achieving similar quality prediction performance. With modest model complexity, the proposed MEON index achieves state-of-the-art performance on four publicly available benchmarks. Moreover, we demonstrate the strong competitiveness of MEON against state-of-the-art BIQA models using the group maximum differentiation competition methodology.
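The GDN activation itself is compact to state. A minimal NumPy sketch of the standard cross-channel form (the trainable parameters gamma and beta are passed in explicitly here, and the surrounding convolutional layers are omitted):

```python
import numpy as np

def gdn(x, gamma, beta):
    """Generalized divisive normalization across channels.

    x:     (C, H, W) pre-activation feature maps
    gamma: (C, C) nonnegative cross-channel weights (trainable in MEON)
    beta:  (C,) nonnegative biases (trainable in MEON)
    Implements y_c = x_c / sqrt(beta_c + sum_k gamma[c, k] * x_k**2).
    """
    energy = np.tensordot(gamma, x ** 2, axes=(1, 0))   # (C, H, W) pooled energy
    return x / np.sqrt(beta[:, None, None] + energy)
```

Dividing each channel by a pooled energy of all channels gaussianizes local responses, which is the biological motivation for preferring GDN over ReLU here.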
