Search | VHL Regional Portal

A Subjective and Objective Study of Space-Time Subsampled Video Quality.

Lee, Dae Yeol; Paul, Somdyuti; Bampis, Christos G; Ko, Hyunsuk; Kim, Jongho; Jeong, Se Yoon; Homan, Blake; Bovik, Alan C.

IEEE Trans Image Process ; 31: 934-948, 2022.

Article in English | MEDLINE | ID: mdl-34965209

ABSTRACT

Video dimensions are continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution and frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited bandwidth networks in a perceptually optimal way is a current challenge affecting billions of viewers. One recent practice adopted by video service providers is space-time resolution adaptation in conjunction with video compression. Consequently, it is important to understand how different levels of space-time subsampling and compression affect the perceptual quality of videos. Towards making progress in this direction, we constructed a large new resource, called the ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE STSVQ) database, containing 437 videos generated by applying various levels of combined space-time subsampling and video compression on 15 diverse video contents. We also conducted a large-scale human study on the new dataset, collecting about 15,000 subjective judgments of video quality. We provide a rate-distortion analysis of the collected subjective scores, enabling us to investigate the perceptual impact of space-time subsampling at different bit rates. We also evaluated and compare the performance of leading video quality models on the new database. The new ETRI-LIVE STSVQ database is being made freely available at (https://live.ece.utexas.edu/research/ETRI-LIVE_STSVQ/index.html).

Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression.

Paul, Somdyuti; Norkin, Andrey; Bovik, Alan C.

IEEE Trans Image Process ; PP2022 Dec 27.

Article in English | MEDLINE | ID: mdl-37015500

ABSTRACT

Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which tends to worsen as the block size is reduced. Moreover, the block matching criteria used in typical codecs do not account for the resulting levels of perceptual quality of the motion compensated pictures that are created upon decoding. Towards achieving the elusive goal of perceptually optimized motion estimation, we propose a search-free block motion estimation framework using a multi-stage convolutional neural network, which is able to conduct motion estimation on multiple block sizes simultaneously, using a triplet of frames as input. This composite block translation network (CBT-Net) is trained in a self-supervised manner on a large database that we created from publicly available uncompressed video content. We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames. Our experimental results highlight the computational efficiency of our proposed model relative to conventional block matching based motion estimation algorithms, for comparable prediction errors. Further, when used to perform inter prediction in AV1, the MV predictions of the perceptually optimized model result in average Bjontegaard-delta rate (BD-rate) improvements of -1.73% and -1.31% with respect to the MS-SSIM and Video Multi-Method Assessment Fusion (VMAF) quality metrics, respectively, as compared to the block matching based motion estimation system employed in the SVT-AV1 encoder.

Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction.

Paul, Somdyuti; Norkin, Andrey; Bovik, Alan C.

IEEE Trans Image Process ; PP2020 Jul 28.

Article in English | MEDLINE | ID: mdl-32746243

ABSTRACT

In VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjøntegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL