1.
IEEE Trans Image Process ; 32: 2063-2076, 2023.
Article in English | MEDLINE | ID: mdl-37023144

ABSTRACT

Recently, deep learning-based image compression methods have made significant progress and have gradually outperformed traditional approaches, including the latest Versatile Video Coding (VVC) standard, in both the PSNR and MS-SSIM metrics. Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. Various entropy models have been proposed, such as autoregressive, softmax, logistic mixture, Gaussian mixture, and Laplacian. Existing schemes use only one of these models. However, given the vast diversity of images, a single model is not optimal for all images, or even for different regions within one image. In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which adapts more accurately and efficiently to the content of different images, and of different regions within one image, at the same complexity. In addition, for the encoding/decoding network design, we propose a concatenated residual block (CRB) structure, in which multiple residual blocks are serially connected with additional shortcut connections. The CRB improves the learning ability of the network, which in turn improves compression performance. Experimental results on the Kodak, Tecnick-100, and Tecnick-40 datasets show that the proposed scheme outperforms leading learning-based methods and existing compression standards, including VVC intra coding (4:4:4 and 4:2:0), in terms of PSNR and MS-SSIM. The source code is available at https://github.com/fengyurenpingsheng.
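
As a rough illustration of how a discretized mixture entropy model of this kind assigns probability (and hence an ideal code length) to a quantized latent value, the sketch below evaluates a Gaussian-Laplacian-Logistic mixture by integrating each component's density over the quantization bin. The function name, parameterization, and example values are illustrative assumptions, not the authors' released code (see the linked repository for that).

```python
# Minimal sketch of a discretized Gaussian-Laplacian-Logistic mixture (GLLMM)
# likelihood for an integer-quantized latent value y. Names and the per-component
# parameterization are illustrative, not taken from the paper's implementation.
import numpy as np
from scipy.stats import norm, laplace, logistic

def gllmm_pmf(y, weights, means, scales):
    """P(y) for integer y under a discretized mixture with 3K components.

    weights, means, scales have length 3K, ordered as
    [Gaussian components..., Laplacian components..., Logistic components...].
    Each component is discretized by integrating its density over [y-0.5, y+0.5].
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # mixture weights sum to 1
    k = len(w) // 3
    comps = ([norm(m, s) for m, s in zip(means[:k], scales[:k])] +
             [laplace(m, s) for m, s in zip(means[k:2 * k], scales[k:2 * k])] +
             [logistic(m, s) for m, s in zip(means[2 * k:], scales[2 * k:])])
    # Mass of each component on the quantization bin around y.
    bin_mass = np.array([c.cdf(y + 0.5) - c.cdf(y - 0.5) for c in comps])
    return float(np.dot(w, bin_mass))

# Example with one component of each type (K = 1), evaluated at y = 2.
p = gllmm_pmf(2, weights=[0.5, 0.3, 0.2], means=[1.8, 2.1, 2.0], scales=[1.0, 0.7, 0.5])
bits = -np.log2(p)   # ideal code length the entropy model assigns to this value
```

In a learned codec, the weights, means, and scales would typically be predicted per latent element by a hyperprior or context model, which is what lets such a mixture adapt across images and across regions within an image.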

2.
Article in English | MEDLINE | ID: mdl-32784138

ABSTRACT

This paper proposes a novel bi-directional motion compensation framework that extracts existing motion information associated with the reference frames and interpolates an additional reference frame candidate that is co-located with the current frame. The approach generates a dense motion field by performing optical flow estimation, so as to capture complex motion between the reference frames without recourse to additional side information. The estimated optical flow is then complemented by transmission of offset motion vectors to correct for possible deviation from the linearity assumption in the interpolation. Various optimization schemes specifically tailored to the video coding framework are presented to further improve performance. To accommodate applications where decoder complexity is a primary concern, a block-constrained speed-up algorithm is also proposed. Experimental results show that the main approach and optimization methods yield significant coding gains across a diverse set of video sequences. Further experiments focus on the trade-off between performance and complexity, and demonstrate that the proposed speed-up algorithm reduces complexity by a large factor while maintaining most of the performance gains.
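
The core interpolation step can be sketched with off-the-shelf tools: estimate dense optical flow between the two reference frames, then assume linear motion to synthesize a reference co-located with the temporally intermediate current frame. The sketch below uses OpenCV's Farneback flow and a simple backward warp with half the flow; the function names, parameters, and the warping simplification are illustrative assumptions, not the paper's codec, which additionally transmits offset motion vectors to correct deviations from the linearity assumption.

```python
# Hedged sketch: synthesize a co-located reference by warping one reference
# frame halfway along the optical flow estimated between the two references.
import cv2
import numpy as np

def interpolate_colocated_reference(ref0_gray, ref1_gray):
    """ref0_gray, ref1_gray: 8-bit grayscale reference frames (numpy arrays)."""
    # Dense flow from ref0 to ref1 (Farneback); no side information is needed.
    flow = cv2.calcOpticalFlowFarneback(ref0_gray, ref1_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = ref0_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Linearity assumption: the frame halfway in time sits at half the flow.
    map_x = (grid_x + 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y + 0.5 * flow[..., 1]).astype(np.float32)
    # Simple backward warp of ref1 with half the flow. The flow is anchored at
    # ref0 coordinates, so this is only an approximation near motion boundaries;
    # offset motion vectors (as in the paper) would correct such deviations.
    return cv2.remap(ref1_gray, map_x, map_y, cv2.INTER_LINEAR)
```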

3.
IEEE Trans Image Process ; 23(8): 3684-97, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24956371

ABSTRACT

This paper focuses on prediction optimality in spatially scalable video coding. It draws inspiration from an estimation-theoretic prediction framework for quality (SNR) scalability developed earlier by our group, which achieved optimality by fully accounting for relevant information from the current base layer (e.g., quantization intervals) and the enhancement layer to efficiently calculate the conditional expectation that forms the optimal predictor. Central to that approach was that all layers reconstruct approximations to the same original transform coefficient. In spatial scalability, however, the layers encode different resolution versions of the signal. To approach optimality in enhancement layer prediction, this paper departs from existing spatially scalable codecs that employ pixel-domain resampling for inter-layer prediction. Instead, it incorporates a transform-domain resampling technique that keeps the base layer quantization intervals accessible and usable at the enhancement layer despite the differing signal resolutions, which, in conjunction with prior enhancement layer information, enables optimal prediction. A delayed prediction approach that complements this framework for spatially scalable video coding is then provided to further exploit future base layer frames for additional enhancement layer coding gains. Finally, a low-complexity variant of the proposed estimation-theoretic prediction approach is devised, which approximates the conditional expectation by switching between three predictors depending on a simple condition involving information from both layers, and which retains significant performance gains. Simulations provide experimental evidence that the proposed approaches substantially outperform the standard scalable video codec and other leading competitors.
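
The following toy sketch illustrates the estimation-theoretic predictor for a single transform coefficient, assuming (purely for illustration) a Gaussian prior derived from previously decoded enhancement-layer information and a base-layer quantization interval made usable by the transform-domain resampling described above; the names and values are hypothetical and this is not the paper's codec.

```python
# Toy sketch: the optimal predictor is the conditional expectation of the
# enhancement-layer coefficient given a prior (here Gaussian, for simplicity)
# and the base-layer quantization interval [a, b].
import numpy as np
from scipy.stats import truncnorm

def et_predictor(mu_prior, sigma_prior, a, b):
    """E[x | x ~ N(mu_prior, sigma_prior^2), a <= x <= b]."""
    lo = (a - mu_prior) / sigma_prior
    hi = (b - mu_prior) / sigma_prior
    return float(truncnorm(lo, hi, loc=mu_prior, scale=sigma_prior).mean())

# Example: the prior is centered at 4.0, but the base layer indicates the
# coefficient fell in [6.0, 10.0]; the predictor (about 7.5 here) is pulled
# into that interval rather than using the prior mean or the interval midpoint alone.
print(et_predictor(mu_prior=4.0, sigma_prior=3.0, a=6.0, b=10.0))
```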


Subject(s)
Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Photography/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Algorithms; Reproducibility of Results; Sensitivity and Specificity
4.
IEEE Trans Image Process ; 22(3): 1175-85, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23193453

ABSTRACT

Current video coders employ predictive coding with motion compensation to exploit temporal redundancies in the signal. In particular, blocks along a motion trajectory are modeled as an auto-regressive (AR) process, and it is generally assumed that the prediction errors are temporally independent and approximate the innovations of this process. Thus, zero-delay encoding and decoding is considered efficient. This paper is premised on the largely ignored fact that these prediction errors are, in fact, temporally dependent due to quantization effects in the prediction loop. It presents an estimation-theoretic delayed decoding scheme, which exploits information from future frames to improve the reconstruction quality of the current frame. In contrast to the standard decoder, which reproduces every block instantaneously once the corresponding quantization indices of the residues are available, the proposed delayed decoder efficiently combines all accessible (including any future) information in an appropriately derived probability density function to obtain the optimal delayed reconstruction per transform coefficient. Experiments demonstrate significant gains over the standard decoder. The requisite information about the source AR model is estimated in a spatio-temporally adaptive manner from a bit-stream conforming to the H.264/AVC standard, i.e., no side information needs to be sent to the decoder in order to employ the proposed approach, thereby retaining compatibility with the standard syntax and existing encoders.
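
A toy numerical sketch of the delayed-reconstruction idea for one transform coefficient is given below. The AR(1) model, the flat within-interval prior, and the interval values are illustrative assumptions; in the paper the model parameters are estimated adaptively from the standard bit-stream rather than fixed by hand.

```python
# Toy sketch: combine the current frame's quantization interval [a, b] with a
# future frame's interval [c, d] for the related coefficient x_next = rho*x + z,
# and take the conditional mean as the delayed reconstruction of x.
import numpy as np
from scipy.stats import norm

def delayed_reconstruction(a, b, c, d, rho=0.95, sigma_z=1.0, n=4001):
    x = np.linspace(a, b, n)            # values allowed by the current interval
    prior = np.ones_like(x)             # flat prior inside [a, b], for simplicity
    # Likelihood that the future coefficient rho*x + z lands in [c, d].
    future_lik = (norm.cdf(d, loc=rho * x, scale=sigma_z) -
                  norm.cdf(c, loc=rho * x, scale=sigma_z))
    w = prior * future_lik
    return float(np.sum(x * w) / np.sum(w))   # posterior mean on a uniform grid

# A zero-delay decoder would output the midpoint 5.0 of [4, 6]; the future
# interval [8.5, 9.5] pulls the delayed reconstruction toward the upper end.
print(delayed_reconstruction(a=4.0, b=6.0, c=8.5, d=9.5))
```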


Subject(s)
Algorithms; Artifacts; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Video Recording/methods; Reproducibility of Results; Sensitivity and Specificity; Subtraction Technique
5.
IEEE Trans Image Process ; 21(4): 1874-84, 2012 Apr.
Article in English | MEDLINE | ID: mdl-21965209

ABSTRACT

This paper proposes a novel approach to jointly optimize spatial prediction and the choice of the subsequent transform in video and image compression. Under the assumption of a separable first-order Gauss-Markov model for the image signal, it is shown that the optimal Karhunen-Loeve transform, given available partial boundary information, is well approximated by a close relative of the discrete sine transform (DST), with basis vectors that tend to vanish at the known boundary and maximize energy at the unknown boundary. The overall intra-frame coding scheme thus switches between this DST variant, named the asymmetric DST (ADST), and the traditional discrete cosine transform (DCT), depending on the prediction direction and boundary information. The ADST is first compared with the DCT in terms of coding gain under ideal model conditions and is shown to provide significantly improved compression efficiency. The proposed adaptive prediction and transform scheme is then implemented within the H.264/AVC intra-mode framework and is experimentally shown to significantly outperform the standard intra coding mode. As an added benefit, it substantially reduces blocking artifacts, because the transform now adapts to the statistics of block edges. An integer version of the ADST is also proposed.
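
As a rough illustration, the sketch below constructs an orthonormal sine-transform matrix of the DST-VII-like form commonly associated with the ADST, whose basis vectors are near zero at the known (predicted) boundary and largest at the far boundary, and switches between ADST and DCT according to the prediction direction. The exact integer transforms and switching rules of the paper may differ; this is only a floating-point sketch.

```python
# Sketch of mode-dependent transform selection: ADST along the direction that
# has a known prediction boundary, DCT along the other direction.
import numpy as np
from scipy.fft import dct

def adst_matrix(n):
    """Orthonormal sine transform with rows ~ sin((2j-1)(i+1)pi/(2n+1)).

    Each basis vector is ~0 at the known boundary (just before sample i = 0)
    and has maximal magnitude near the far, unknown boundary (i = n-1).
    """
    j = np.arange(1, n + 1).reshape(-1, 1)   # basis-vector index
    i = np.arange(n).reshape(1, -1)          # sample index within the block
    return 2.0 / np.sqrt(2 * n + 1) * np.sin((2 * j - 1) * (i + 1) * np.pi / (2 * n + 1))

def transform_residual(block, predicted_from_top):
    """Apply ADST along the predicted direction and DCT along the other."""
    n = block.shape[0]
    A = adst_matrix(n)
    if predicted_from_top:                       # known boundary is the top row
        cols = A @ block                         # ADST on columns (vertical)
        return dct(cols, axis=1, norm='ortho')   # DCT on rows (horizontal)
    rows = block @ A.T                           # ADST on rows (horizontal)
    return dct(rows, axis=0, norm='ortho')       # DCT on columns (vertical)

A = adst_matrix(8)
assert np.allclose(A @ A.T, np.eye(8))           # the sketch's ADST is orthonormal
```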


Subject(s)
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Photography/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Reproducibility of Results; Sensitivity and Specificity