Results 1 - 5 of 5
1.
IEEE Trans Image Process ; 32: 3188-3198, 2023.
Article in English | MEDLINE | ID: mdl-37200126

ABSTRACT

In contrast to image compression, the key to video compression is to exploit temporal context efficiently to reduce inter-frame redundancy. Existing learned video compression methods generally rely on short-term temporal correlations or image-oriented codecs, which limits further improvement of coding performance. This paper proposes a novel temporal context-based video compression network (TCVC-Net) to improve the performance of learned video compression. Specifically, a global temporal reference aggregation (GTRA) module is proposed to obtain an accurate temporal reference for motion-compensated prediction by aggregating long-term temporal context. Furthermore, in order to compress the motion vector and residue efficiently, a temporal conditional codec (TCC) is proposed to preserve structural and detailed information by exploiting the multi-frequency components in the temporal context. Experimental results show that the proposed TCVC-Net outperforms publicly available state-of-the-art methods in terms of both PSNR and MS-SSIM.
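
As a concrete illustration of long-term temporal aggregation, the sketch below shows one plausible way to fuse features from several past decoded frames into a single reference via per-pixel similarity weighting. It is a minimal sketch of the general idea only, not the authors' GTRA module: the module structure, channel count, and similarity measure are all assumptions.

```python
# A minimal sketch, in PyTorch, of aggregating long-term temporal context:
# weight the features of several past decoded frames by their per-pixel
# similarity to the current frame and fuse them into one temporal reference.
import torch
import torch.nn as nn

class TemporalReferenceAggregator(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # Shared embedding used for both the current and past frame features.
        self.embed = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, current_feat, past_feats):
        # past_feats: list of (B, C, H, W) feature maps from decoded frames.
        q = self.embed(current_feat)
        # Per-pixel similarity score between the current frame and each past frame.
        scores = [(q * self.embed(f)).sum(dim=1, keepdim=True) for f in past_feats]
        attn = torch.softmax(torch.stack(scores, dim=0), dim=0)  # over frames
        stacked = torch.stack(past_feats, dim=0)                 # (N, B, C, H, W)
        return (attn * stacked).sum(dim=0)                       # fused reference

# Usage: fuse features of four past frames into one reference.
agg = TemporalReferenceAggregator(channels=64)
cur = torch.randn(1, 64, 64, 64)
past = [torch.randn(1, 64, 64, 64) for _ in range(4)]
print(agg(cur, past).shape)  # torch.Size([1, 64, 64, 64])
```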

2.
IEEE Trans Image Process ; 31: 4515-4526, 2022.
Article in English | MEDLINE | ID: mdl-35727785

ABSTRACT

Multiview video coding (MVC) aims to compress multiview video by eliminating video redundancies, and the quality of the reference frame directly affects compression efficiency. In this paper, we propose a deep virtual reference frame generation method based on a disparity-aware reference frame generation network (DAG-Net), which models the disparity relationship between different viewpoints to generate a more reliable reference frame. The proposed DAG-Net consists of a multi-level receptive field module, a disparity-aware alignment module, and a fusion reconstruction module. First, the multi-level receptive field module is designed to enlarge the receptive field and extract multi-scale deep features of the temporal and inter-view reference frames. Then, the disparity-aware alignment module learns the disparity relationship and performs a disparity shift on the inter-view reference frame to align it with the temporal reference frame. Finally, the fusion reconstruction module fuses the complementary information and generates a more reliable virtual reference frame. Experiments demonstrate that the proposed reference frame generation method achieves superior performance for multiview video coding.
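
The core of the alignment step can be pictured as a horizontal, per-pixel warp. The toy sketch below is an assumption in the spirit of the abstract, not DAG-Net itself: it shifts an inter-view frame by an estimated disparity map and then fuses it with the temporal reference by simple averaging, whereas the paper learns the fusion.

```python
# A toy sketch of the "disparity shift" step: each pixel of the inter-view
# reference is moved horizontally by its estimated disparity so that it
# lines up with the temporal reference.
import numpy as np

def disparity_shift(inter_view: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Warp an (H, W) view horizontally by a per-pixel disparity map."""
    h, w = inter_view.shape
    cols = np.arange(w)[None, :] - disparity            # source column per pixel
    cols = np.clip(np.round(cols).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None].repeat(w, axis=1)
    return inter_view[rows, cols]

def fuse(temporal_ref: np.ndarray, aligned_view: np.ndarray) -> np.ndarray:
    """Placeholder fusion: the paper learns this step; here we just average."""
    return 0.5 * (temporal_ref + aligned_view)

# Usage: a constant disparity of 3 pixels between two small 'views'.
view = np.arange(32, dtype=float).reshape(4, 8)
disp = np.full((4, 8), 3.0)
virtual_ref = fuse(view, disparity_shift(view, disp))
```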

3.
IEEE Trans Image Process ; 30: 5692-5707, 2021.
Article in English | MEDLINE | ID: mdl-34125681

ABSTRACT

Video surveillance and its applications have become increasingly ubiquitous in modern daily life. In a video surveillance system, video coding is a critical enabling technology that determines the effective transmission and storage of surveillance videos. To meet the real-time or time-critical transmission requirements of video surveillance systems, the low-delay (LD) configuration of the High Efficiency Video Coding (HEVC) standard is usually used to encode surveillance videos. The coding efficiency of the LD configuration is closely related to the quantization parameter (QP) cascading (QPC) technique, which determines the QPs used for encoding. However, the QPC technique currently adopted for the LD configuration in the HEVC test model (HM) is suboptimal, since it does not take full account of the reference dependencies in coding. In this paper, an efficient QPC technique for surveillance video coding, referred to as QPC-SV, is proposed; it considers all inter reference frames under the LD configuration. Experimental results demonstrate the efficacy of the proposed QPC-SV: compared with the default QPC configuration in the HM, it achieves significant rate-distortion gains, with average BD-rates of -9.35% and -9.76% for the LDP and LDB configurations, respectively.
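
The idea behind QP cascading can be shown in a few lines: frames that serve as references for many later frames are coded at a lower QP (finer quantization), so prediction from them is more accurate. The sketch below uses purely illustrative offsets and GOP structure, not the QPC-SV assignment from the paper.

```python
# A simplified sketch of quantization parameter cascading (QPC): the more
# often a frame is referenced, the smaller its QP offset. Offsets here are
# illustrative assumptions only.
def cascade_qps(base_qp: int, gop_size: int = 4) -> list[int]:
    """Return one QP per frame in a low-delay GOP (hypothetical offsets)."""
    qps = []
    for poc in range(1, gop_size + 1):
        if poc % gop_size == 0:
            offset = 1          # key frame of the GOP: referenced most
        elif poc % 2 == 0:
            offset = 2          # middle layer
        else:
            offset = 3          # leaf frames: referenced least
        qps.append(base_qp + offset)
    return qps

print(cascade_qps(base_qp=32))  # [35, 34, 35, 33]
```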

4.
IEEE Trans Image Process ; 26(4): 1732-1745, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28113341

ABSTRACT

Accurate, high-quality depth maps are required in many 3D applications, such as multi-view rendering, 3D reconstruction, and 3DTV. However, the resolution of a captured depth map is much lower than that of its corresponding color image, which degrades performance in these applications. In this paper, we propose a novel depth map super-resolution (SR) method that takes view synthesis quality into account. The proposed approach makes two main technical contributions. First, since the captured low-resolution (LR) depth map may be corrupted by noise and occlusion, we propose a credibility-based multi-view depth map fusion strategy, which considers view synthesis quality and inter-view correlation, to refine the LR depth map. Second, we propose a view-synthesis-quality-based trilateral depth-map up-sampling method, which considers depth smoothness, texture similarity, and view synthesis quality in the up-sampling filter. Experimental results demonstrate that the proposed method outperforms state-of-the-art depth SR methods for both super-resolved depth maps and synthesized views. Furthermore, the proposed method is robust to noise and achieves promising results under noise-corrupted conditions.
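
To make the "trilateral" idea concrete, the sketch below extends a joint bilateral upsampler with a third weight. The per-pixel credibility term standing in for view-synthesis quality, the window size, and the Gaussian parameters are all illustrative assumptions, not the authors' filter.

```python
# A compact sketch of trilateral depth up-sampling: each high-resolution
# depth sample is a weighted average of nearby low-resolution depth values,
# with weights combining spatial distance, color (texture) similarity, and
# a credibility term standing in for view-synthesis quality.
import numpy as np

def trilateral_upsample(depth_lr, color_hr, credibility_lr, scale=2,
                        sigma_s=2.0, sigma_c=10.0):
    h, w = depth_lr.shape
    out = np.zeros((h * scale, w * scale))
    for y in range(h * scale):
        for x in range(w * scale):
            cy, cx = y // scale, x // scale            # nearest LR sample
            num = den = 0.0
            for dy in (-1, 0, 1):                      # 3x3 LR neighborhood
                for dx in (-1, 0, 1):
                    ny, nx = cy + dy, cx + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    w_s = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    dc = color_hr[y, x] - color_hr[ny * scale, nx * scale]
                    w_c = np.exp(-(dc * dc) / (2 * sigma_c ** 2))
                    w_q = credibility_lr[ny, nx]       # third, "trilateral" term
                    num += w_s * w_c * w_q * depth_lr[ny, nx]
                    den += w_s * w_c * w_q
            out[y, x] = num / den if den > 0 else depth_lr[cy, cx]
    return out

# Usage on tiny random inputs.
rng = np.random.default_rng(0)
d = rng.uniform(0, 255, (4, 4))
c = rng.uniform(0, 255, (8, 8))
q = rng.uniform(0.5, 1.0, (4, 4))
print(trilateral_upsample(d, c, q).shape)  # (8, 8)
```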

5.
Sensors (Basel) ; 13(7): 9223-47, 2013 Jul 17.
Article in English | MEDLINE | ID: mdl-23867746

ABSTRACT

The Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from long execution times and high resource requirements. Field Programmable Gate Arrays (FPGAs) provide a competitive alternative for hardware acceleration with substantial computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and an associated FPGA architecture framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most false edges and obtain more accurate candidate edge pixels for the subsequent accelerated computation. A novel PHT algorithm exploiting spatial angle-level parallelism is then proposed to improve computational accuracy by refining the minimum computational step. Moreover, the FPGA-based multi-level pipelined PHT architecture, optimized by spatial parallelism, ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. The framework is evaluated on the ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it computes straight line parameters in 15.59 ms on average per frame. Qualitative and quantitative evaluations validate the system performance in terms of data throughput, memory bandwidth, resource usage, speed, and robustness.
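
The appeal of angle-level parallelism is easiest to see in software: in the standard Hough Transform sketched below, the vote computation for each angle is independent of every other angle, so each iteration of the angle loop could map to its own hardware unit. This plain-NumPy sketch shows only the baseline algorithm, not the paper's FPGA design.

```python
# Standard Hough Transform for lines: every edge pixel votes for all
# (rho, theta) bins it could lie on; peaks in the accumulator are lines.
import numpy as np

def hough_lines(edges: np.ndarray, n_theta: int = 180):
    """Accumulate votes in (rho, theta) space for a binary edge image."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))             # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))
    ys, xs = np.nonzero(edges)                      # edge pixel coordinates
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    for i, t in enumerate(thetas):                  # independent per angle
        rho = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int) + diag
        np.add.at(acc, (rho, i), 1)                 # vote
    return acc, thetas, diag

# Usage: detect the diagonal of a small synthetic edge image.
img = np.eye(16, dtype=np.uint8)
acc, thetas, diag = hough_lines(img)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
print(rho_idx - diag, np.rad2deg(thetas[theta_idx]))  # rho ≈ 0, theta ≈ 135°
```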


Subject(s)
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods