Results 1 - 20 of 48
1.
Article in English | MEDLINE | ID: mdl-38662566

ABSTRACT

Video Coding for Machines (VCM) aims to compress visual signals for machine analysis. However, existing methods consider only a few machines and neglect the majority. Moreover, they do not effectively leverage machines' perceptual characteristics, which results in suboptimal compression efficiency. To overcome these limitations, this paper introduces the Satisfied Machine Ratio (SMR), a metric that statistically evaluates the perceptual quality of compressed images and videos for machines by aggregating satisfaction scores from many machines. Each score is derived from a machine's perceptual difference between the original and compressed images. Targeting image classification and object detection tasks, we build two representative machine libraries for SMR annotation and create a large-scale SMR dataset to facilitate SMR studies. We then propose an SMR prediction model based on the correlation between deep feature differences and SMR. Furthermore, we introduce an auxiliary task that increases prediction accuracy by predicting the SMR difference between two images of different quality. Extensive experiments demonstrate that SMR models significantly improve compression performance for machines and generalize robustly to unseen machines, codecs, datasets, and frame types. Code is available at https://github.com/ywwynm/SMR.
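The abstract defines SMR as an aggregate of per-machine satisfaction scores. As a minimal sketch, assuming a hypothetical binary satisfaction rule (a machine is "satisfied" when its decision on the compressed image equals its decision on the original), SMR reduces to the fraction of satisfied machines. The rule, the toy threshold "machines" and the `satisfied_machine_ratio` name are illustrative assumptions; the paper derives its scores from machine perceptual differences and may define satisfaction differently.

```python
from typing import Callable, List, Sequence

# A "machine" is any model mapping an image to a discrete decision (here: a label).
Image = List[float]
Machine = Callable[[Image], int]


def satisfied_machine_ratio(machines: Sequence[Machine],
                            original: Image, compressed: Image) -> float:
    """Fraction of machines 'satisfied' by the compressed image.

    Hypothetical satisfaction rule: a machine is satisfied when its decision on
    the compressed image equals its decision on the original image.
    """
    if not machines:
        raise ValueError("need at least one machine")
    satisfied = sum(1 for m in machines if m(compressed) == m(original))
    return satisfied / len(machines)


def make_threshold_machine(threshold: float) -> Machine:
    """Toy 'classifier' that thresholds the mean pixel value."""
    return lambda img: int(sum(img) / len(img) > threshold)


machines = [make_threshold_machine(t) for t in (0.2, 0.4, 0.5, 0.6, 0.8)]
original = [0.55, 0.60, 0.58, 0.57]    # mean 0.575
compressed = [0.45, 0.48, 0.47, 0.44]  # mean 0.46: only the 0.5 machine changes its decision
print(satisfied_machine_ratio(machines, original, compressed))  # 0.8
```

Averaging over a library of machines, rather than scoring a single model, is what lets such a metric reflect "the majority" of machines that the abstract says existing methods neglect.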

2.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5820-5834, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38386571

ABSTRACT

To transmit high-quality dynamic 3D human images cost-effectively in immersive multimedia applications, efficient data compression is crucial. Unlike existing methods that focus on reducing signal-level reconstruction errors, we propose the first dynamic 3D human compression framework based on human priors. Its layered coding architecture significantly enhances perceptual quality while also supporting a variety of downstream tasks, including visual analysis and content editing. Specifically, a high-fidelity pose-driven Avatar is generated from the original frames as the basic structure layer to implicitly represent the human shape. Then, human movements between frames are parameterized via a commonly used human prior model, the Skinned Multi-Person Linear Model (SMPL), to form the motion layer that drives the Avatar. Furthermore, normals are introduced as an enhancement layer to preserve fine-grained geometric details. Finally, the Avatar, SMPL parameters, and normal maps are efficiently compressed into layered semantic bitstreams. Extensive qualitative and quantitative experiments show that the proposed framework remarkably outperforms other state-of-the-art 3D codecs in subjective quality with only a few bits. More notably, as the size or number of frames of the 3D human sequence increases, the perceptual-quality advantage of our framework becomes more significant while the bitrate savings grow.


Subject(s)
Data Compression , Imaging, Three-Dimensional , Humans , Imaging, Three-Dimensional/methods , Data Compression/methods , Algorithms , Posture/physiology
3.
IEEE Trans Image Process ; 33: 408-422, 2024.
Article in English | MEDLINE | ID: mdl-38133987

ABSTRACT

The accelerated proliferation of visual content and the rapid development of machine vision technologies bring significant challenges in delivering visual data at a gigantic scale, and these data must be represented effectively to satisfy both human and machine requirements. In this work, we investigate how hierarchical representations derived from an advanced generative prior facilitate the construction of an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are organized into basic, middle, and enhanced layers that support machine intelligence and human visual perception in a progressive fashion. To achieve efficient compression, we propose a layer-wise scalable entropy transformer that reduces the redundancy between layers. Based on a multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio. We validate the feasibility of the proposed paradigm on face image compression. Extensive qualitative and quantitative experimental results demonstrate the superiority of the proposed paradigm over the latest compression standard, Versatile Video Coding (VVC), in terms of both machine analysis and human perception at extremely low bitrates (< 0.01 bpp), offering new insights for human-machine collaborative compression.


Subject(s)
Data Compression , Humans , Data Compression/methods , Signal Processing, Computer-Assisted , Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Video Recording/methods
4.
IEEE Trans Image Process ; 32: 6020-6031, 2023.
Article in English | MEDLINE | ID: mdl-37910424

ABSTRACT

In this paper, we make the first attempt to determine the achievable rate-distortion (R-D) performance bound of versatile video coding (VVC) intra coding when the mutual dependency in the rate-distortion optimization (RDO) process is taken into account. In particular, the large search space of encoding parameters in VVC intra coding is explored in practice with a beam search-based joint rate-distortion optimization (BSJRDO) scheme. As such, the partitioning, prediction and transform decisions are jointly optimized across different coding units (CUs) over a customized search subset instead of the full space. To make the beam search implementation-friendly for VVC, the dependencies among CUs are truncated at different depths. To allow finer computational scalability, the beam size is adjusted flexibly based on the characteristics of the CUs, so that operating points satisfying the different complexity demands of diverse applications can be obtained in practice. The proposed BSJRDO approach, which fully conforms to the VVC decoding syntax, can serve both as a path toward the optimal RDO bound and as a practical performance-boosting solution. BSJRDO is implemented on the VVC Test Model (VTM) 12.0, and extensive experiments show that it achieves bit-rate savings of 1.30% and 3.22% relative to the VTM anchor under the common test conditions and in low-bit-rate coding scenarios, respectively. Moreover, the performance gain can be customized flexibly at different computational overheads.
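Beam search keeps the K cheapest partial decision sequences instead of committing greedily to each CU's locally best choice. The sketch below illustrates that idea under toy assumptions: each CU picks one of a few hypothetical modes, and the RD cost J = D + λR of a mode depends on the previous CU's choice, standing in for the inter-CU dependency the abstract describes. The cost table and the `beam_search_rdo` name are illustrative and not taken from VTM or from the BSJRDO implementation.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# cost(cu_index, prev_mode, mode) -> RD cost J = D + lambda * R of coding this CU
# with `mode`, given the mode chosen for the previous CU (None for the first CU).
CostFn = Callable[[int, Optional[int], int], float]


def beam_search_rdo(num_cus: int, modes: Sequence[int], cost: CostFn,
                    beam_size: int) -> Tuple[float, List[int]]:
    """Return (total RD cost, mode per CU) of the cheapest path left in the beam."""
    beam: List[Tuple[float, List[int]]] = [(0.0, [])]
    for cu in range(num_cus):
        candidates = []
        for total, path in beam:
            prev = path[-1] if path else None
            for m in modes:
                candidates.append((total + cost(cu, prev, m), path + [m]))
        # Keep only the `beam_size` cheapest partial paths (stable sort keeps ties in order).
        candidates.sort(key=lambda c: c[0])
        beam = candidates[:beam_size]
    return beam[0]


def toy_cost(cu: int, prev: Optional[int], mode: int) -> float:
    """Hypothetical costs: mode 1 is cheaper when the previous CU also used mode 1."""
    base = [[1.0, 1.5], [2.0, 2.0]][cu][mode]
    return base - (1.0 if prev == 1 and mode == 1 else 0.0)


print(beam_search_rdo(2, [0, 1], toy_cost, beam_size=1))  # greedy: (3.0, [0, 0])
print(beam_search_rdo(2, [0, 1], toy_cost, beam_size=2))  # (2.5, [1, 1])
```

With `beam_size=1` the search degenerates to greedy per-CU decisions and misses the cheaper joint choice; widening the beam recovers it while evaluating only `beam_size * len(modes)` candidates per CU instead of every path.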

5.
IEEE Trans Image Process ; 32: 5478-5493, 2023.
Article in English | MEDLINE | ID: mdl-37782618

ABSTRACT

Learned image compression methods have achieved satisfactory results in recent years. However, existing methods are typically designed for the RGB format and are not suitable for the YUV420 format because of the differences between the formats. In this paper, we propose an information-guided compression framework using a cross-component attention mechanism, which achieves efficient image compression in the YUV420 format. Specifically, we design a dual-branch advanced information-preserving module (AIPM) based on the information-guided unit (IGU) and an attention mechanism. On the one hand, the dual-branch architecture prevents changes in the original data distribution and avoids information disturbance between different components, and the feature attention block (FAB) preserves important information. On the other hand, the IGU efficiently exploits the correlations between the Y and UV components, further preserving UV information under the guidance of Y. Furthermore, we design an adaptive cross-channel enhancement module (ACEM) that reconstructs details by exploiting the relations between components, using the reconstructed Y as textural and structural guidance for the UV components. Extensive experiments show that the proposed framework achieves state-of-the-art performance in image compression for the YUV420 format. More importantly, it outperforms Versatile Video Coding (VVC) with an average BD-rate reduction of 8.37% on the common test conditions (CTC) sequences. In addition, we propose a quantization scheme for the context model that requires no model retraining; it overcomes the cross-platform decoding errors caused by floating-point operations in the context model and provides a reference approach for deploying neural codecs on different platforms.
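Figures such as the 8.37% BD-rate reduction are Bjøntegaard delta-rate values: the average bitrate difference between two rate-distortion curves over their overlapping quality range. The sketch below follows the common formulation (fit log-rate as a cubic in PSNR, integrate both fits over the shared PSNR interval, and convert the average log difference to a percentage). It assumes exactly four RD points per curve, so the cubic fit is an exact interpolation, and uses plain Python instead of a numerical library; the function names and sample numbers are hypothetical.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _cubic_through(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Coefficients c0..c3 of the cubic passing through four (x, y) points."""
    return _solve([[x ** k for k in range(4)] for x in xs], list(ys))


def _integral(c: Sequence[float], lo: float, hi: float) -> float:
    """Definite integral of c0 + c1*x + c2*x^2 + c3*x^3 over [lo, hi]."""
    def prim(x: float) -> float:
        return sum(ck * x ** (k + 1) / (k + 1) for k, ck in enumerate(c))
    return prim(hi) - prim(lo)


def bd_rate(rate_a, psnr_a, rate_t, psnr_t) -> float:
    """Bjontegaard delta rate of test vs. anchor, in percent (negative = bit savings)."""
    if not len(rate_a) == len(psnr_a) == len(rate_t) == len(psnr_t) == 4:
        raise ValueError("this sketch expects exactly four RD points per curve")
    shift = min(psnr_a)  # shifting PSNR improves conditioning; the result is unchanged
    pa = [p - shift for p in psnr_a]
    pt = [p - shift for p in psnr_t]
    ca = _cubic_through(pa, [math.log(r) for r in rate_a])
    ct = _cubic_through(pt, [math.log(r) for r in rate_t])
    lo, hi = max(min(pa), min(pt)), min(max(pa), max(pt))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    avg_log_diff = (_integral(ct, lo, hi) - _integral(ca, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


psnr = [32.0, 35.0, 38.0, 41.0]
anchor_kbps = [1000.0, 1800.0, 3200.0, 6000.0]
test_kbps = [0.9 * r for r in anchor_kbps]  # same quality at 10% fewer bits
print(round(bd_rate(anchor_kbps, psnr, test_kbps, psnr), 4))  # -10.0
```

Published VVC comparisons are normally computed with the reference tools used in standardization, whose interpolation details may differ from this single-cubic sketch, so small numerical differences are to be expected.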

6.
IEEE Trans Image Process ; 31: 7222-7236, 2022.
Article in English | MEDLINE | ID: mdl-36374881

ABSTRACT

In-loop filters have attracted increasing attention because of their remarkable noise-reduction capability in the hybrid video coding framework. However, the existing in-loop filters in Versatile Video Coding (VVC) mainly exploit local image similarity. Although some non-local in-loop filters can make up for this shortcoming, the unsupervised parameter estimation widely used by non-local filters limits their performance. In view of this, we propose a deformable Wiener filter (DWF). It combines local and non-local characteristics and trains the filter coefficients in a supervised manner based on Wiener filter theory. In the filtering process, local adjacent samples and non-local similar samples are first derived for each sample of interest. Then the to-be-filtered samples are classified into specific groups based on patch-level noise and sample-level characteristics; samples in each group share the same filter coefficients. After that, the local and non-local reference samples are adaptively fused based on the classification results. Finally, filtering with outlier data constraints is conducted for each to-be-filtered sample. Moreover, the performance of the proposed DWF is analyzed in detail with different reference sample derivation schemes. Simulation results show that the proposed approach achieves average bit-rate savings of 1.16%, 1.92%, and 2.67% compared with VTM-11.0 under the All Intra, Random Access, and Low-Delay B configurations, respectively.
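Training coefficients "based on the Wiener filter theory" amounts to a least-squares fit: the coefficients c minimise the squared error between the filtered reconstruction and the original, which leads to the normal equations R c = p, where R is the autocorrelation matrix of the reference samples and p is their cross-correlation with the original. The sketch below solves this for a single sample group on a 1-D signal with a 3-tap local support; the DWF's non-local references, classification, adaptive fusion and outlier constraints are not modelled, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_wiener(recon: Sequence[float], orig: Sequence[float], taps: int = 3) -> List[float]:
    """Least-squares (Wiener) coefficients mapping a window of `taps` reconstructed
    samples to the co-located original sample, via the normal equations R c = p."""
    if taps % 2 == 0:
        raise ValueError("taps is assumed to be odd")
    half = taps // 2
    r = [[0.0] * taps for _ in range(taps)]  # autocorrelation of reference windows
    p = [0.0] * taps                         # cross-correlation with the original
    for i in range(half, len(recon) - half):
        window = [recon[i - half + k] for k in range(taps)]
        for a in range(taps):
            p[a] += window[a] * orig[i]
            for b in range(taps):
                r[a][b] += window[a] * window[b]
    return solve(r, p)


def apply_filter(recon: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Filter interior samples; border samples are left unfiltered."""
    half = len(coeffs) // 2
    out = list(recon)
    for i in range(half, len(recon) - half):
        out[i] = sum(c * recon[i - half + k] for k, c in enumerate(coeffs))
    return out


def interior_mse(x: Sequence[float], ref: Sequence[float], half: int = 1) -> float:
    pairs = list(zip(x[half:len(x) - half], ref[half:len(ref) - half]))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


random.seed(1)
orig = [100 + 50 * math.sin(i / 8) for i in range(256)]
recon = [o + random.gauss(0, 5) for o in orig]  # stand-in for coding noise
coeffs = train_wiener(recon, orig)
print(coeffs, interior_mse(recon, orig), interior_mse(apply_filter(recon, coeffs), orig))
```

Because the identity filter [0, 1, 0] is among the filters the least-squares fit considers, the trained filter cannot increase the squared error on its own training samples. This sketch does not address how the coefficients are quantized or conveyed to the decoder.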

7.
Article in English | MEDLINE | ID: mdl-35930518

ABSTRACT

As a highly ill-posed problem, single-image super-resolution (SISR) has been widely investigated in recent years. The main task of SISR is to recover the information lost in the degradation procedure. According to the Nyquist sampling theory, degradation leads to aliasing, which makes it hard to restore correct textures from low-resolution (LR) images. In practice, adjacent patches in natural images exhibit correlations and self-similarity. This article exploits this self-similarity and proposes a hierarchical image super-resolution network (HSRNet) to suppress the influence of aliasing. We treat SISR from an optimization perspective and propose an iterative solution pattern based on the half-quadratic splitting (HQS) method. To explore texture with a local image prior, we design a hierarchical exploration block (HEB) that progressively increases the receptive field. Furthermore, multilevel spatial attention (MSA) is devised to capture the relations between adjacent features and enhance high-frequency information, which plays a crucial role in visual experience. Experimental results show that HSRNet achieves better quantitative and visual performance than other works and mitigates aliasing more effectively.

8.
Proc Natl Acad Sci U S A ; 119(34): e2114680119, 2022 Aug 23.
Article in English | MEDLINE | ID: mdl-35972958

ABSTRACT

This study describes and demonstrates key steps in a carbon-negative process for manufacturing cement from widely abundant seawater-derived magnesium (Mg) feedstocks. In contrast to conventional Portland cement, which starts with carbon-containing limestone as the source material, the proposed process uses membrane-free electrolyzers to facilitate the conversion of carbon-free magnesium ions (Mg2+) in seawater into magnesium hydroxide [Mg(OH)2] precursors for the production of Mg-based cement. After a low-temperature carbonation curing step converts Mg(OH)2 into magnesium carbonates through reaction with carbon dioxide (CO2), the resulting Mg-based binders can exhibit compressive strength comparable to that achieved by Portland cement after curing for only 2 days. Although the proposed "cement-from-seawater" process requires similar energy use per ton of cement as existing processes and is not currently suitable for use in conventional reinforced concrete, its potential to achieve a carbon-negative footprint makes it highly attractive to help decarbonize one of the most carbon-intensive industries.

9.
Article in English | MEDLINE | ID: mdl-35802547

ABSTRACT

Traditional neural network compression (NNC) methods reduce model size and floating-point operations (FLOPs) by screening out unimportant weight parameters; however, the intrinsic sparsity characteristics have not been fully exploited. In this article, from the perspective of signal processing and analysis of network parameters, we propose a compressive sensing (CS)-based method, named NNCS, for performance improvement. NNCS is inspired by the observation that weight parameters are sparser in the transform domain than in the original domain. First, to obtain sparse representations of the parameters in the transform domain during training, we incorporate a constrained CS model into the loss function. Second, the proposed training process consists of two steps: the first step trains the raw weight parameters and induces and reconstructs their sparse representations, and the second step trains the transform coefficients to improve network performance. Finally, we transform the entire neural network into a new domain-based representation, obtaining a sparser parameter distribution that facilitates inference acceleration. Experimental results demonstrate that NNCS significantly outperforms other state-of-the-art methods in terms of parameter and FLOP reductions. With VGGNet on CIFAR-10, we reduce parameters by 94.8% and FLOPs by 76.8%, with a 0.13% drop in Top-1 accuracy. With ResNet-50 on ImageNet, we reduce parameters by 75.6% and FLOPs by 78.9%, with a 1.24% drop in Top-1 accuracy.

10.
Hum Factors ; : 187208221115497, 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35856179

ABSTRACT

OBJECTIVE: This study aims to evaluate the effect of in-vehicle audio warnings at flashing-light-controlled grade crossings using driving simulation and eye-tracking systems. BACKGROUND: Collisions at flashing-light-controlled grade crossings have severe consequences. In-vehicle audio warnings have the potential to regulate driver behavior. However, whether this improvement occurs through priming drivers' visual search patterns is not yet clear. METHOD: Drivers' visual activity and behaviors were recorded. The effect of a warning was tested with a series of flashing light trigger times (FLTTs) ranging from 2 s to 6 s in 1 s increments. Different driving conditions (i.e., clear and fog) and driver experience were considered in the experiment design. RESULTS: Warnings could guide the allocation of both overt and covert attention and raise drivers' situation awareness, manifesting as enhanced perception of signs and better understanding of the flashing red light. Significant improvement in the stop-compliance rate was found in warning scenarios, particularly with a late FLTT. The decreased saccade duration and increased fixation duration on the signal implied a dilemma-zone effect when the FLTT was shorter than 4 s. Furthermore, reduced fixation duration on signs and signals was found in foggy conditions. Non-professional drivers had a wider search range than professional drivers. CONCLUSION: In-vehicle audio warning is an effective countermeasure for improving crossing safety by optimizing visual search strategy. APPLICATION: In-vehicle audio warnings delivered through driver assistance systems warrant promotion at grade crossings.

11.
Accid Anal Prev ; 172: 106693, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35552119

ABSTRACT

Train-vehicle collisions at STOP-sign-controlled grade crossings are a major concern in China and around the world. Researchers have demonstrated that cost-effective approaches to improving grade crossing safety are redesigning signs and pavement markings and applying in-vehicle audio warnings. However, the impacts of improved sign design and audio warnings on drivers' visual performance have barely been discussed. This study explored the effects of improved sign design and audio warnings on drivers' eye movement patterns and driving behavior at STOP-sign-controlled grade crossings through a driving simulator experiment. Three types of grade crossing scenarios, 1) the conventional sign design (Baseline), 2) improved sign design (PS), and 3) improved sign design with a three-stage audio warning (PSW), were modeled in a driving simulation system and tested under a series of train TTC conditions (no train, 4 s, 7 s, 10 s, 13 s). Foggy conditions and driver characteristics, i.e., gender and vocation, were also considered in the experiment design. Seven variables describing drivers' fixation patterns and driving performance were collected and analyzed: total fixation duration, distance to the stop line at the first fixation, fixation transition probability, stop compliance, speed, maximum deceleration rate, and minimum time-to-collision. Results revealed that the improved sign design and the audio warning could prime drivers' expectation of the grade crossing in advance, as drivers drove at lower speeds, perceived signs in time, and searched visually for the train earlier with these countermeasures. In addition, in the PS and PSW scenarios, drivers attached more importance to the STOP sign and were more cautious in estimating the train's time-to-arrival, repeatedly fixating on these two areas.
The improved fixation performance of drivers in PS and PSW contributed to more comfortable deceleration. Compared with no-warning scenarios, higher compliance rates were observed with the audio warning, especially with a short train TTC (4 s and 7 s). However, no significant difference was found between PS and Baseline, indicating that the improved sign design had limited safety benefits. Minimum time-to-collision for drivers who ignored the warning did not increase significantly in either PS or PSW. Additionally, heavy fog limited drivers' perception of signs and led to later and shorter fixations. Regarding gender, males had shorter fixation durations on the STOP sign and lower compliance rates than females. Moreover, female drivers perceived the approaching train earlier than males, especially in PS and PSW. These findings suggest that the improved sign design and in-vehicle audio warning improved drivers' visual and behavioral performance and have the potential to enhance safety at STOP-sign-controlled grade crossings.


Subject(s)
Automobile Driving , Eye Movements , Accidents, Traffic/prevention & control , China , Computer Simulation , Female , Humans , Male , Weather
12.
IEEE Trans Image Process ; 31: 2824-2838, 2022.
Article in English | MEDLINE | ID: mdl-35349440

ABSTRACT

In the latest video coding standard, Versatile Video Coding (VVC), more directional intra modes and reference lines are used to improve prediction efficiency. However, complex content still cannot be predicted well from the adjacent reference samples alone. Although nonlocal prediction has been proposed in existing algorithms to further improve prediction efficiency, explicit signalling or matching errors potentially limit coding efficiency. To address these issues, we propose a joint local and nonlocal progressive prediction scheme that aims to improve nonlocal prediction accuracy without additional signalling. Specifically, template matching based prediction (TMP) is first conducted to derive an initial nonlocal predictor. Based on this first prediction and previously decoded reconstruction information, a local template, including inner textures and neighboring reconstruction, is carefully designed. With the local template involved in the nonlocal matching process, a more accurate nonlocal predictor can be found progressively in the second prediction. Finally, the coefficients from the two predictions are fused and transmitted in the bitstream. In this way, a more accurate nonlocal predictor can be derived implicitly from local information instead of being explicitly signalled. Experimental results on VTM-9.0, the VVC reference software, show that the method achieves BD-rate reductions of 1.02% for natural sequences and 2.31% for screen content videos under the all intra (AI) configuration.
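Template matching prediction (TMP) derives a nonlocal predictor without signalling: the decoder compares the L-shaped template of already reconstructed samples above and to the left of the current block with templates at candidate positions in the decoded area, then copies the block next to the best-matching template. The sketch below shows only this first-stage TMP, assuming a 2-D list of reconstructed samples, a one-sample-thick template, SAD as the matching cost, a square search window and a simplified raster-order availability rule; the paper's second-stage local template and fusion are not modelled, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def template(frame: Frame, y: int, x: int, size: int) -> List[int]:
    """One-sample-thick L-shaped template of the size x size block at (y, x):
    the row above (including the top-left corner) and the column to the left."""
    top = [frame[y - 1][x + k] for k in range(-1, size)]
    left = [frame[y + k][x - 1] for k in range(size)]
    return top + left


def available(cy: int, cx: int, y: int, x: int, size: int, h: int, w: int) -> bool:
    """Candidate block and its template lie in the already-decoded area,
    assuming raster-order block decoding (simplified availability rule)."""
    if cy < 1 or cx < 1 or cy + size > h or cx + size > w:
        return False
    above = cy + size <= y               # entirely in earlier block rows
    left = cx + size <= x and cy <= y    # entirely to the left of the current block
    return above or left


def tmp_predict(frame: Frame, y: int, x: int, size: int,
                search: int) -> Optional[Tuple[Frame, Tuple[int, int]]]:
    """Copy the block whose template best matches (minimum SAD) the current template."""
    h, w = len(frame), len(frame[0])
    target = template(frame, y, x, size)
    best: Optional[Tuple[int, int, int]] = None  # (sad, cy, cx)
    for cy in range(y - search, y + search + 1):
        for cx in range(x - search, x + search + 1):
            if not available(cy, cx, y, x, size, h, w):
                continue
            sad = sum(abs(a - b) for a, b in zip(target, template(frame, cy, cx, size)))
            if best is None or sad < best[0]:
                best = (sad, cy, cx)
    if best is None:
        return None
    _, cy, cx = best
    pred = [row[cx:cx + size] for row in frame[cy:cy + size]]
    return pred, (cy, cx)


# Periodic toy content (period 4 in both directions) so an exact match exists.
frame = [[(r % 4) * 10 + (c % 4) for c in range(12)] for r in range(12)]
pred, pos = tmp_predict(frame, y=8, x=8, size=4, search=8)
print(pos, pred)
```

Because the decoder can repeat the same search on its own reconstruction, no displacement information has to be transmitted; the cost is a search at the decoder side.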

13.
IEEE Trans Image Process ; 31: 2809-2823, 2022.
Article in English | MEDLINE | ID: mdl-35312621

ABSTRACT

Existing compression methods typically focus on removing signal-level redundancies, while the potential and versatility of decomposing visual data into compact conceptual components remain understudied. To this end, we propose a novel conceptual compression framework that encodes visual data into compact structure and texture representations and then decodes them in a deep synthesis fashion, aiming at better visual reconstruction quality, flexible content manipulation, and potential support for various vision tasks. In particular, we propose to compress images with a dual-layered model consisting of two complementary visual features: 1) a structure layer represented by structural maps and 2) a texture layer characterized by low-dimensional deep representations. At the encoder side, the structural maps and texture representations are extracted and compressed individually, generating compact, interpretable, inter-operable bitstreams. During decoding, a hierarchical fusion GAN (HF-GAN) is proposed to learn the synthesis paradigm in which the textures are rendered into the decoded structural maps, leading to high-quality reconstruction with remarkable visual realism. Extensive experiments on diverse images demonstrate the superiority of our framework in terms of lower bitrates, higher reconstruction quality, and increased versatility for visual analysis and content manipulation tasks.

14.
IEEE Trans Image Process ; 31: 30-42, 2022.
Article in English | MEDLINE | ID: mdl-34793298

ABSTRACT

Geometric partitioning has attracted increasing attention for its remarkable motion field description capability in the hybrid video coding framework. However, the existing geometric partitioning (GEO) scheme in Versatile Video Coding (VVC) imposes a non-negligible burden for signaling side information, which limits coding efficiency. In view of this, we propose a spatio-temporal correlation guided geometric partitioning (STGEO) scheme to efficiently describe the object information in the motion field of video coding. The proposed method economizes the bits consumed for side information signaling, including the partitioning mode and motion information. We first analyze the characteristics of partitioning mode decision and motion vector selection in a statistically sound way. Based on the observed spatio-temporal correlation, we design a mode prediction and coding method to reduce the overhead of representing the above-mentioned side information. The main idea is to predict the STGEO modes and motion candidates with higher selection probabilities and use them to guide entropy coding, i.e., to represent the predicted high-probability modes and motion candidates with fewer bits. In particular, the high-probability STGEO modes are predicted from the edge information and history modes of adjacent STGEO-coded blocks. The corresponding motion information is represented by an index into a merge candidate list, which is adaptively inferred from an offline-trained merge candidate selection probability. Simulation results show that the proposed approach achieves average bit-rate savings of 0.95% and 1.98% compared with VTM-8.0 without GEO under the Random Access and Low-Delay B configurations, respectively.

15.
IEEE Trans Image Process ; 30: 7305-7316, 2021.
Article in English | MEDLINE | ID: mdl-34403346

ABSTRACT

Cross-component linear model (CCLM) prediction has repeatedly proven effective in reducing inter-channel redundancy in video compression. Essentially, the linear model is trained identically at the encoder and decoder from accessible luma and chroma reference samples, which raises operational complexity because of least-squares regression or max-min based model parameter derivation. In this paper, we investigate the capability of the linear model in the context of sub-sampling based cross-component correlation mining, as a means of significantly reducing the operational burden and facilitating hardware and software design for both encoder and decoder. In particular, the sub-sampling ratios and positions are carefully designed by exploiting spatial and inter-channel correlations. Extensive experiments verify that the proposed method is simple to operate and robust in rate-distortion performance, which led to its adoption in the Versatile Video Coding (VVC) standard and the third generation of the Audio Video Coding Standard (AVS3).
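CCLM predicts chroma from co-located (downsampled) reconstructed luma with a linear model, pred_C = α · rec_L + β, whose parameters are derived identically at encoder and decoder from neighbouring reference samples. The sketch below illustrates a max-min style derivation on a sub-sampled set of reference positions: average the two smallest and the two largest luma samples (and their chroma), then fit a line through the two averaged points. It works in floating point and assumes evenly spaced sample positions; it does not reproduce the exact integer arithmetic, position rules or sample counts of VVC or AVS3, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def subsample_positions(n: int, count: int = 4) -> List[int]:
    """Evenly spaced indices into n neighbouring reference samples (assumed rule)."""
    count = min(count, n)
    step = n // count
    return [step // 2 + k * step for k in range(count)]


def derive_cclm(ref_luma: Sequence[float], ref_chroma: Sequence[float],
                count: int = 4) -> Tuple[float, float]:
    """Max-min style (alpha, beta) from a sub-sampled reference set."""
    if len(ref_luma) != len(ref_chroma) or len(ref_luma) < 2 or count < 2:
        raise ValueError("need matching reference rows with at least two samples")
    idx = subsample_positions(len(ref_luma), count)
    pairs = sorted((ref_luma[i], ref_chroma[i]) for i in idx)  # sort by luma
    half = len(pairs) // 2
    small, large = pairs[:half], pairs[-half:]
    x_min = sum(p[0] for p in small) / half
    y_min = sum(p[1] for p in small) / half
    x_max = sum(p[0] for p in large) / half
    y_max = sum(p[1] for p in large) / half
    if x_max == x_min:  # flat luma: fall back to a constant chroma predictor
        return 0.0, sum(p[1] for p in pairs) / len(pairs)
    alpha = (y_max - y_min) / (x_max - x_min)
    return alpha, y_min - alpha * x_min


def predict_chroma(luma_block: List[List[float]], alpha: float, beta: float) -> List[List[float]]:
    return [[alpha * v + beta for v in row] for row in luma_block]


ref_luma = [60, 80, 100, 120, 140, 160, 180, 200]   # downsampled neighbouring luma
ref_chroma = [2 * v + 3 for v in ref_luma]          # toy chroma: exactly 2 * luma + 3
alpha, beta = derive_cclm(ref_luma, ref_chroma)     # reads only 4 of the 8 samples
print(alpha, beta)  # 2.0 3.0
print(predict_chroma([[90, 110], [130, 150]], alpha, beta))
```

Reading only a sub-sampled subset of the neighbours reduces the number of comparisons and memory accesses needed for parameter derivation, which is the kind of complexity reduction the abstract targets.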

16.
IEEE Trans Image Process ; 30: 6066-6080, 2021.
Article in English | MEDLINE | ID: mdl-34185643

ABSTRACT

Recent deep network-based compressive sensing (CS) methods have achieved great success. However, most of them treat different sampling matrices as independent tasks and must train a specific model for each target sampling matrix. Such practices are computationally inefficient and generalize poorly. In this paper, we propose a novel COntrollable Arbitrary-Sampling neTwork, dubbed COAST, that solves CS problems with arbitrary sampling matrices (including unseen ones) using a single model. Built on the optimization-inspired deep unfolding framework, COAST offers good interpretability. In COAST, a random projection augmentation (RPA) strategy is proposed to promote training diversity in the sampling space and thereby enable arbitrary sampling, and a controllable proximal mapping module (CPMM) and a plug-and-play deblocking (PnP-D) strategy are further developed to dynamically modulate network features and effectively eliminate blocking artifacts, respectively. Extensive experiments on widely used benchmark datasets demonstrate that COAST not only handles arbitrary sampling matrices with a single model but also achieves state-of-the-art performance at high speed.
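In optimization-inspired deep unfolding, each network stage mirrors one iteration of a classical solver for measurements y = Φx. As background, the sketch below runs plain ISTA (a gradient step on ½‖y − Φx‖² followed by soft-thresholding, the proximal operator of the ℓ1 penalty) for an arbitrary sampling matrix Φ. It is the hand-designed iteration that unfolding networks learn to refine, not COAST itself; it assumes sparsity in the signal domain and a fixed step 1/‖Φ‖_F², which never exceeds 1/‖Φ‖₂² and therefore keeps the objective non-increasing. All names and the toy data are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(a: Matrix, x: List[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a: Matrix, r: List[float]) -> List[float]:
    """Compute a^T r."""
    out = [0.0] * len(a[0])
    for ri, row in zip(r, a):
        for j, aij in enumerate(row):
            out[j] += aij * ri
    return out


def soft_threshold(v: List[float], t: float) -> List[float]:
    """Proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def objective(phi: Matrix, y: List[float], x: List[float], lam: float) -> float:
    r = [yi - pi for yi, pi in zip(y, matvec(phi, x))]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def ista(phi: Matrix, y: List[float], lam: float, iters: int) -> List[float]:
    """ISTA for min_x 0.5*||y - phi x||^2 + lam*||x||_1 with any sampling matrix phi."""
    step = 1.0 / sum(a * a for row in phi for a in row)  # 1/||phi||_F^2 <= 1/||phi||_2^2
    x = [0.0] * len(phi[0])
    for _ in range(iters):
        residual = [yi - pi for yi, pi in zip(y, matvec(phi, x))]
        z = [xi + step * g for xi, g in zip(x, rmatvec(phi, residual))]  # gradient step
        x = soft_threshold(z, step * lam)                                  # proximal step
    return x


random.seed(0)
n, m = 32, 16                                  # signal length, number of measurements
phi = [[random.gauss(0, 1) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[11], x_true[25] = 1.0, -0.8, 0.5
y = matvec(phi, x_true)
x_hat = ista(phi, y, lam=0.01, iters=500)
print(objective(phi, y, [0.0] * n, 0.01), objective(phi, y, x_hat, 0.01))
```

Nothing in the iteration is specific to one Φ, which is why an unfolded network built on it can, in principle, accept different sampling matrices at inference time; learning components such as the proximal mapping is what the network adds.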

17.
Front Psychiatry ; 12: 648885, 2021.
Article in English | MEDLINE | ID: mdl-33986701

ABSTRACT

Background: Determining the mental health status of parents who chronically care for a child with speech impairment is important for developing appropriate interventions that improve the health of both parents and children, a win-win outcome. Unfortunately, no study in China has explored this issue. This study investigated differences in four aspects of mental health between maternal and paternal caregivers of Mandarin-speaking children with speech impairment and determined whether depressive symptoms mediate the relationships between anxiety symptoms and suicidal ideation and between hopelessness and suicidal ideation. Methods: This cross-sectional questionnaire survey was conducted in February 2020 by sending a link to a predesigned electronic questionnaire via WeChat. Standardized assessment tools were employed. Hierarchical multiple logistic regression was conducted to examine the associations between various factors and suicidal ideation, and two separate structural equation models were used to evaluate the mediating effects of depressive symptoms in the relationships between anxiety symptoms and suicidal ideation and between hopelessness and suicidal ideation. Results: This study included 446 parental caregivers of Mandarin-speaking children with speech impairment. Paternal caregivers scored higher than maternal caregivers on loss of motivation (one of the subdomains of hopelessness). Somatic complications of the child (OR = 2.73, 95% CI: 1.09-6.67) and depressive symptoms (OR = 3.38, 95% CI: 1.83-6.30) were positively associated with caregivers' suicidal ideation, whereas the child receiving speech therapy (OR = 0.54, 95% CI: 0.29-0.98) was negatively associated with it. Depressive symptoms had a direct effect on suicidal ideation and mediated the relationships between anxiety symptoms and suicidal ideation (β = 0.171, p < 0.001) and between hopelessness and suicidal ideation (β = 0.187, p < 0.001).
Conclusions: Paternal and maternal caregivers of Mandarin-speaking children with speech impairment suffered from mental health problems. Preventive strategies and interventions to improve parental psychological well-being, and health care policies to increase access to speech therapy for children with speech impairment, are imperative.

18.
Org Lett ; 23(7): 2664-2669, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33733786

ABSTRACT

A new general synthesis of pharmaceutically important azolo[1,5-a]pyrimidines starting from widely available 3(5)-aminoazoles, aldehydes, and triethylamine is developed. The key is to enable the vinylation reaction that allows the in situ generation of elusive acyclic enamines and the subsequent annulation reaction to occur. This direct and practical strategy is capable of constructing a range of 5,6-unsubstituted pyrazolo[1,5-a]pyrimidines and [1,2,4]triazolo[1,5-a]pyrimidines. More importantly, this protocol provides a concise synthetic route to prepare the clinically used zaleplon.

19.
IEEE Trans Image Process ; 30: 2422-2435, 2021.
Article in English | MEDLINE | ID: mdl-33493117

ABSTRACT

Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality. For these applications, the visual realism of fine-grained appearance details is crucial for production quality and user engagement. However, existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency, which severely degrade the visual quality and realism of generated images. Aiming at real-world applications, we develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a stronger focus on semantic fidelity and detail replenishment. Concretely, we analyze the potential design flaws of existing methods via an illustrative example and establish the core FHPT methodology by combining content synthesis and feature transfer in a mutually guided fashion. We then substantiate the proposed methodology with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine model training scheme. Moreover, we build a complete suite of fine-grained evaluation protocols that address the challenges of FHPT comprehensively, including semantic analysis, structural detection and perceptual quality assessment. Extensive experiments on the DeepFashion benchmark dataset verify the effectiveness of the proposed approach against state-of-the-art works, with 12%-14% gains in top-10 retrieval recall, 5% higher joint localization accuracy, and a nearly 40% gain in face identity preservation. Our code, models and evaluation tools will be released at https://github.com/Lotayou/RATE.


Subject(s)
Image Processing, Computer-Assisted/methods , Machine Learning , Posture/physiology , Algorithms , Female , Humans , Male
20.
IEEE Trans Image Process ; 30: 2378-2393, 2021.
Article in English | MEDLINE | ID: mdl-33471757

ABSTRACT

The forthcoming Versatile Video Coding (VVC) standard adopts trellis-coded quantization, which uses a trellis graph to map the quantization candidates within one block onto the optimal path. Despite its high compression efficiency, the complex trellis search with soft-decision quantization may hinder applications because of its high complexity and low throughput. To reduce complexity, in this paper we propose a low-complexity trellis-coded quantization scheme derived in a scientifically sound way from theoretical models of rate and distortion. As such, the trellis departure point can be adjusted adaptively and unnecessarily visited branches are pruned, reducing the total number of trellis stages and simplifying the transition branches. Extensive experimental results on the VVC test model show that the proposed scheme reduces encoding complexity by 11% and 5% under the all intra and random access configurations, respectively, at the cost of only 0.11% and 0.05% BD-rate increases. Meanwhile, average quantization time savings of 24% and 27% are achieved under the all intra and random access configurations. Owing to this performance, the VVC test model has adopted one implementation of the proposed scheme.
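To make the trellis concrete, the sketch below implements a full-search baseline of the kind the abstract's scheme prunes. It assumes the dependent-quantization design of VVC as commonly described (two scalar quantizers, Q0 with even multiples of Δ and Q1 with zero and odd multiples of Δ, switched by a four-state machine driven by the parity of each level; worth checking against the specification) and runs a Viterbi search minimising D + λR. The rate model (|k| + 1 bits), the small candidate-level set and all function names are hypothetical simplifications; a real encoder uses CABAC-based rate estimates.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Four-state machine: next_state = TRANS[state][level & 1]  (assumed VVC rule).
TRANS = [[0, 2], [2, 0], [1, 3], [3, 1]]


def recon(level: int, state: int, delta: float) -> float:
    """States 0/1 use Q0 (even multiples of delta); states 2/3 use Q1 (zero and odd multiples)."""
    if state < 2:
        return 2 * level * delta
    sign = (level > 0) - (level < 0)
    return (2 * level - sign) * delta


def rate_bits(level: int) -> float:
    """Hypothetical rate model; a real encoder uses CABAC-based estimates."""
    return abs(level) + 1.0


def candidate_levels(c: float, delta: float) -> List[int]:
    """A few levels around c / (2 * delta), plus zero (simplified candidate set)."""
    k = round(c / (2 * delta))
    return sorted(set(range(k - 2, k + 3)) | {0})


def dq_trellis(coeffs: Sequence[float], delta: float, lam: float) -> Tuple[float, List[int]]:
    """Viterbi search over the 4-state trellis minimising the sum of D + lam * R."""
    cost = [0.0, math.inf, math.inf, math.inf]   # start in state 0
    back: List[List[Tuple[int, int]]] = []       # per coefficient: (prev_state, level)
    for c in coeffs:
        new_cost, new_back = [math.inf] * 4, [(-1, 0)] * 4
        for s in range(4):
            if cost[s] == math.inf:
                continue
            for k in candidate_levels(c, delta):
                j = cost[s] + (c - recon(k, s, delta)) ** 2 + lam * rate_bits(k)
                ns = TRANS[s][k & 1]
                if j < new_cost[ns]:
                    new_cost[ns], new_back[ns] = j, (s, k)
        cost = new_cost
        back.append(new_back)
    state = min(range(4), key=lambda s: cost[s])
    best, levels = cost[state], []
    for step in reversed(back):                  # trace the survivor path back
        state, k = step[state]
        levels.append(k)
    levels.reverse()
    return best, levels


def dequantize(levels: Sequence[int], delta: float) -> List[float]:
    """Decoder side: replay the state machine to reconstruct coefficients."""
    state, out = 0, []
    for k in levels:
        out.append(recon(k, state, delta))
        state = TRANS[state][k & 1]
    return out


def rd_cost(coeffs: Sequence[float], levels: Sequence[int], delta: float, lam: float) -> float:
    rec = dequantize(levels, delta)
    return sum((c - r) ** 2 for c, r in zip(coeffs, rec)) + lam * sum(rate_bits(k) for k in levels)


def exhaustive(coeffs: Sequence[float], delta: float, lam: float) -> float:
    """Brute force over the same candidate sets (only feasible for tiny inputs)."""
    sets = [candidate_levels(c, delta) for c in coeffs]
    return min(rd_cost(coeffs, list(p), delta, lam) for p in itertools.product(*sets))


coeffs = [13.0, -7.5, 4.2, 0.8, -2.6, 9.9]
best, levels = dq_trellis(coeffs, delta=2.0, lam=3.0)
print(best, exhaustive(coeffs, 2.0, 3.0), levels, dequantize(levels, 2.0))
```

The Viterbi search visits every state and candidate level for each coefficient; pruning the departure point and unpromising branches, as the abstract proposes, cuts exactly this per-coefficient work.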
