Results 1 - 4 of 4
1.
IEEE Trans Image Process ; 32: 5270-5282, 2023.
Article in English | MEDLINE | ID: mdl-37721872

ABSTRACT

In blurry images, the degree of blur may vary drastically due to different factors, such as varying speeds of shaking cameras and moving objects, as well as defects of the camera lens. However, current end-to-end models fail to explicitly take such diversity of blurs into account. This unawareness compromises the specialization at each blur level, yielding sub-optimal deblurred images as well as redundant post-processing. Therefore, how to specialize one model simultaneously at different blur levels, while still ensuring coverage and generalization, becomes an emerging challenge. In this work, we propose Ada-Deblur, a super-network that can be applied to a "broad spectrum" of blur levels with no re-training on novel blurs. To balance individual blur-level specialization against wide-range blur-level coverage, the key idea is to dynamically adapt the network architecture from a single well-trained super-network structure, targeting flexible image processing with different deblurring capacities at test time. Extensive experiments demonstrate that our work outperforms strong baselines, achieving better reconstruction accuracy while incurring minimal computational overhead. In addition, we show that our method is effective for both synthetic and realistic blurs compared to these baselines. The performance gap between our model and the state of the art becomes more prominent when testing with unseen and strong blur levels. Specifically, our model demonstrates surprising deblurring performance on these images, with PSNR improvements of around 1 dB. Our code is publicly available at https://github.com/wuqiuche/Ada-Deblur.
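
Below is a minimal, hypothetical sketch of the test-time capacity adaptation idea described in the abstract: a single "super-network" of shared residual blocks from which a shallower sub-network is run for mild blur and a deeper one for strong blur. The module names, the residual-block design, and the blur-to-depth mapping are illustrative assumptions, not the paper's actual Ada-Deblur architecture.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)

    class DeblurSuperNet(nn.Module):
        # Hypothetical super-network: one set of trained weights, variable depth at test time.
        def __init__(self, max_blocks=16, ch=64):
            super().__init__()
            self.head = nn.Conv2d(3, ch, 3, padding=1)
            self.blocks = nn.ModuleList(ResBlock(ch) for _ in range(max_blocks))
            self.tail = nn.Conv2d(ch, 3, 3, padding=1)

        def forward(self, x, depth):
            # 'depth' selects how many shared blocks run, i.e. the deblurring capacity.
            feat = self.head(x)
            for block in self.blocks[:depth]:
                feat = block(feat)
            return x + self.tail(feat)  # residual prediction of the sharp image

    blurry = torch.rand(1, 3, 128, 128)
    model = DeblurSuperNet()
    mild = model(blurry, depth=4)     # shallow sub-network for light blur
    severe = model(blurry, depth=16)  # full capacity for strong, unseen blur levels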

2.
IEEE Trans Image Process ; 32: 4237-4246, 2023.
Article in English | MEDLINE | ID: mdl-37440395

ABSTRACT

Salient object detection (SOD) aims to identify the most visually distinctive object(s) in each given image. Most recent progress focuses on either adding elaborate connections among different convolution blocks or introducing boundary-aware supervision to help achieve better segmentation, which actually moves away from the essence of SOD, i.e., distinctiveness/salience. This paper goes back to the roots of SOD and investigates the principles of how to identify distinctive object(s) in a more effective and efficient way. Intuitively, the salience of an object should largely depend on its global context within the input image. Based on this, we devise a clean yet effective architecture for SOD, named Collaborative Content-Dependent Networks (CCD-Net). In detail, we propose a collaborative content-dependent head whose parameters are conditioned on the input image's global context information. Within the content-dependent head, a hand-crafted multi-scale (HMS) module and a self-induced (SI) module are carefully designed to collaboratively generate content-aware convolution kernels for prediction. Benefiting from the content-dependent head, CCD-Net is capable of leveraging global context to detect distinctive object(s) while keeping a simple encoder-decoder design. Extensive experimental results demonstrate that our CCD-Net achieves state-of-the-art results on various benchmarks. Our architecture is simple and intuitive compared to previous solutions, resulting in competitive characteristics with respect to model complexity, operating efficiency, and segmentation accuracy.
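
As a rough illustration of the content-dependent head idea (prediction parameters conditioned on the image's global context), the sketch below generates a per-image 1x1 prediction kernel from globally pooled features and applies it with a grouped convolution. The class name, pooling choice, and kernel size are assumptions for illustration; the HMS and SI modules and their collaboration are not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContentDependentHead(nn.Module):
        # Hypothetical head whose 1x1 kernel is generated from global image context.
        def __init__(self, feat_ch=64):
            super().__init__()
            self.kernel_gen = nn.Linear(feat_ch, feat_ch)

        def forward(self, feat):
            n, c, h, w = feat.shape
            context = F.adaptive_avg_pool2d(feat, 1).flatten(1)   # (N, C) global context
            kernels = self.kernel_gen(context).view(n, c, 1, 1)   # one 1x1 kernel per image
            # Grouped conv applies each image's own kernel to its own feature map.
            out = F.conv2d(feat.reshape(1, n * c, h, w), kernels, groups=n)
            return torch.sigmoid(out.view(n, 1, h, w))            # per-pixel saliency map

    feat = torch.rand(2, 64, 88, 88)          # decoder features for two images
    saliency = ContentDependentHead()(feat)   # (2, 1, 88, 88)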

3.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6896-6908, 2023 Jun.
Article in English | MEDLINE | ID: mdl-32750802

ABSTRACT

Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a criss-cross network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By applying a further recurrent operation, each pixel can finally capture full-image dependencies. Besides, a category consistent loss is proposed to enforce the criss-cross attention module to produce more discriminative features. Overall, CCNet has the following merits: 1) GPU memory friendliness. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11× less GPU memory. 2) High computational efficiency. The recurrent criss-cross attention reduces the FLOPs of the non-local block by about 85 percent. 3) State-of-the-art performance. We conduct extensive experiments on the semantic segmentation benchmarks Cityscapes and ADE20K, the human parsing benchmark LIP, the instance segmentation benchmark COCO, and the video segmentation benchmark CamVid. In particular, our CCNet achieves mIoU scores of 81.9, 45.76, and 55.47 percent on the Cityscapes test set, the ADE20K validation set, and the LIP validation set, respectively, which are new state-of-the-art results. The source code is available at https://github.com/speedinghzl/CCNet.
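
The compact re-implementation below illustrates the criss-cross attention pattern the abstract describes: each pixel attends only to pixels in its own row and column, and applying the module twice propagates information across the whole image. It is a simplified sketch (it omits, for instance, the diagonal masking and the category consistent loss of the published code), so treat the layer sizes and names as assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrissCrossAttention(nn.Module):
        def __init__(self, ch, reduction=8):
            super().__init__()
            self.q = nn.Conv2d(ch, ch // reduction, 1)
            self.k = nn.Conv2d(ch, ch // reduction, 1)
            self.v = nn.Conv2d(ch, ch, 1)
            self.gamma = nn.Parameter(torch.zeros(1))

        def forward(self, x):
            q, k, v = self.q(x), self.k(x), self.v(x)
            # Affinities along the same column (over H) and the same row (over W).
            col = torch.einsum('nchw,ncvw->nhvw', q, k)   # (N, H, H, W)
            row = torch.einsum('nchw,nchu->nhwu', q, k)   # (N, H, W, W)
            attn = F.softmax(torch.cat([col.permute(0, 1, 3, 2), row], dim=-1), dim=-1)
            col_a, row_a = attn[..., :x.size(2)], attn[..., x.size(2):]
            # Aggregate values from every pixel's criss-cross path.
            out = (torch.einsum('nhwv,ncvw->nchw', col_a, v) +
                   torch.einsum('nhwu,nchu->nchw', row_a, v))
            return self.gamma * out + x

    cca = CrissCrossAttention(64)
    feat = torch.rand(1, 64, 48, 48)
    feat = cca(cca(feat))  # two recurrent passes approximate full-image context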

4.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 550-557, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33646946

ABSTRACT

Aggregating features across different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation. However, most current popular network architectures tend to ignore the misalignment issues that arise during feature aggregation, caused by step-by-step downsampling operations and indiscriminate contextual information fusion. In this paper, we explore the principles for addressing such feature misalignment issues and propose Feature-Aligned Segmentation Networks (AlignSeg). AlignSeg consists of two primary modules, i.e., the Aligned Feature Aggregation (AlignFA) module and the Aligned Context Modeling (AlignCM) module. First, AlignFA adopts a simple learnable interpolation strategy to learn transformation offsets of pixels, which can effectively relieve the feature misalignment caused by multi-resolution feature aggregation. Second, with the contextual embeddings in hand, AlignCM enables each pixel to adaptively choose its own custom contextual information, making the contextual embeddings better aligned. We validate the effectiveness of our AlignSeg network with extensive experiments on Cityscapes and ADE20K, achieving new state-of-the-art mIoU scores of 82.6 and 45.95 percent, respectively. Our source code is available at https://github.com/speedinghzl/AlignSeg.
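
The sketch below shows one plausible form of offset-based feature alignment in the spirit of AlignFA: a small convolution predicts per-pixel 2-D offsets from the concatenated coarse and fine features, and the upsampled coarse feature is resampled at the shifted positions before fusion. The module name, layer sizes, and the direct use of normalized offsets are assumptions, not the paper's exact AlignFA design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AlignedFusion(nn.Module):
        # Hypothetical aligned aggregation of a coarse and a fine feature map.
        def __init__(self, ch=64):
            super().__init__()
            self.offset = nn.Conv2d(2 * ch, 2, 3, padding=1)  # per-pixel (dx, dy)

        def forward(self, low, high):
            low_up = F.interpolate(low, size=high.shape[-2:],
                                   mode='bilinear', align_corners=False)
            n, _, h, w = high.shape
            offsets = self.offset(torch.cat([low_up, high], dim=1))
            # Build a normalized sampling grid and shift it by the learned offsets
            # (offsets are used directly in [-1, 1] coordinates for simplicity).
            ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=high.device),
                                    torch.linspace(-1, 1, w, device=high.device),
                                    indexing='ij')
            grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
            aligned = F.grid_sample(low_up, grid + offsets.permute(0, 2, 3, 1),
                                    align_corners=False)
            return aligned + high  # fuse the aligned coarse feature with the fine one

    low = torch.rand(1, 64, 32, 32)    # coarse feature from a deeper stage
    high = torch.rand(1, 64, 64, 64)   # fine feature from an earlier stage
    fused = AlignedFusion()(low, high)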
