1.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6372-6385, 2023 May.
Article in English | MEDLINE | ID: mdl-36112555

ABSTRACT

Stereo confidence estimation aims to estimate the reliability of a disparity map produced by stereo matching. Unlike previous methods that exploit only a limited input modality, we present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input, namely the matching cost, the disparity, and the color image, through deep networks. The proposed network, termed Locally Adaptive Fusion Networks (LAF-Net), learns locally-varying attention and scale maps to fuse the tri-modal confidence features. Moreover, we propose a knowledge distillation framework to learn more compact confidence estimation networks as student networks. By transferring knowledge from LAF-Net as a teacher network, student networks that take only a disparity map as input can achieve comparable performance. To transfer more informative knowledge, we also propose a module that learns a locally-varying temperature in the softmax function. We further extend this framework to a multiview scenario. Experimental results show that LAF-Net and its variations outperform the state-of-the-art stereo confidence methods on various benchmarks.
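
As a rough illustration of the locally-varying temperature idea, the PyTorch sketch below distills a teacher's per-pixel distribution into a student using a per-pixel temperature map. The tensor names and shapes are hypothetical, not the paper's exact formulation:

    import torch
    import torch.nn.functional as F

    def distillation_loss(teacher_logits, student_logits, log_temp):
        # teacher_logits, student_logits: (B, D, H, W) over D disparity hypotheses.
        # log_temp: (B, 1, H, W), a learned locally-varying temperature kept in
        # log-space so the temperature stays positive.
        t = log_temp.exp()
        p_teacher = F.softmax(teacher_logits / t, dim=1)
        log_p_student = F.log_softmax(student_logits / t, dim=1)
        # Per-pixel cross-entropy between the softened distributions.
        return -(p_teacher * log_p_student).sum(dim=1).mean()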

2.
IEEE Trans Image Process ; 31: 4090-4103, 2022.
Article in English | MEDLINE | ID: mdl-35687627

ABSTRACT

This paper addresses the problem of single-image de-raining, that is, the task of recovering a clean, rain-free background scene from a single image obscured by rain artifacts. Although recent advances adopt real-world time-lapse data to overcome the need for paired rain-clean images, they fail to fully exploit the time-lapse data. The main cause is architectural: lacking memory components, these networks cannot capture long-term rain streak information in the time-lapse data during training. To address this problem, we propose a novel network architecture that combines the time-lapse data with a memory network that explicitly helps capture long-term rain streak information. Our network comprises encoder-decoder networks and a memory network. The features extracted by the encoder are read from and updated in the memory network, which contains several memory items that store rain streak-aware feature representations. Through the read/update operations, the memory network retrieves the memory items most relevant to each query, enabling the items to represent the various rain streaks included in the time-lapse data. To boost the discriminative power of memory features, we also present a novel background selective whitening (BSW) loss that captures only rain streak information in the memory network by erasing the background information. Experimental results on standard benchmarks demonstrate the effectiveness and superiority of our approach.
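
A minimal PyTorch sketch of a cosine-attention memory read in this spirit (shapes and names are hypothetical; the paper's exact read/update operations differ in detail):

    import torch
    import torch.nn.functional as F

    def memory_read(query, memory):
        # query: (B, C, H, W) encoder features; memory: (M, C) learnable items.
        b, c, h, w = query.shape
        q = query.permute(0, 2, 3, 1).reshape(-1, c)                  # (BHW, C)
        attn = F.softmax(F.normalize(q, dim=1)
                         @ F.normalize(memory, dim=1).t(), dim=1)     # (BHW, M)
        read = attn @ memory                                          # (BHW, C)
        return read.reshape(b, h, w, c).permute(0, 3, 1, 2)           # (B, C, H, W)

The companion update step typically moves each memory item toward the queries that attend to it most strongly, which is what lets the items specialize to distinct rain streak patterns.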

3.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5293-5313, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33798066

ABSTRACT

Stereo matching is one of the most popular techniques for estimating dense depth maps by finding the disparity between matching pixels in two synchronized and rectified images. Alongside the development of more accurate algorithms, the research community has focused on finding good strategies to estimate the reliability, i.e., the confidence, of estimated disparity maps. This information proves to be a powerful cue for detecting wrong matches as well as for improving the overall effectiveness of a variety of stereo algorithms according to different strategies. In this paper, we review more than ten years of developments in the field of confidence estimation for stereo matching. We extensively discuss and evaluate existing confidence measures and their variants, from hand-crafted ones to the most recent, state-of-the-art learning-based methods. We study the different behaviors of each measure when applied to a pool of different stereo algorithms and, for the first time in the literature, when paired with a state-of-the-art deep stereo network. Our experiments, carried out on five different standard datasets, provide a comprehensive overview of the field, highlighting in particular both the strengths and the limitations of learning-based strategies.
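
To make the hand-crafted end of that spectrum concrete, here is a minimal NumPy sketch of one classic cost-volume confidence measure, the naive peak ratio (PKRN); the surveyed measures vary in their exact definitions:

    import numpy as np

    def pkrn(cost_volume):
        # cost_volume: (D, H, W), lower cost = better match. PKRN scores each
        # pixel by the ratio of its second-smallest to smallest matching cost.
        part = np.partition(cost_volume, 1, axis=0)
        c1, c2 = part[0], part[1]          # smallest, second-smallest cost
        return c2 / (c1 + 1e-8)            # larger ratio -> more confident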

4.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9102-9118, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34714738

ABSTRACT

This paper presents a deep architecture, called pyramidal semantic correspondence networks (PSCNet), that estimates locally-varying affine transformation fields across semantically similar images. To deal with the large appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model in which the affine transformation fields are progressively estimated in a coarse-to-fine manner, so that a smoothness constraint is naturally imposed. Unlike previous methods that directly estimate global or local deformations, our method first estimates the transformation over the entire image and then progressively increases the degrees of freedom of the transformation by dividing coarse cells into finer ones. To this end, we propose two spatial pyramid models: one divides an image into quad-tree rectangles, and the other divides it into multiple semantic elements of an object. Additionally, to overcome the limitation of insufficient training data, a novel weakly-supervised training scheme is introduced that generates progressively evolving supervision through the spatial pyramid models by leveraging correspondence consistency across image pairs. Extensive experimental results on various benchmarks, including TSS, Proposal Flow-WILLOW, Proposal Flow-PASCAL, Caltech-101, and SPair-71k, demonstrate that the proposed method outperforms the latest methods for dense semantic correspondence.
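
As a toy illustration of how the quad-tree pyramid grows the degrees of freedom (not the paper's network; the function name and the 6-parameter-per-cell layout are hypothetical):

    import numpy as np

    def subdivide_affine_grid(affine_grid):
        # affine_grid: (ncy, ncx, 6) per-cell affine parameters. Each child cell
        # inherits its parent's parameters as a coarse-to-fine initialization;
        # in the actual method a network would then refine the children.
        return np.repeat(np.repeat(affine_grid, 2, axis=0), 2, axis=1)

    # Degrees of freedom grow with each level: 1 cell -> 4 -> 16 -> ...
    coarse = np.zeros((1, 1, 6))
    finer = subdivide_affine_grid(coarse)      # (2, 2, 6)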

5.
IEEE Trans Image Process ; 30: 8170-8183, 2021.
Article in English | MEDLINE | ID: mdl-34550887

ABSTRACT

Motion blur, which disturbs human and machine perception of a scene, has usually been considered an unnecessary artifact to be removed. However, blur can be a useful clue for understanding a dynamic scene, since different sources of motion generate different types of artifacts. Motivated by this relationship between motion and blur, we propose a motion-aware feature learning framework for dynamic scene deblurring through multi-task learning. Our multi-task framework simultaneously estimates a deblurred image and a motion field from a blurred image. We design encoder-decoder architectures for the two tasks, with the encoder shared between them. Our motion estimation network can effectively distinguish between different types of blur, which facilitates image deblurring; conversely, understanding implicit motion information through image deblurring can improve the performance of motion estimation. In addition to sharing the network between the two tasks, we propose a reblurring loss function to optimize the overall parameters of our multi-task architecture. We provide an intensive analysis of the complementary tasks to show the effectiveness of our multi-task framework. Furthermore, the experimental results demonstrate that the proposed method outperforms state-of-the-art deblurring methods in both qualitative and quantitative evaluations.
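
A reblurring loss has a natural reading: re-synthesize the blur from the deblurred image and the estimated motion, then compare with the blurred input. Below is a minimal PyTorch sketch under that reading and a linear-motion assumption; the paper's exact formulation may differ, and the names and step count are ours:

    import torch
    import torch.nn.functional as F

    def reblur(sharp, motion, n_steps=5):
        # sharp: (B, 3, H, W) deblurred estimate; motion: (B, 2, H, W) per-pixel
        # motion in pixels (x, y). Averages copies of the sharp image warped
        # along the motion vector to approximate the blur process.
        b, _, h, w = sharp.shape
        ys, xs = torch.meshgrid(
            torch.arange(h, device=sharp.device, dtype=sharp.dtype),
            torch.arange(w, device=sharp.device, dtype=sharp.dtype),
            indexing="ij")
        base = torch.stack((xs, ys), dim=0)                    # (2, H, W)
        acc = torch.zeros_like(sharp)
        for i in range(n_steps):
            t = i / (n_steps - 1) - 0.5                        # exposure in [-0.5, 0.5]
            coords = base + t * motion                         # (B, 2, H, W)
            gx = 2 * coords[:, 0] / (w - 1) - 1                # normalize to [-1, 1]
            gy = 2 * coords[:, 1] / (h - 1) - 1
            grid = torch.stack((gx, gy), dim=-1)               # (B, H, W, 2)
            acc = acc + F.grid_sample(sharp, grid, align_corners=True)
        return acc / n_steps

    def reblurring_loss(blurred, deblurred, motion):
        return F.l1_loss(reblur(deblurred, motion), blurred)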

6.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2345-2359, 2021 Jul.
Article in English | MEDLINE | ID: mdl-31940519

ABSTRACT

We present the deep self-correlation (DSC) descriptor for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. We encode local self-similar structure in a pyramidal manner, which yields both more precise localization and greater robustness to non-rigid image deformations. Specifically, DSC first computes multiple self-correlation surfaces with randomly sampled patches over a local support window, and then builds pyramidal self-correlation surfaces through average pooling on those surfaces. The feature responses on the self-correlation surfaces are then encoded through spatial pyramid pooling in a log-polar configuration. To better handle geometric variations such as scale and rotation, we additionally propose the geometry-invariant DSC (GI-DSC), which leverages multi-scale self-correlation computation and canonical orientation estimation. In contrast to descriptors based on deep convolutional neural networks (CNNs), DSC and GI-DSC are training-free (i.e., handcrafted) descriptors, are robust across modalities, and generalize well to various modality variations. Extensive experiments demonstrate the state-of-the-art performance of DSC and GI-DSC on challenging cross-modal image pairs exhibiting photometric and/or geometric variations.
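
The self-correlation building block can be sketched compactly. The NumPy snippet below computes one level of responses for a single pixel, assuming it lies far enough from the image border; window sizes, sample counts, and the descriptor's pooling stages are simplified relative to the paper:

    import numpy as np

    def self_correlation(img, y, x, radius=10, patch=5, n_samples=32, seed=0):
        # img: (H, W) grayscale. Correlates the patch centered at (y, x) with
        # randomly offset patches inside the local support window.
        rng = np.random.default_rng(seed)
        r = patch // 2
        ref = img[y - r:y + r + 1, x - r:x + r + 1].astype(float).ravel()
        ref = (ref - ref.mean()) / (ref.std() + 1e-8)
        out = np.empty(n_samples)
        for i in range(n_samples):
            dy, dx = rng.integers(-radius, radius + 1, size=2)
            p = img[y + dy - r:y + dy + r + 1,
                    x + dx - r:x + dx + r + 1].astype(float).ravel()
            p = (p - p.mean()) / (p.std() + 1e-8)
            out[i] = (ref * p).mean()      # normalized cross-correlation
        return out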

7.
Sensors (Basel) ; 20(18)2020 Sep 07.
Article in English | MEDLINE | ID: mdl-32906841

ABSTRACT

This paper proposes a novel approach to high-dynamic-range (HDR) imaging of dynamic scenes that eliminates ghosting artifacts in HDR images in the presence of severe misalignment (large object or camera motion) among input low-dynamic-range (LDR) images. Recent non-flow-based methods suffer from ghosting artifacts under large object motion, and flow-based methods face the same issue because their optical flow algorithms yield large alignment errors. To eliminate ghosting artifacts, we propose a simple yet effective alignment network that resolves the misalignment: the proposed pyramid inter-attention module (PIAM) aligns LDR features by leveraging inter-attention maps. Additionally, to boost the representation of aligned features in the merging process, we propose a dual excitation block (DEB) that recalibrates each feature both spatially and channel-wise. Exhaustive experimental results demonstrate the effectiveness of the proposed PIAM and DEB, achieving state-of-the-art performance in producing ghost-free HDR images.
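
A "recalibrate spatially and channel-wise" block can be sketched in a few lines of PyTorch. This is a guess at the DEB's spirit from the abstract, not the published architecture:

    import torch
    import torch.nn as nn

    class DualExcitation(nn.Module):
        # Channel-wise recalibration (squeeze-and-excite style) combined with a
        # single-channel spatial attention map.
        def __init__(self, ch, reduction=4):
            super().__init__()
            self.channel = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
            self.spatial = nn.Sequential(
                nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

        def forward(self, x):
            return x * self.channel(x) * self.spatial(x)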

8.
Article in English | MEDLINE | ID: mdl-31976896

ABSTRACT

Convolutional neural networks (CNNs) have facilitated substantial progress on various problems in computer vision and image processing. However, applying them to image fusion has remained challenging due to the lack of labelled data for supervised learning. This paper introduces a deep image fusion network (DIF-Net), an unsupervised deep learning framework for image fusion. The DIF-Net parameterizes the entire image fusion process, comprising feature extraction, feature fusion, and image reconstruction, using a CNN. The goal of DIF-Net is to generate an output image whose contrast matches that of the high-dimensional input images. To realize this, we propose an unsupervised loss function based on the structure tensor representation of multi-channel image contrast. Unlike traditional fusion methods that involve time-consuming optimization or iterative procedures to obtain results, our loss function is minimized by a stochastic deep learning solver over large-scale examples. Consequently, the proposed method can produce fused images that preserve source image details through a single forward pass of a network trained without reference ground-truth labels. The proposed method is broadly applicable to various image fusion problems, including multi-spectral, multi-focus, and multi-exposure image fusion. Quantitative and qualitative evaluations show that the proposed technique outperforms existing state-of-the-art approaches across these applications.
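
The structure tensor of a multi-channel image sums the per-channel gradient outer products, giving a channel-count-independent notion of local contrast. A minimal PyTorch sketch of a loss in that spirit (a stand-in for the paper's exact formulation; fusion_loss and the simple difference kernels are ours):

    import torch
    import torch.nn.functional as F

    def structure_tensor(img):
        # img: (B, C, H, W). Returns the 2x2 structure tensor entries
        # (Jxx, Jxy, Jyy), each of shape (B, H, W), summed over channels.
        c = img.shape[1]
        kx = img.new_tensor([-1., 0., 1.]).view(1, 1, 1, 3).repeat(c, 1, 1, 1)
        ky = kx.transpose(2, 3)
        gx = F.conv2d(img, kx, padding=(0, 1), groups=c)
        gy = F.conv2d(img, ky, padding=(1, 0), groups=c)
        return (gx * gx).sum(1), (gx * gy).sum(1), (gy * gy).sum(1)

    def fusion_loss(fused, inputs):
        # Push the fused image's contrast toward that of the stacked inputs.
        return sum(F.mse_loss(a, b)
                   for a, b in zip(structure_tensor(fused), structure_tensor(inputs)))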

9.
IEEE Trans Pattern Anal Mach Intell ; 42(1): 59-73, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30371354

ABSTRACT

Techniques for dense semantic correspondence have provided limited ability to deal with the geometric variations that commonly exist between semantically similar images. While variations due to scale and rotation have been examined, there is a lack of practical solutions for more complex deformations such as affine transformations because of the tremendous size of the associated solution space. To address this problem, we present a discrete-continuous transformation matching (DCTM) framework where dense affine transformation fields are inferred through a discrete label optimization in which the labels are iteratively updated via continuous regularization. In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor. Furthermore, leveraging correspondence consistency and confidence-guided filtering in each iteration facilitates the convergence of our method. Experimental results show that this model outperforms the state-of-the-art methods for dense semantic correspondence on various benchmarks and applications.

10.
IEEE Trans Image Process ; 28(11): 5394-5406, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31144636

ABSTRACT

We present a deep architecture and learning framework for establishing correspondences across cross-spectral visible and infrared images in an unpaired setting. To overcome the unpaired cross-spectral data problem, we design unified image translation and feature extraction modules that are learned jointly, each boosting the other. Concretely, the image translation module is learned only with the unpaired cross-spectral data, while the feature extraction module is learned from an input image and its translated counterpart. By learning the two modules simultaneously, the image translation module generates translated images that preserve not only the domain-specific attributes, via separate latent spaces, but also the domain-agnostic content, via a feature consistency constraint. In the inference phase, the cross-spectral feature similarity is augmented by intra-spectral similarities between features extracted from the translated images. Experimental results show that this model outperforms state-of-the-art unpaired image translation methods and cross-spectral feature descriptors on various visible and infrared benchmarks.
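
The inference-time augmentation reduces to adding intra-spectral similarity terms computed from the translated images. A minimal sketch, with illustrative argument names and equal weighting assumed:

    import torch.nn.functional as F

    def augmented_similarity(f_vis, f_ir, f_vis2ir, f_ir2vis):
        # Each argument: (N, C) features from the visible image, the infrared
        # image, and their translated versions.
        def sim(a, b):
            return F.normalize(a, dim=1) @ F.normalize(b, dim=1).t()
        # Cross-spectral similarity augmented by two intra-spectral terms.
        return sim(f_vis, f_ir) + sim(f_vis, f_ir2vis) + sim(f_vis2ir, f_ir)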

11.
IEEE Trans Pattern Anal Mach Intell ; 41(3): 581-595, 2019 Mar.
Article in English | MEDLINE | ID: mdl-29993476

ABSTRACT

We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. Unlike traditional dense correspondence approaches for estimating depth or optical flow, semantic correspondence estimation poses additional challenges due to intra-class appearance and shape variations among different instances within the same object or scene category. To robustly match points across semantically similar images, we formulate FCSS using local self-similarity (LSS), which is inherently insensitive to intra-class appearance variations. LSS is incorporated through a proposed convolutional self-similarity (CSS) layer, where the sampling patterns and the self-similarity measure are jointly learned in an end-to-end and multi-scale manner. Furthermore, to address shape variations among different object instances, we propose a convolutional affine transformer (CAT) layer that estimates explicit affine transformation fields at each pixel to transform the sampling patterns and corresponding receptive fields. As training data for semantic correspondence is rather limited, we propose to leverage object candidate priors provided in most existing datasets and also correspondence consistency between object pairs to enable weakly-supervised learning. Experiments demonstrate that FCSS significantly outperforms conventional handcrafted descriptors and CNN-based descriptors on various benchmarks.

12.
IEEE Trans Image Process ; 28(3): 1299-1313, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30371368

ABSTRACT

We present a deep architecture that estimates stereo confidence, which is essential for improving the accuracy of stereo matching algorithms. In contrast to existing methods based on deep convolutional neural networks (CNNs) that rely on only one of the matching cost volume or the estimated disparity map, our network estimates the stereo confidence by using the two heterogeneous inputs simultaneously. Specifically, the matching probability volume is first computed from the matching cost volume with residual networks and a pooling module in a manner that yields greater robustness. The confidence is then estimated through a unified deep network that combines confidence features extracted both from the matching probability volume and from its corresponding disparity. In addition, our method extracts confidence features from the disparity map by applying multiple convolutional filters of varying sizes to the input disparity map. To learn our networks in a semi-supervised manner, we propose a novel loss function that uses confident points to compute the image reconstruction loss. To validate the effectiveness of our method in a disparity post-processing step, we employ three post-processing approaches: cost modulation, ground-control-points-based propagation, and aggregated ground-control-points-based propagation. Experimental results demonstrate that our method outperforms state-of-the-art confidence estimation methods on various benchmarks.
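
A confidence-masked reconstruction loss can be sketched in a few lines; this PyTorch snippet assumes the right image has already been warped to the left view via the estimated disparity, and the threshold is illustrative:

    import torch

    def confident_reconstruction_loss(left, right_warped, confidence, thresh=0.8):
        # left, right_warped: (B, 3, H, W); confidence: (B, 1, H, W) in [0, 1].
        # Photometric error is averaged only over pixels deemed confident.
        mask = (confidence > thresh).float()
        err = (left - right_warped).abs().mean(dim=1, keepdim=True)
        return (err * mask).sum() / (mask.sum() + 1e-8)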

13.
Article in English | MEDLINE | ID: mdl-30582541

ABSTRACT

Most variational formulations for structure-texture image decomposition force structure images to have a small norm in some functional space and share a common notion of edges, i.e., large gradient or intensity differences. However, such a definition makes it difficult to distinguish structure edges from oscillations that have fine spatial scale but high contrast. In this paper, we introduce a new model that learns a deep variational prior for structure images without explicit training data. An alternating direction method of multipliers (ADMM) algorithm, with its modular structure, is adopted to plug the deep variational prior into an iterative smoothing process. The central observations are that convolutional neural networks (CNNs) can replace the total variation prior and are indeed powerful enough to capture the nature of structure and texture. We show that our priors learned with CNNs successfully differentiate high-amplitude details from structure edges and avoid halo artifacts. Unlike previous data-driven smoothing schemes, our formulation provides an additional degree of freedom for producing continuously adjustable smoothing effects. Experimental results demonstrate the effectiveness of our approach on various computational photography and image processing applications, including texture removal, detail manipulation, HDR tone-mapping, and non-photorealistic abstraction.
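
The plug-in pattern is the standard plug-and-play ADMM loop, sketched below in PyTorch: the CNN stands in for the proximal operator of the prior, exactly the slot the total variation prox would otherwise occupy. The data term, step size, and iteration count here are illustrative, not the paper's:

    import torch

    def pnp_admm_smoothing(y, denoiser, rho=1.0, iters=20):
        # y: input image tensor; denoiser: a CNN playing the prior's prox.
        # Minimizes 0.5*||x - y||^2 + prior(z) subject to x = z.
        x, z, u = y.clone(), y.clone(), torch.zeros_like(y)
        for _ in range(iters):
            x = (y + rho * (z - u)) / (1.0 + rho)   # data-term proximal step
            z = denoiser(x + u)                     # CNN replaces the TV prox
            u = u + x - z                           # dual ascent on the constraint
        return z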

14.
Article in English | MEDLINE | ID: mdl-29994769

ABSTRACT

Recent work on machine learning has greatly advanced the accuracy of single-image depth estimation. However, the resulting depth maps are still over-smoothed and perceptually unsatisfying. This paper casts depth prediction from a single image as a parametric learning problem. Specifically, we propose a deep variational model that effectively integrates heterogeneous predictions from two convolutional neural networks (CNNs), named the global and local networks. The two networks have contrasting architectures and are designed to capture depth information with complementary attributes. Their intermediate outputs are then combined in an integration network based on the variational framework. By unrolling the optimization steps of Split Bregman (SB) iterations in the integration network, our model can be trained end-to-end. This makes it possible to simultaneously learn an efficient parameterization of the CNNs and the hyper-parameters of the variational method. Finally, we offer a new dataset of 0.22 million RGB-D images captured with a Microsoft Kinect v2. Our model generates realistic, discontinuity-preserving depth predictions without involving any low-level segmentation or superpixels. Intensive experiments demonstrate the superiority of the proposed method on a range of RGB-D benchmarks covering both indoor and outdoor scenes.

15.
IEEE Trans Pattern Anal Mach Intell ; 39(9): 1712-1729, 2017 09.
Article in English | MEDLINE | ID: mdl-27723577

ABSTRACT

Establishing dense correspondences between multiple images is a fundamental task in many applications. However, finding reliable correspondences between multi-modal or multi-spectral images remains unsolved owing to their challenging photometric and geometric variations. In this paper, we propose a novel dense descriptor, called dense adaptive self-correlation (DASC), to estimate dense multi-modal and multi-spectral correspondences. Based on the observation that self-similarity within an image is robust to imaging modality variations, we define the descriptor as a series of adaptive self-correlation similarity measures between patches sampled by randomized receptive field pooling, in which the sampling pattern is obtained via discriminative learning. The computational redundancy of dense descriptors is dramatically reduced by applying fast edge-aware filtering. Furthermore, to address geometric variations including scale and rotation, we propose a geometry-invariant DASC (GI-DASC) descriptor that effectively leverages the DASC through a superpixel-based representation. For a quantitative evaluation of the GI-DASC, we build a novel multi-modal benchmark with varying photometric and geometric conditions. Experimental results demonstrate the outstanding performance of the DASC and GI-DASC in many cases of dense multi-modal and multi-spectral correspondence.

16.
IEEE Trans Image Process ; 25(11): 5227-38, 2016 11.
Article in English | MEDLINE | ID: mdl-27552747

ABSTRACT

This paper describes a method for high-quality depth super-resolution. The standard formulations of image-guided depth upsampling, using simple joint filtering or quadratic optimization, lead to texture copying and depth bleeding artifacts. These artifacts are caused by the inherent discrepancy between structures in data from different sensors: although there is some correlation between depth and intensity discontinuities, they differ in distribution and formation. To tackle this problem, we formulate an optimization model using a nonconvex regularizer. A nonlocal affinity established in a high-dimensional feature space is used to produce precisely localized depth boundaries. We show that the proposed method iteratively handles differences in structure between depth and intensity images. This property significantly reduces texture copying and depth bleeding artifacts on a variety of range data sets. We also propose a fast alternating direction method of multipliers (ADMM) algorithm to solve our optimization problem. Our solver shows a noticeable speed-up compared with the conventional majorize-minimize algorithm. Extensive experiments with synthetic and real-world data sets demonstrate that the proposed method is superior to existing methods.

17.
Opt Express ; 24(13): 14183-95, 2016 Jun 27.
Article in English | MEDLINE | ID: mdl-27410576

ABSTRACT

A mathematical formula for calculating the fringe periods of the color moirés appearing in contact-type 3-D displays is derived. Typically, the color moirés are chirped, and the period of the line pattern in the viewing-zone-forming optics is more than twice that of the pixel pattern in the display panel. These properties make it impossible to calculate the fringe periods of the color moirés with the conventional beat frequency formula. The derived formula works very well for any combination of two line patterns having either the same line period or different line periods, as proved experimentally. Furthermore, it is shown that the fringe period can be expressed in terms of the viewing distance and the focal length of the viewing-zone-forming optics.
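
For context, the conventional beat-period formula that the abstract says breaks down here is the standard result for two superposed line patterns with periods T_1 and T_2 (this is the classical expression, not the paper's derived formula):

    T_m = \frac{T_1 \, T_2}{\lvert T_1 - T_2 \rvert}

This expression assumes two nearly equal, unchirped periods, which is exactly what fails when the moirés are chirped and T_2 exceeds 2*T_1.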

18.
IEEE Trans Image Process ; 24(12): 5953-66, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26529766

ABSTRACT

Inferring scene depth from a single monocular image is a highly ill-posed problem in computer vision. This paper presents a new gradient-domain approach, called depth analogy, that uses analogy as a means of synthesizing a target depth field when a collection of RGB-D image pairs is given as training data. Specifically, the proposed method employs a non-parametric learning process that creates an analogous depth field by sampling reliable depth gradients using visual correspondences established on the training image pairs. Unlike existing data-driven approaches that directly select depth values from the training data, our framework transfers depth gradients as reconstruction cues, which are then integrated via Poisson reconstruction. The performance of most conventional approaches relies heavily on the training RGB-D data, and such a dependency severely degrades the quality of reconstructed depth maps when the desired depth distribution of an input image differs substantially from that of the training data, e.g., outdoor versus indoor scenes. Our key observation is that using depth gradients in the reconstruction is less sensitive to scene characteristics, providing better cues for depth recovery. Thus, our gradient-domain approach can support a great variety of training range datasets involving substantial appearance and geometric variations. The experimental results demonstrate that our gradient-domain approach outperforms existing data-driven approaches that work directly in the depth domain, even when only uncorrelated training datasets are available.
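
Poisson reconstruction of a gradient field has a compact spectral form. The NumPy sketch below solves lap(d) = div(g) under a periodic-boundary assumption (a simplification; practical pipelines often use Neumann boundaries or a sparse solver instead):

    import numpy as np

    def poisson_integrate(gx, gy):
        # gx, gy: (H, W) forward-difference gradient fields. Recovers d up to
        # an additive constant by dividing by the discrete Laplacian's symbol.
        h, w = gx.shape
        div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
        fx = np.fft.fftfreq(w)[None, :]
        fy = np.fft.fftfreq(h)[:, None]
        denom = 2 * np.cos(2 * np.pi * fx) + 2 * np.cos(2 * np.pi * fy) - 4
        denom[0, 0] = 1.0                       # avoid 0/0 at the DC term
        rhs = np.fft.fft2(div)
        rhs[0, 0] = 0.0                         # the mean of d is unconstrained
        return np.real(np.fft.ifft2(rhs / denom))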

19.
IEEE Trans Image Process ; 24(11): 3652-65, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26111394

ABSTRACT

This paper presents a texture flow estimation method that uses appearance-space clustering and a correspondence search in the space of deformed exemplars. To estimate the underlying texture flow, such as scale, orientation, and texture label, most existing approaches require a certain amount of user interaction, and strict assumptions on a geometric model further limit flow estimation to near-regular textures such as gradient-like patterns. We address these problems by extracting distinct texture exemplars in an unsupervised way and using an efficient search strategy over a deformation parameter space. This enables estimating a coherent flow fully automatically, even when an input image contains multiple textures of different categories. A set of texture exemplars that describes the input texture image is first extracted via medoid-based clustering in appearance space. The texture exemplars are then matched with the input image to infer deformation parameters. In particular, we define a distance function measuring the similarity between a texture exemplar and a deformed target patch centered at each pixel of the input image, and propose a randomized search strategy to estimate these parameters efficiently. The deformation flow field is further refined by adaptively smoothing it under the guidance of a matching confidence score. We show that a local visual similarity, measured directly in appearance space, explains the local behavior of the flow very well, and that the flow field can be estimated very efficiently when the matching criterion is paired with the randomized search strategy. Experimental results on synthetic and natural images show that the proposed method outperforms existing methods.
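
At its simplest, a randomized search over deformation parameters looks like the sketch below; dist_fn, the parameter ranges, and the per-pixel independence are illustrative assumptions, and the actual method additionally propagates good candidates between neighbors:

    import numpy as np

    def randomized_search(dist_fn, n_iters=64, seed=0):
        # dist_fn(scale, angle) -> distance between a deformed exemplar and
        # the target patch (supplied by the caller).
        rng = np.random.default_rng(seed)
        best, best_d = None, np.inf
        for _ in range(n_iters):
            s = rng.uniform(0.5, 2.0)           # candidate scale
            a = rng.uniform(0.0, 2 * np.pi)     # candidate orientation
            d = dist_fn(s, a)
            if d < best_d:
                best, best_d = (s, a), d
        return best, best_d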

20.
IEEE Trans Image Process ; 24(5): 1524-35, 2015 May.
Article in English | MEDLINE | ID: mdl-25769140

ABSTRACT

This paper presents a depth super-resolution (SR) method that uses both a low-resolution (LR) depth image and a high-resolution (HR) intensity image. We formulate depth SR as a graph-based transduction problem. In particular, the HR intensity image is represented as an undirected graph in which pixels are characterized as vertices and their relations are encoded by an affinity function. Regarding the vertices initially labeled with depth hypotheses (from the LR depth image) as input queries, all vertices are scored with respect to their relevance to these queries by a classifying function, and each vertex is then labeled with the depth hypothesis that receives the highest relevance score. We design the classifying function by considering both the local and the global structure of the HR intensity image. This approach enables us to address the depth bleeding problem that typically appears in current depth SR methods. Furthermore, input queries are assigned in a probabilistic manner, making depth SR robust to noisy depth measurements. We also analyze existing depth SR methods in the context of transduction and discuss their theoretical relations. Intensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods, both qualitatively and quantitatively.
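
As a small stand-in for such a classifying function, the classic closed-form label propagation F = (I - alpha*S)^(-1) Y scores every vertex against the seeded queries; the NumPy sketch below is illustrative of the transduction framing, not the paper's exact design, and a dense solve is only feasible for small graphs:

    import numpy as np

    def transductive_labeling(affinity, queries, alpha=0.9):
        # affinity: (N, N) symmetric nonnegative graph weights over HR pixels;
        # queries: (N, K), rows one-hot for vertices seeded with one of K depth
        # hypotheses and zero elsewhere.
        d = affinity.sum(axis=1)
        s = affinity / np.sqrt(np.outer(d, d))             # symmetric normalization
        f = np.linalg.solve(np.eye(len(d)) - alpha * s, queries)
        return f.argmax(axis=1)                            # winning hypothesis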
