Results 1 - 20 of 27
1.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 10929-10946, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37018107

ABSTRACT

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes. The proposed GCoNet+ achieves new state-of-the-art performance for co-salient object detection (CoSOD) by mining consensus representations based on the following two essential criteria: 1) intra-group compactness to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module (GAM); 2) inter-group separability to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module (GCM) conditioned on the inconsistent consensus. To further improve the accuracy, we design a series of simple yet effective components: i) a recurrent auxiliary classification module (RACM) promoting model learning at the semantic level; ii) a confidence enhancement module (CEM) assisting the model in improving the quality of the final predictions; and iii) a group-based symmetric triplet (GST) loss guiding the model to learn more discriminative features. Extensive experiments on three challenging benchmarks, i.e., CoCA, CoSOD3k, and CoSal2015, demonstrate that our GCoNet+ outperforms 12 existing cutting-edge models. Code has been released at https://github.com/ZhengPeng7/GCoNet_plus.
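The intra-group compactness idea can be illustrated with a toy consensus computation: images that agree with the rest of their group get more weight when forming a shared descriptor. This is a sketch only, not the paper's GAM; the `group_consensus` helper and all data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
shared = rng.normal(size=8)                       # attribute shared across the group
outlier = rng.normal(size=8)                      # one unrelated ("noisy") image
group = np.stack([shared + 0.1 * rng.normal(size=8) for _ in range(4)] + [outlier])

def group_consensus(features):
    """Toy intra-group consensus: affinity-weighted average of normalized features.

    features: (N, D) array, one descriptor per image in the group.
    Images whose descriptors agree with the group receive larger weights.
    """
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    A = F @ F.T                       # (N, N) pairwise cosine affinities
    w = A.sum(axis=1)                 # total agreement of each image with the group
    w = w / w.sum()
    return w @ F                      # consensus descriptor

c = group_consensus(group)
# c aligns with the shared attribute, down-weighting the outlier image
```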

2.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10197-10211, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37027560

ABSTRACT

Segmenting highly-overlapping image objects is challenging because there is typically no distinction between real object contours and occlusion boundaries in images. Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose the Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees). The explicit modeling of occlusion relationships with a bilayer structure naturally decouples the boundaries of the occluding and occluded instances, and considers the interaction between them during mask regression. We investigate the efficacy of the bilayer structure using two popular convolutional network designs, namely, the Fully Convolutional Network (FCN) and the Graph Convolutional Network (GCN). Further, we formulate bilayer decoupling using the vision transformer (ViT), by representing instances in the image as separate learnable occluder and occludee queries. Large and consistent improvements using one/two-stage and query-based object detectors with various backbones and network layer choices validate the generalization ability of bilayer decoupling, as shown by extensive experiments on image instance segmentation benchmarks (COCO, KINS, COCOA) and video instance segmentation benchmarks (YTVIS, OVIS, BDD100K MOTS), especially for heavy occlusion cases.


Subjects
Algorithms; Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods
3.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5002-5015, 2022 09.
Article in English | MEDLINE | ID: mdl-33989152

ABSTRACT

We propose HyFRIS-Net to jointly estimate hybrid reflectance and illumination models, as well as a refined face shape, from a single unconstrained face image in a pre-defined texture space. The proposed hybrid reflectance and illumination representation ensures photometric face appearance modeling in both parametric and non-parametric spaces for efficient learning. By enforcing a reflectance consistency constraint for the same person and a face identity constraint across different persons, our approach recovers an occlusion-free face albedo whose color is disambiguated from the illumination color. Our network is trained in a self-evolving manner to achieve general applicability on real-world data. We conduct comprehensive qualitative and quantitative evaluations against state-of-the-art methods to demonstrate the advantages of HyFRIS-Net in modeling photo-realistic face albedo, illumination, and shape.


Assuntos
Iluminação , Reconhecimento Automatizado de Padrão , Algoritmos , Face/diagnóstico por imagem , Humanos , Reconhecimento Automatizado de Padrão/métodos
4.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9489-9502, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34822324

ABSTRACT

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a new point-set learning framework, PRIN (Point-wise Rotation Invariant Network), focusing on rotation invariant feature extraction in point cloud analysis. We construct spherical signals by Density Aware Adaptive Sampling to deal with distorted point distributions in spherical space. Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which directly operates on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. Results show that, on datasets with randomly rotated point clouds, SPRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide thorough theoretical proof and analysis of the point-wise rotation invariance achieved by our methods. The code to reproduce our results will be made publicly available.

5.
IEEE Trans Image Process ; 30: 7856-7866, 2021.
Article in English | MEDLINE | ID: mdl-34524959

ABSTRACT

Human pose transfer has become an emerging research topic in recent years. However, state-of-the-art results are still far from satisfactory. One main reason is that these end-to-end methods are often blindly trained without a semantic understanding of their content. In this paper, we propose a novel method for human pose transfer that considers the semantic part-based representation of a human. In particular, we propose to segment the human body into multiple parts, each of which represents a semantic region of a human. With the proposed part-based layer generators, a high-quality result is guaranteed for each local semantic region. We design a three-stage hierarchical framework to fuse local representations into the final result in a coarse-to-fine manner, which provides adaptive attention for global consistency and local details, respectively. By exploiting spatial guidance from a 3D human model throughout the framework, our method naturally handles the ambiguity of self-occlusions, which often causes artifacts in previous methods. With semantic-aware and spatial-aware representations, our method outperforms previous approaches quantitatively and qualitatively in handling self-occlusions, preserving and synthesizing fine details, and producing higher-resolution results.


Assuntos
Algoritmos , Semântica , Humanos
6.
IEEE Trans Image Process ; 30: 2888-2897, 2021.
Article in English | MEDLINE | ID: mdl-33539298

ABSTRACT

In this paper, we propose a new method to super-resolve low-resolution human body images by learning efficient multi-scale features and exploiting a useful human body prior. Specifically, we propose a lightweight multi-scale block (LMSB) as the basic module of a coherent framework, which contains an image reconstruction branch and a prior estimation branch. In the image reconstruction branch, the LMSB aggregates features of multiple receptive fields so as to gather rich context information for low-to-high resolution mapping. In the prior estimation branch, we adopt human parsing maps and nonsubsampled shearlet transform (NSST) sub-bands to represent the human body prior, which is expected to enhance the details of reconstructed human body images. When evaluated on the newly collected HumanSR dataset, our method outperforms state-of-the-art image super-resolution methods with ~8× fewer parameters; moreover, our method significantly improves the performance of human image analysis tasks (e.g., human parsing and pose estimation) for low-resolution inputs.


Assuntos
Aprendizado Profundo , Processamento de Imagem Assistida por Computador/métodos , Humanos , Postura/fisiologia
7.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2449-2462, 2021 Jul.
Article in English | MEDLINE | ID: mdl-31995475

ABSTRACT

We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, and image deraining). These problems are ill-posed, and the common assumptions of existing methods are usually based on heuristic image priors. In this paper, we show that these problems can be solved by generative models with adversarial learning. However, a straightforward formulation based on a generative adversarial network (GAN) does not perform well in these tasks, and some structures of the estimated images are usually not preserved well. Motivated by the observation that the estimated results should be consistent with the observed inputs under the physics models, we propose an algorithm that guides the estimation process of a specific task within the GAN framework. The proposed model is trained in an end-to-end fashion and can be applied to a variety of image restoration and low-level vision problems. Extensive experiments demonstrate that the proposed method performs favorably against state-of-the-art algorithms.
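The physics-consistency idea — the estimate should reproduce the observation under the task's physics model — can be sketched for dehazing, where the standard haze formation model is I = J·t + A·(1−t). This is a generic illustration of such a consistency penalty, not the paper's training objective; all names and data are synthetic:

```python
import numpy as np

def dehaze_consistency_loss(I, J_est, t_est, A_est):
    """Physics-consistency penalty for single-image dehazing.

    Re-renders the hazy input from the estimates via the haze formation
    model I = J * t + A * (1 - t) and measures the mismatch.
    """
    I_rec = J_est * t_est + A_est * (1.0 - t_est)
    return float(np.mean((I_rec - I) ** 2))

# Synthetic check: the true radiance/transmission/airlight re-render the
# observation exactly, so the loss vanishes; a wrong estimate is penalized.
rng = np.random.default_rng(1)
J = rng.uniform(0, 1, size=(4, 4, 3))         # clear scene radiance
t = rng.uniform(0.3, 1.0, size=(4, 4, 1))     # transmission map
A = np.array([0.9, 0.9, 0.9])                 # global airlight color
I = J * t + A * (1.0 - t)                     # hazy observation
loss_true = dehaze_consistency_loss(I, J, t, A)
loss_wrong = dehaze_consistency_loss(I, np.clip(J + 0.2, 0, 1), t, A)
```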

8.
IEEE Trans Image Process ; 30: 907-920, 2021.
Article in English | MEDLINE | ID: mdl-33259297

ABSTRACT

Person re-identification aims to identify whether pairs of images belong to the same person. This problem is challenging due to large differences in camera views, lighting, and background. One mainstream approach to learning CNN features is to design loss functions that reinforce both inter-class separation and intra-class compactness. In this paper, we propose a novel Orthogonal Center Learning method with Subspace Masking for person re-identification. We make the following contributions: 1) we develop a center learning module that learns class centers by simultaneously reducing intra-class differences and inter-class correlations through orthogonalization; 2) we introduce a subspace masking mechanism to enhance the generalization of the learned class centers; and 3) we propose to integrate average pooling and max pooling in a regularizing manner that fully exploits their complementary strengths. Extensive experiments show that our proposed method consistently outperforms state-of-the-art methods on large-scale ReID datasets including Market-1501, DukeMTMC-ReID, CUHK03 and MSMT17.
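The orthogonalization idea in contribution 1) can be illustrated with a toy penalty measuring how far class centers are from mutually orthogonal: after row-normalization, the Gram matrix of perfectly decorrelated centers is the identity. A sketch of the principle only; the paper's actual loss differs:

```python
import numpy as np

def center_orthogonality_penalty(centers):
    """Toy inter-class correlation penalty: squared deviation of the
    Gram matrix of row-normalized class centers from the identity."""
    C = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    G = C @ C.T                               # pairwise center correlations
    off = G - np.eye(len(C))                  # zero iff centers are orthogonal
    return float(np.sum(off ** 2))

orthogonal = np.eye(4, 16)                    # 4 mutually orthogonal centers
correlated = np.ones((4, 16)) + 0.01 * np.arange(64).reshape(4, 16)  # nearly parallel
p_orth = center_orthogonality_penalty(orthogonal)
p_corr = center_orthogonality_penalty(correlated)
```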

9.
IEEE Trans Pattern Anal Mach Intell ; 42(1): 232-245, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30281438

ABSTRACT

While conventional calibrated photometric stereo methods assume that light intensities and sensor exposures are known, or unknown but identical across observed images, this assumption easily breaks down in practical settings due to individual light bulbs' characteristics and limited control over sensors. This paper studies the effect of unknown, and possibly non-uniform, light intensities and sensor exposures among observed images on shape recovery based on photometric stereo. This leads to the development of a "semi-calibrated" photometric stereo method, where the light directions are known but the light intensities (and sensor exposures) are unknown. We show that semi-calibrated photometric stereo becomes a bilinear problem, whose general form is difficult to solve; in the photometric stereo context, however, there exists a unique solution for the surface normals and light intensities (or sensor exposures). We further show that there exists a linear solution method for the problem, and we develop efficient and stable solution methods. Semi-calibrated photometric stereo is advantageous over conventional calibrated photometric stereo in accurately determining surface normals, because it relaxes the assumption of known light intensity ratios/sensor exposures. The experimental results show the superior accuracy of semi-calibrated photometric stereo in comparison to conventional methods in practical settings.
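The linear solution can be sketched as follows: substituting t_i = 1/e_i turns the bilinear measurement equation m_ij = e_i·(l_i·n_j) into the linear constraint l_i·n_j − m_ij·t_i = 0, so the stacked normals and inverse intensities form a null-space problem. A synthetic round-trip under these assumptions (all names are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
F, P = 6, 5                                    # F lights (known directions), P pixels
L = rng.normal(size=(F, 3))
L /= np.linalg.norm(L, axis=1, keepdims=True)  # known light directions
N = rng.normal(size=(P, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)  # ground-truth unit normals
e = rng.uniform(0.5, 2.0, size=F)              # unknown per-light intensities
M = e[:, None] * (L @ N.T)                     # measurements m_ij = e_i * (l_i . n_j)

# m_ij = e_i * (l_i . n_j)  <=>  l_i . n_j - m_ij * t_i = 0  with  t_i = 1/e_i,
# linear in the stacked unknowns z = [n_1 .. n_P, t_1 .. t_F].
A = np.zeros((F * P, 3 * P + F))
for i in range(F):
    for j in range(P):
        row = i * P + j
        A[row, 3 * j:3 * j + 3] = L[i]
        A[row, 3 * P + i] = -M[i, j]
z = np.linalg.svd(A)[2][-1]                    # null vector: solution up to global scale
n_est = z[:3 * P].reshape(P, 3)
n_est /= np.linalg.norm(n_est, axis=1, keepdims=True)
if np.dot(n_est[0], N[0]) < 0:                 # resolve the global sign ambiguity
    n_est = -n_est
```

The global scale is unrecoverable (scaling all intensities and inversely scaling the normals' magnitudes yields the same measurements), which is why normals are normalized and intensities are only meaningful up to ratios.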

10.
IEEE Trans Image Process ; 28(3): 1054-1067, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30281457

ABSTRACT

We propose a deep convolutional neural network (CNN) method for natural image matting. Our method takes multiple initial alpha mattes from previous methods and normalized RGB color images as inputs, and directly learns an end-to-end mapping between the inputs and reconstructed alpha mattes. Among the various existing methods, we focus on using two simple methods as initial alpha mattes: closed-form matting and KNN matting. They are complementary to each other in terms of local and nonlocal principles. A major benefit of our method is that it can "recognize" different local image structures and then combine the results of local (closed-form matting) and nonlocal (KNN matting) mattings effectively to achieve higher-quality alpha mattes than both of the inputs. Furthermore, we verify the extensibility of the proposed network to different combinations of initial alpha mattes from more advanced techniques, such as KL divergence matting and information-flow matting. On top of the deep CNN matting, we build an RGB-guided JPEG artifacts removal network to handle JPEG block artifacts in alpha matting. Extensive experiments demonstrate that our proposed deep CNN matting produces visually and quantitatively high-quality alpha mattes. We perform further experiments, including studies that evaluate the importance of balancing training data and measure the effects of initial alpha mattes, and we analyze variant versions of the proposed network. In addition, our method achieved a high ranking on the public alpha matting evaluation dataset in terms of the sum of absolute differences, mean squared error, and gradient error. Our RGB-guided JPEG artifacts removal network also restores damaged alpha mattes from JPEG-compressed images.

11.
IEEE Trans Pattern Anal Mach Intell ; 41(2): 297-310, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29994179

ABSTRACT

One of the core applications of light field imaging is depth estimation. To acquire a depth map, existing approaches apply a single photo-consistency measure to an entire light field. However, this is not an optimal choice because of the non-uniform light field degradations produced by limitations in the hardware design. In this paper, we introduce a pipeline that automatically determines the best configuration for photo-consistency measure, which leads to the most reliable depth label from the light field. We analyzed the practical factors affecting degradation in lenslet light field cameras, and designed a learning based framework that can retrieve the best cost measure and optimal depth label. To enhance the reliability of our method, we augmented an existing light field benchmark to simulate realistic source dependent noise, aberrations, and vignetting artifacts. The augmented dataset was used for the training and validation of the proposed approach. Our method was competitive with several state-of-the-art methods for the benchmark and real-world light field datasets.

12.
IEEE Trans Pattern Anal Mach Intell ; 40(7): 1599-1610, 2018 07.
Article in English | MEDLINE | ID: mdl-28796612

ABSTRACT

Recent advances in saliency detection have utilized deep learning to obtain high-level features to detect salient regions in scenes. These advances have yielded results superior to those reported in past work, which involved the use of hand-crafted low-level features for saliency detection. In this paper, we propose ELD-Net, a unified deep learning framework for accurate and efficient saliency detection. We show that hand-crafted features can provide complementary information to enhance saliency detection that uses only high-level features. Our method uses both low-level and high-level features for saliency detection. High-level features are extracted using GoogLeNet, and low-level features evaluate the relative importance of a local region using its differences from other regions in an image. The two feature maps are independently encoded by the convolutional and the ReLU layers. The encoded low-level and high-level features are then combined by concatenation and convolution. Finally, a linear fully connected layer is used to evaluate the saliency of a queried region. A full resolution saliency map is obtained by querying the saliency of each local region of an image. Since the high-level features are encoded at low resolution, and the encoded high-level features can be reused for every query region, our ELD-Net is very fast. Our experiments show that our method outperforms state-of-the-art deep learning-based saliency detection methods.

13.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 376-391, 2018 02.
Article in English | MEDLINE | ID: mdl-28278459

ABSTRACT

Rank minimization can be converted into tractable surrogate problems, such as Nuclear Norm Minimization (NNM) and Weighted NNM (WNNM). Problems related to NNM or WNNM can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT) or Weighted SVT, but they suffer from the high computational cost of Singular Value Decomposition (SVD) at each iteration. We propose a fast and accurate approximation method for SVT, which we call fast randomized SVT (FRSVT), that avoids direct computation of the SVD. The key idea is to extract an approximate basis for the range of the matrix from its compressed matrix. Given the basis, we compute the partial singular values of the original matrix from the small factored matrix. In addition, by developing a range propagation method, our method further speeds up the extraction of the approximate basis at each iteration. Our theoretical analysis shows the relationship between the approximation bound of the SVD and its effect on NNM via SVT. Along with the analysis, our empirical results quantitatively and qualitatively show that our approximation rarely harms the convergence of the host algorithms. We assess the efficiency and accuracy of the proposed method on various computer vision problems, e.g., subspace clustering, weather artifact removal, and simultaneous multi-image alignment and rectification.
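The SVT operator being approximated is itself compact: it soft-thresholds the singular values, which is the proximal operator of the nuclear norm. A plain (non-randomized) reference version, for illustration only:

```python
import numpy as np

def svt(Y, tau):
    """Singular Value Thresholding: prox of tau * nuclear norm.

    Computes the full SVD and soft-thresholds the singular values —
    exactly the per-iteration cost that FRSVT is designed to avoid.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)       # soft-threshold each singular value
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(3)
Y = rng.normal(size=(8, 6))
X = svt(Y, tau=1.0)
```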

14.
IEEE Trans Pattern Anal Mach Intell ; 39(8): 1591-1604, 2017 08.
Article in English | MEDLINE | ID: mdl-28113654

ABSTRACT

We propose a robust uncalibrated multiview photometric stereo method for high-quality 3D shape reconstruction. In our method, a coarse initial 3D mesh obtained using a multiview stereo method is projected onto a 2D planar domain using a planar mesh parameterization technique. We describe methods for surface normal estimation that work in the parameterized 2D space and jointly incorporate all geometric and photometric cues from multiple viewpoints. Using the estimated surface normal map, a refined 3D mesh is then recovered by computing an optimal displacement map in the same 2D planar domain. Our method avoids the merging of view-dependent surface normal maps that is often required in conventional methods. We conduct evaluations on various real-world objects containing surfaces with specular reflections, multiple albedos, and complex topologies, in both controlled and uncontrolled settings, and demonstrate that our method recovers accurate 3D meshes with fine geometric details.

15.
IEEE Trans Image Process ; 25(8): 3639-3654, 2016 08.
Article in English | MEDLINE | ID: mdl-28113552

ABSTRACT

This paper presents an automatic method to extract a multi-view object in a natural environment. We assume that the target object is bounded by the convex volume of interest defined by the overlapping space of the camera viewing frustums. There are two key contributions in our approach. First, we present an automatic method to identify a target object across different images for multi-view binary co-segmentation. The extracted target object shares the same geometric representation in space, with a color and texture model distinctive from the background. Second, we present an algorithm to detect color-ambiguous regions along the object boundary for matting refinement. Our matting region detection algorithm is based on information theory: it measures the Kullback-Leibler (KL) divergence of the local color distributions of different pixel-bands. The local pixel-band with the largest entropy is selected for matte refinement, subject to a multi-view consistency constraint. Our results are high-quality alpha mattes that are consistent across all viewpoints. We demonstrate the effectiveness of the proposed method using various examples.
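The KL-divergence scoring of pixel-bands can be sketched in 1-D: bands whose two sides have very different color distributions score high, while identical distributions score zero. A toy grayscale version (the paper works with local color distributions; histogram binning and smoothing here are illustrative choices):

```python
import numpy as np

def color_hist(pixels, bins=8):
    """Normalized, lightly smoothed intensity histogram of a pixel band."""
    h, _ = np.histogram(pixels, bins=bins, range=(0.0, 1.0))
    return (h + 1e-6) / (h.sum() + bins * 1e-6)

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions (both strictly positive)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(4)
band_fg = rng.uniform(0.7, 1.0, size=500)     # band sampled from a bright foreground
band_bg = rng.uniform(0.0, 0.3, size=500)     # band sampled from a dark background
d_distinct = kl_divergence(color_hist(band_fg), color_hist(band_bg))
d_similar = kl_divergence(color_hist(band_fg), color_hist(band_fg))
# d_distinct is large (clear color separation); d_similar is zero
```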

16.
IEEE Trans Image Process ; 25(1): 9-23, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26529764

ABSTRACT

In this paper, we introduce a novel approach to automatically detect salient regions in an image. Our approach consists of global and local features, which complement each other to compute a saliency map. The first key idea of our work is to create a saliency map of an image by using a linear combination of colors in a high-dimensional color space. This is based on the observation that, to human perception, salient regions often have distinctive colors compared with backgrounds; human perception, however, is complicated and highly nonlinear. By mapping the low-dimensional red, green, and blue colors to a feature vector in a high-dimensional color space, we show that we can composite an accurate saliency map by finding the optimal linear combination of color coefficients in the high-dimensional color space. To further improve the performance of our saliency estimation, our second key idea is to utilize the relative locations and color contrast between superpixels as features and to resolve the saliency estimation from a trimap via a learning-based algorithm. The additional local features and learning-based algorithm complement the global estimation from the high-dimensional color transform-based algorithm. Experimental results on three benchmark datasets show that our approach is effective in comparison with previous state-of-the-art saliency estimation methods.
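The first key idea — saliency as an optimal linear combination of lifted color coefficients — can be sketched with a simple polynomial color lifting and a least-squares fit to trimap-style labels. The feature set and solver here are illustrative assumptions, not the paper's exact transform:

```python
import numpy as np

def color_features(rgb):
    """Lift RGB pixels into a higher-dimensional color space
    (a simple polynomial lifting chosen for illustration)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([r, g, b, r * g, g * b, r * b,
                     r ** 2, g ** 2, b ** 2, np.ones_like(r)], axis=1)

rng = np.random.default_rng(5)
fg = np.clip(rng.normal([0.8, 0.2, 0.2], 0.05, size=(200, 3)), 0, 1)  # reddish salient region
bg = np.clip(rng.normal([0.2, 0.5, 0.8], 0.05, size=(200, 3)), 0, 1)  # bluish background
X = color_features(np.vstack([fg, bg]))
y = np.concatenate([np.ones(200), np.zeros(200)])     # trimap-style labels
w, *_ = np.linalg.lstsq(X, y, rcond=None)             # optimal linear combination
saliency = X @ w                                      # per-pixel saliency scores
```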

17.
IEEE Trans Pattern Anal Mach Intell ; 38(4): 744-58, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26353362

ABSTRACT

Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering the underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only is it known that the underlying structure of clean data is low-rank, but the exact rank of the clean data is also known. Yet, when applying conventional rank minimization to those problems, the objective function is formulated in a way that does not fully utilize a priori target rank information. This observation motivates us to investigate whether there is a better alternative solution when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values, which implicitly encourages the target rank constraint. Our experimental analyses show that, when the number of samples is deficient, our approach leads to a higher success rate than conventional rank minimization, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g., high dynamic range imaging, motion edge detection, photometric stereo, and image alignment and recovery, and show that our results outperform those obtained by the conventional nuclear norm rank minimization method.
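The partial-sum idea admits a simple proximal sketch: singular values up to the known target rank pass through untouched, and only the tail is soft-thresholded, so the known-rank part is never penalized. This illustrates the principle under stated assumptions and is not claimed to be the paper's exact operator:

```python
import numpy as np

def pssv_prox(Y, tau, target_rank):
    """Proximal sketch for the Partial Sum of Singular Values.

    Keeps the leading `target_rank` singular values intact and
    soft-thresholds only the remaining (tail) singular values.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_new = s.copy()
    s_new[target_rank:] = np.maximum(s[target_rank:] - tau, 0.0)
    return U @ np.diag(s_new) @ Vt

rng = np.random.default_rng(7)
Y = rng.normal(size=(10, 8))
X = pssv_prox(Y, tau=0.5, target_rank=2)
```

Compare with plain SVT: nuclear-norm shrinkage would also shrink the two leading singular values, biasing the known-rank component that the problem wants to keep.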

18.
IEEE Trans Pattern Anal Mach Intell ; 37(6): 1219-32, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26357344

ABSTRACT

This paper introduces a new high dynamic range (HDR) imaging algorithm that utilizes rank minimization. Assuming the camera responds linearly to scene radiance, input low dynamic range (LDR) images captured with different exposure times exhibit a linear dependency and form a rank-1 matrix when the intensities of corresponding pixels are stacked together. In practice, misalignments caused by camera motion, the presence of moving objects, saturation, and image noise break the rank-1 structure of the LDR images. To address these problems, we present a rank minimization algorithm that simultaneously aligns LDR images and detects outliers for robust HDR generation. We evaluate the performance of our algorithm systematically using synthetic examples and qualitatively compare our results with those of state-of-the-art HDR algorithms on challenging real-world examples.
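The rank-1 observation is easy to verify numerically: with a linear response, the stacked LDR intensities form an outer product of exposure times and radiances, and an outlier in one exposure (e.g., a moving object) breaks that structure. A synthetic check:

```python
import numpy as np

# With a linear camera response, intensity = radiance * exposure, so stacking
# the same pixels across exposures yields a rank-1 matrix.
rng = np.random.default_rng(6)
radiance = rng.uniform(0.1, 1.0, size=100)    # scene radiance at 100 pixels
exposures = np.array([1.0, 2.0, 4.0, 8.0])    # exposure times of the LDR stack
D = np.outer(exposures, radiance)             # (4, 100) stacked LDR intensities
s = np.linalg.svd(D, compute_uv=False)        # only one non-negligible singular value

# An outlier (moving object in one exposure) breaks the rank-1 structure:
D_corrupt = D.copy()
D_corrupt[2, :10] = rng.uniform(size=10)
s_corrupt = np.linalg.svd(D_corrupt, compute_uv=False)
```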

19.
IEEE Trans Image Process ; 23(12): 5559-72, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25291793

ABSTRACT

This paper describes an application framework to perform high-quality upsampling and completion on noisy depth maps. Our framework targets a complementary system setup consisting of a depth camera coupled with an RGB camera. Inspired by recent work that uses nonlocal structure regularization, we regularize depth maps in order to maintain fine details and structures. We extend this regularization by incorporating the additional high-resolution RGB input when upsampling a low-resolution depth map, together with a weighting scheme that favors structure details. Our technique is also able to repair large holes in a depth map, with consideration of structures and discontinuities, by utilizing edge information from the RGB input. Quantitative and qualitative results show that our method outperforms existing approaches for depth map upsampling and completion. We describe the complete process for this system, including device calibration, scene warping for input alignment, and how our framework can be extended to video depth-map completion with consideration of temporal coherence.

20.
IEEE Trans Pattern Anal Mach Intell ; 36(2): 209-21, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24356344

ABSTRACT

We propose a physically-based approach to separate reflection using multiple polarized images with a background scene captured behind glass. The input consists of three polarized images, each captured from the same viewpoint but with a different polarizer angle separated by 45 degrees. The output is a high-quality separation of the reflection and background layers from each of the input images. A main technical challenge of this problem is that the mixing coefficient for the reflection and background layers depends on the angle of incidence and the orientation of the plane of incidence, which vary spatially over the pixels of an image. Exploiting the physical properties of polarization for a double-surfaced glass medium, we propose a multiscale scheme that automatically finds the optimal separation of the reflection and background layers. Through experiments, we demonstrate that our approach generates results superior to those of previous methods.


Subjects
Algorithms; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Refractometry/methods; Subtraction Technique; Computer Simulation; Models, Theoretical; Reproducibility of Results; Sensitivity and Specificity
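The three-angle setup above can be illustrated with the standard polarization model, in which observed intensity varies sinusoidally with polarizer angle, I(φ) = I_c + I_a·cos(2φ − 2φ₀); three shots 45° apart are exactly enough to demodulate the constant and varying parts. A toy scalar round-trip, not the paper's spatially varying separation:

```python
import numpy as np

def demodulate(I0, I45, I90):
    """Recover (I_c, I_a, phi0) of I(phi) = I_c + I_a*cos(2*phi - 2*phi0)
    from measurements at polarizer angles 0, 45, and 90 degrees."""
    I_c = 0.5 * (I0 + I90)            # unpolarized (constant) component
    cos_t = 0.5 * (I0 - I90)          # I_a * cos(2*phi0)
    sin_t = I45 - I_c                  # I_a * sin(2*phi0)
    I_a = np.hypot(cos_t, sin_t)       # polarized amplitude
    phi0 = 0.5 * np.arctan2(sin_t, cos_t)
    return I_c, I_a, phi0

# Round-trip check on synthetic values
phi = np.deg2rad([0.0, 45.0, 90.0])
I_c_true, I_a_true, phi0_true = 0.6, 0.25, np.deg2rad(20.0)
I = I_c_true + I_a_true * np.cos(2 * phi - 2 * phi0_true)
I_c, I_a, phi0 = demodulate(*I)
```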