1.
IEEE Trans Image Process ; 33: 1534-1548, 2024.
Article in English | MEDLINE | ID: mdl-38363667

ABSTRACT

Structure-from-Motion (SfM) aims to recover 3D scene structures and camera poses based on the correspondences between input images, and thus the ambiguity caused by duplicate structures (i.e., different structures with strong visual resemblance) always results in incorrect camera poses and 3D structures. To deal with the ambiguity, most existing studies resort to additional constraint information or implicit inference by analyzing two-view geometries or feature points. In this paper, we propose to exploit high-level information in the scene, i.e., the spatial contextual information of local regions, to guide the reconstruction. Specifically, a novel structure is proposed, namely, track-community, in which each community consists of a group of tracks and represents a local segment in the scene. A community detection algorithm is performed on the track-graph to partition the scene into segments. Then, the potentially ambiguous segments are detected by analyzing the neighborhood of tracks and corrected by checking the pose consistency. Finally, we perform partial reconstruction on each segment and align the partial reconstructions with a novel bidirectional consistency cost function which considers both 3D-3D correspondences and pairwise relative camera poses. Experimental results demonstrate that our approach can robustly alleviate reconstruction failure resulting from visually indistinguishable structures and accurately merge the partial reconstructions.
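
Editorial illustration (not from the paper): the abstract mentions aligning partial reconstructions using 3D-3D correspondences. A minimal sketch of that one ingredient is the classical closed-form similarity alignment (Umeyama's method), shown below with NumPy. It assumes the correspondences are already known and outlier-free. The function name `align_similarity` is hypothetical, and the paper's bidirectional consistency cost, which also uses pairwise relative camera poses, is not reproduced here.

```python
import numpy as np

def align_similarity(src, dst):
    """Closed-form scale s, rotation R, translation t minimizing
    sum_i || dst_i - (s * R @ src_i + t) ||^2 (Umeyama's method).
    src, dst: (n, 3) arrays of corresponding 3D points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last singular direction if needed so R is a proper rotation.
    sign = np.ones(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        sign[-1] = -1.0
    R = U @ np.diag(sign) @ Vt

    # Scale from the ratio of covariance trace to source variance.
    var_s = (src_c ** 2).sum() / len(src)
    s = (D * sign).sum() / var_s
    t = mu_d - s * (R @ mu_s)
    return s, R, t
```

Given matched 3D points from two partial reconstructions, `s, R, t = align_similarity(points_a, points_b)` maps the first reconstruction into the frame of the second. A practical pipeline would typically wrap this in a robust estimator such as RANSAC.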

2.
IEEE Trans Vis Comput Graph ; 29(3): 1769-1784, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34847031

ABSTRACT

We present a multi-sensor system for consistent 3D hand pose tracking and modeling that leverages the advantages of both wearable and optical sensors. Specifically, we employ a stretch-sensing soft glove and three IMUs in combination with an RGB-D camera. Different sensor modalities are fused based on availability and confidence estimation, enabling seamless hand tracking in challenging environments with partial or even complete occlusion. To maximize accuracy while maintaining ease of use, we propose an automated user calibration that uses the RGB-D camera data to refine both the glove mapping model and the multi-IMU system parameters. Extensive experiments show that our setup outperforms wearable-only approaches when the hand is in the field of view and outperforms camera-only methods when the hand is occluded.


Subject(s)
Computer Graphics , Wearable Electronic Devices , Hand
3.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 932-945, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35294342

ABSTRACT

3D hand pose estimation is a challenging problem in computer vision due to the high degrees of freedom of the articulated hand motion space and large viewpoint variation. As a consequence, similar poses observed from multiple views can appear dramatically different. To deal with this issue, view-independent features are required to achieve state-of-the-art performance. In this paper, we investigate the impact of view-independent features on 3D hand pose estimation from a single depth image, and propose a novel recurrent neural network for 3D hand pose estimation, in which a cascaded 3D pose-guided alignment strategy is designed for view-independent feature extraction and a recurrent hand pose module is designed for modeling the dependencies among sequential aligned features. In particular, our cascaded pose-guided 3D alignments are performed in 3D space in a coarse-to-fine fashion: first, hand joints are predicted and globally transformed into a canonical reference frame; second, the palm of the hand is detected and aligned; third, local transformations are applied to the fingers to refine the final predictions. The proposed recurrent hand pose module for aligned 3D representation extracts recurrent pose-aware features and iteratively refines the estimated hand pose. Our recurrent module can be used for both single-view estimation and sequence-based estimation with 3D hand pose tracking. Experiments show that our method improves on the state of the art by a large margin on popular benchmarks with simple yet efficient alignment and network architectures.

4.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2151-2165, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35344487

ABSTRACT

Undesirable reflections in photos taken in front of glass windows or doors often degrade the visual quality of the image. Separating the two layers benefits both human and machine perception. The polarization state of light changes after refraction or reflection, providing more observations of the scene, which can benefit reflection separation. Unlike previous works that take three or more polarization images as input, in this paper we propose to exploit physical constraints from a pair of unpolarized and polarized images to separate the reflection and transmission layers. Due to the simplified capturing setup, the system is more under-determined than in existing polarization-based works. To solve this problem, we propose to first estimate the semi-reflector orientation to make the physical image formation well-posed, and then learn to reliably separate the two layers using additional networks based on both physical and numerical analysis. In addition, a motion estimation network is introduced to handle the misalignment of the paired input. Quantitative and qualitative experimental results show that our approach performs favorably against existing polarization-based and single-image-based solutions.

5.
IEEE Trans Vis Comput Graph ; 28(11): 3727-3736, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36048987

ABSTRACT

Bundle adjustment (BA) is widely used in SLAM and SfM, which are key technologies in Augmented Reality. For real-time SLAM and large-scale SfM, the efficiency of BA is of great importance. This paper proposes CoLi-BA, a novel and efficient BA solver that significantly improves the optimization speed by compact linearization and reordering. Specifically, for each reprojection function, the redundant matrix representation of the Jacobian is replaced with a tiny 3D vector, by which the computational complexity, memory storage, and cache misses for Hessian matrix construction and the Schur complement are significantly reduced. In addition, we propose a novel reordering strategy to improve the cache efficiency of the Schur complement. Experiments on diverse datasets show that the speed of the proposed CoLi-BA is five times that of Ceres and two times that of g2o without sacrificing accuracy. We further verify the effectiveness by porting CoLi-BA to open-source SLAM and SfM systems. Even when running the proposed solver in a single thread, the local BA of SLAM takes only about 20 ms on a desktop PC, and the SfM reconstruction of seven thousand photos takes only half an hour. The source code is available on the webpage: https://github.com/zju3dv/CoLi-BA.
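
Editorial illustration (not from the paper): the Schur complement step named in the abstract can be sketched with the textbook reduced-camera-system formulation below. It assumes the normal equations are already assembled as blocks B (camera-camera), E (camera-point), and a block-diagonal C of 3x3 point blocks. The function name `solve_with_schur` and the dense NumPy layout are hypothetical choices for readability. This is not CoLi-BA's compact linearization or reordering strategy.

```python
import numpy as np

def solve_with_schur(B, E, C_blocks, v, w):
    """Solve [[B, E], [E^T, C]] @ [dc, dp] = [v, w], where C is
    block-diagonal with one 3x3 block per 3D point.

    Points are eliminated first, giving the reduced camera system
    (B - E C^-1 E^T) dc = v - E C^-1 w, then recovered by back-substitution."""
    S = np.array(B, dtype=float)    # becomes the Schur complement
    rhs = np.array(v, dtype=float)  # becomes the reduced right-hand side
    C_inv = []
    for j, Cj in enumerate(C_blocks):
        Ej = E[:, 3 * j:3 * j + 3]
        Cj_inv = np.linalg.inv(Cj)  # cheap: each block is only 3x3
        C_inv.append(Cj_inv)
        S -= Ej @ Cj_inv @ Ej.T
        rhs -= Ej @ Cj_inv @ w[3 * j:3 * j + 3]

    dc = np.linalg.solve(S, rhs)    # camera update

    # Back-substitute to get each point update independently.
    dp = np.concatenate([
        C_inv[j] @ (w[3 * j:3 * j + 3] - E[:, 3 * j:3 * j + 3].T @ dc)
        for j in range(len(C_blocks))
    ])
    return dc, dp
```

The elimination is worthwhile because inverting C costs only one 3x3 inverse per point, so the expensive linear solve involves only the much smaller camera system. Production solvers additionally exploit the sparsity of E, which this dense sketch ignores.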

6.
IEEE Trans Image Process ; 30: 4275-4290, 2021.
Article in English | MEDLINE | ID: mdl-33826515

ABSTRACT

Hand pose understanding is essential to applications such as human-computer interaction and augmented reality. Recently, deep learning based methods have achieved great progress on this problem. However, the lack of high-quality, large-scale datasets prevents further improvement on hand pose related tasks such as 2D/3D hand pose from color and depth from color. In this paper, we develop a large-scale and high-quality synthetic dataset, PBRHand. The dataset contains millions of photo-realistic rendered hand images and various ground truths including pose, semantic segmentation, and depth. Based on the dataset, we first investigate the effect of rendering methods and the databases used on the performance of three hand pose related tasks: 2D/3D hand pose from color, depth from color, and 3D hand pose from depth. This study provides evidence that a photo-realistic rendered dataset is worth synthesizing and shows that our new dataset can improve the performance of the state of the art on these tasks. The synthetic data also enables us to explore multi-task learning, since having all the ground truths available on real data is expensive. Evaluations show that our approach can achieve state-of-the-art or competitive performance on several public datasets.


Subject(s)
Databases, Factual , Hand , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Color , Gestures , Hand/anatomy & histology , Hand/diagnostic imaging , Humans , Imaging, Three-Dimensional/methods
7.
IEEE Trans Image Process ; 30: 532-545, 2021.
Article in English | MEDLINE | ID: mdl-33201814

ABSTRACT

Emerging technologies such as AR/VR and HCI are driving demand for more comprehensive hand shape understanding, requiring not only 3D hand skeleton pose but also hand shape geometry. In this paper, we propose a deep learning framework to produce 3D hand shape from a single depth image. To address the challenge that capturing ground truth 3D hand shape for the training dataset is non-trivial, we leverage synthetic data to construct a statistical hand shape model and adopt weak supervision from widely accessible hand skeleton pose annotations. To bridge the gap due to the different hand skeleton definitions in the existing public datasets, we propose a joint regression network for hand pose adaptation. To reconstruct the hand shape, we use a Chamfer loss between the predicted hand shape and the point cloud from the input depth to learn the shape reconstruction model in a weakly-supervised manner. Experiments demonstrate that our model adapts well to real data and produces accurate hand shapes that outperform the state-of-the-art methods both qualitatively and quantitatively.
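
Editorial illustration (not from the paper): a Chamfer loss compares two point sets by averaging nearest-neighbor distances in both directions. The sketch below shows one common variant (mean of squared distances, summed over the two directions) in NumPy. The function name `chamfer_distance` is hypothetical. The exact variant used in the paper is not stated in the abstract, and a training setup would use a differentiable framework rather than plain NumPy.

```python
import numpy as np

def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between point sets.
    pred: (N, 3) array, target: (M, 3) array.
    Uses squared Euclidean distances, averaged per direction."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Pairwise squared distances, shape (N, M).
    d2 = ((pred[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)

    # pred -> target term plus target -> pred term.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

For example, `chamfer_distance(predicted_vertices, depth_point_cloud)` is zero when every point in each set coincides with some point in the other. The dense (N, M) distance matrix is fine for small clouds, while larger ones would typically use a spatial index such as a k-d tree.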
