Results 1 - 5 of 5
1.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1257-1272, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37962994

ABSTRACT

In this article, we introduce SMPLicit, a novel generative model that jointly represents body pose, shape and clothing geometry, and LayerNet, a deep network that, given a single image of a person, simultaneously performs detailed 3D reconstruction of body and clothes. In contrast to existing learning-based approaches that require training specific models for each type of garment, SMPLicit can represent different garment topologies in a unified manner (e.g. from sleeveless tops to hoodies and open jackets), while controlling other properties such as garment size or tightness/looseness. LayerNet follows a coarse-to-fine multi-stage strategy: it first predicts smooth cloth geometries from SMPLicit, which are then refined by an image-guided displacement network that gracefully fits the body, recovering high-frequency details and wrinkles. LayerNet achieves competitive accuracy in 3D reconstruction against the current 'garment-agnostic' state of the art for images of people in upright positions and controlled environments, and consistently surpasses these methods on challenging body poses and in uncontrolled settings. Furthermore, the semantically rich outcome of our approach is suitable for performing virtual try-on directly in 3D, a task which, so far, has only been addressed in the 2D domain.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4009-4022, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34191722

ABSTRACT

Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation step and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness. This work is an extended version of [1], where we provide more detailed explanations, comparisons and results as well as applications.

3.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6794-6806, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33031034

ABSTRACT

We present a new solution to egocentric 3D body pose estimation from monocular images captured from a downward-looking fish-eye camera installed on the rim of a head-mounted virtual reality device. This unusual viewpoint leads to images with unique visual appearance, characterized by severe self-occlusions and strong perspective distortions that result in a drastic difference in resolution between lower and upper body. We propose a new encoder-decoder architecture with a novel multi-branch decoder designed specifically to account for the varying uncertainty in 2D joint locations. Our quantitative evaluation, on both synthetic and real-world datasets, shows that our strategy leads to substantial improvements in accuracy over state-of-the-art egocentric pose estimation approaches. To tackle the severe lack of labelled training data for egocentric 3D pose estimation, we also introduce a large-scale photo-realistic synthetic dataset. xR-EgoPose offers 383K frames of high-quality renderings of people with diverse skin tones, body shapes and clothing, in a variety of backgrounds and lighting conditions, performing a range of actions. Our experiments show that the high variability in our new synthetic training corpus leads to good generalization to real-world footage and to state-of-the-art results on real-world datasets with ground truth. Moreover, an evaluation on the Human3.6M benchmark shows that the performance of our method is on par with top-performing approaches on the more classic problem of 3D human pose estimation from a third-person viewpoint.

4.
IEEE Trans Pattern Anal Mach Intell ; 42(10): 2523-2539, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31329106

ABSTRACT

We propose DoubleFusion, a new real-time system that combines volumetric non-rigid reconstruction with data-driven template fitting to simultaneously reconstruct detailed surface geometry, large non-rigid motion and the optimized human body shape from a single depth camera. One of the key contributions of this method is a double-layer representation consisting of a complete parametric body model inside and a gradually fused detailed surface outside. A pre-defined node graph on the body parameterizes the non-rigid deformations near the body, and a free-form, dynamically changing graph parameterizes the outer surface layer far from the body, which allows more general reconstruction. We further propose a joint motion tracking method based on the double-layer representation to enable robust and fast motion tracking performance. Moreover, the inner parametric body is optimized online and forced to fit inside the outer surface layer as well as the live depth input. Overall, our method enables increasingly denoised, detailed and complete surface reconstructions, fast motion tracking performance and plausible inner body shape reconstruction in real time. Experiments and comparisons show improved fast motion tracking and loop closure performance in more challenging scenarios. Two extended applications, body measurement and shape retargeting, show the potential of our system in terms of practical use.


Subjects
Image Processing, Computer-Assisted/methods; Somatotypes/physiology; Algorithms; Humans; Imaging, Three-Dimensional; Machine Learning; Posture/physiology; Video Recording
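
The DoubleFusion abstract in entry 4 says the inner parametric body is "forced to fit inside the outer surface layer". The sketch below illustrates one way such a constraint could be expressed, as a penalty on body points that protrude through the outer surface. It is a toy under loud assumptions, not the authors' formulation: the outer surface is replaced by a sphere centred at the origin, the body by a handful of points, and shape optimisation by uniform shrinking. All function names are hypothetical.

```python
import math

def signed_distance_to_sphere(point, radius):
    """Signed distance to a sphere centred at the origin: negative inside, positive outside."""
    return math.sqrt(sum(c * c for c in point)) - radius

def containment_penalty(body_points, outer_radius):
    """Sum of squared distances by which body points protrude through the outer surface."""
    return sum(
        max(0.0, signed_distance_to_sphere(p, outer_radius)) ** 2
        for p in body_points
    )

def shrink_to_fit(body_points, outer_radius, step=0.01, max_iters=1000):
    """Uniformly scale the body down until no point protrudes.

    A crude stand-in (assumption) for optimising body shape parameters.
    """
    scale = 1.0
    scaled = list(body_points)
    for _ in range(max_iters):
        scaled = [tuple(scale * c for c in p) for p in body_points]
        if containment_penalty(scaled, outer_radius) == 0.0:
            break
        scale -= step
    return scale, scaled

body = [(1.2, 0.0, 0.0), (0.0, 0.5, 0.0)]   # first point sticks out of a unit sphere
print(containment_penalty(body, 1.0))        # positive: the body is not contained
scale, fitted = shrink_to_fit(body, 1.0)
print(scale, containment_penalty(fitted, 1.0))
```

In a real system the sphere would be replaced by a signed distance field of the fused surface, and the penalty would be one term among several in the optimisation.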
5.
IEEE Trans Pattern Anal Mach Intell ; 38(8): 1533-47, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26829774

ABSTRACT

In this work, we present an approach to fuse video with sparse orientation data obtained from inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for accurate estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.


Subjects
Algorithms; Motion; Humans; Posture; Video Recording
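
The hybrid tracker in entry 5 combines a position cue from video with an orientation cue from inertial sensors. A minimal sketch of that idea for a single planar limb follows. It is an illustration only, assuming a weighted least-squares energy and a brute-force search over the limb angle; the abstract does not specify the actual objective or optimiser, and the function names and weights here are hypothetical.

```python
import math

def limb_endpoint(theta, length=1.0):
    """2D endpoint of a limb of the given length rotated by theta (radians) about the origin."""
    return (length * math.cos(theta), length * math.sin(theta))

def hybrid_energy(theta, video_point, imu_theta, w_video=1.0, w_imu=1.0):
    """Weighted sum of a video (position) residual and an inertial (orientation) residual."""
    x, y = limb_endpoint(theta)
    video_term = (x - video_point[0]) ** 2 + (y - video_point[1]) ** 2
    # Wrap the angular difference into [-pi, pi) before squaring it.
    diff = (theta - imu_theta + math.pi) % (2 * math.pi) - math.pi
    return w_video * video_term + w_imu * diff ** 2

def fit_theta(video_point, imu_theta, steps=3600, **weights):
    """Brute-force minimiser over a uniform grid of angles in [-pi, pi)."""
    candidates = [2 * math.pi * k / steps - math.pi for k in range(steps)]
    return min(
        candidates,
        key=lambda t: hybrid_energy(t, video_point, imu_theta, **weights),
    )

# The video cue suggests 0.4 rad, the inertial cue 0.6 rad.
video_obs = limb_endpoint(0.4)
print(fit_theta(video_obs, 0.6))               # a compromise between the two cues
print(fit_theta(video_obs, 0.6, w_imu=100.0))  # trusting the inertial cue pulls it towards 0.6
```

Raising `w_imu` mimics trusting the inertial orientation during fast motion, while the video term keeps the estimate anchored to an observed position.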