ABSTRACT
Although the performance of 3D human pose and shape estimation methods has improved considerably in recent years, existing approaches typically generate 3D poses defined in a camera-centered or human-centered coordinate system. This makes it difficult to estimate a person's pure pose and motion in a world coordinate system from a video captured by a moving camera. To address this issue, this paper presents a camera-motion-agnostic approach for predicting 3D human pose and mesh defined in the world coordinate system. The core idea of the proposed approach is to estimate the difference between two adjacent global poses (i.e., the global motion), which is invariant to the choice of coordinate system, instead of the global pose itself, which is coupled to the camera motion. To this end, we propose the global motion regressor (GMR), a network based on bidirectional gated recurrent units (GRUs) that predicts the global motion sequence from the local pose sequence, which consists of the relative rotations of the joints. For evaluation, we use the 3DPW dataset and synthetic datasets constructed in a moving-camera environment. We conduct extensive experiments and demonstrate the effectiveness of the proposed method empirically.
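The coordinate-system invariance claimed above can be checked directly: if the global motion between adjacent frames is defined as the relative rotation in the body frame, it is unchanged when both global poses are re-expressed in a different world frame (e.g., after a camera move). The sketch below is illustrative only and is not the GMR network itself; all function names are our own.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def global_motion(R_t, R_t1):
    """Relative rotation between two adjacent global poses,
    expressed in the body frame: R_t^T @ R_{t+1}."""
    return R_t.T @ R_t1

# Two adjacent global poses in some world coordinate system.
R_t, R_t1 = rot_z(0.3), rot_z(0.5)

# Re-express both poses in a different world frame
# (e.g., the coordinate system of a camera that has moved).
W = rot_z(1.2)
motion_a = global_motion(R_t, R_t1)
motion_b = global_motion(W @ R_t, W @ R_t1)

# The global motion is identical under the change of coordinate system:
# (W R_t)^T (W R_{t+1}) = R_t^T W^T W R_{t+1} = R_t^T R_{t+1}.
assert np.allclose(motion_a, motion_b)
```

Because the frame change W cancels algebraically, a network supervised on this quantity never needs to know which world frame the camera happened to define.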
Subjects
Algorithms, Humans, Motion (Physics)

ABSTRACT
In this paper, we address the problem of 3D human mesh reconstruction from a single 2D human pose based on deep learning. We propose MeshLifter, a network that estimates a 3D human mesh from an input 2D human pose. Unlike most existing 3D human mesh reconstruction studies, which train models using paired 2D and 3D data, we propose a weakly supervised learning method based on a loop structure to train the MeshLifter. The proposed method alleviates the difficulty of obtaining ground-truth 3D data, so that the MeshLifter can be trained successfully from a 2D human pose dataset and an unpaired 3D motion capture dataset. We compare the proposed method with recent state-of-the-art studies through various experiments and show that it achieves effective 3D human mesh reconstruction performance. Notably, the proposed method achieves a reconstruction error of 59.1 mm without using the 3D ground-truth data of Human3.6M, the standard dataset for 3D human mesh reconstruction.
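The loop structure for weak supervision can be sketched as a 2D → 3D → 2D consistency check: the lifted 3D pose is projected back to the image plane, and the reprojection error supervises the lifter without any paired 3D ground truth. The code below is a minimal illustration under simplifying assumptions (a trivial stand-in lifter and a pinhole camera), not the authors' MeshLifter implementation; all names are hypothetical.

```python
import numpy as np

def lift_2d_to_3d(pose_2d, depth=2.0):
    """Hypothetical stand-in for the lifter: assigns every 2D joint a
    fixed depth to produce a 3D pose. In the actual method this is a
    trained network (MeshLifter) producing a full mesh."""
    n = pose_2d.shape[0]
    return np.hstack([pose_2d, np.full((n, 1), depth)])

def project_3d_to_2d(pose_3d, focal=1.0):
    """Simple pinhole projection back onto the image plane."""
    return focal * pose_3d[:, :2] / pose_3d[:, 2:3]

def loop_consistency_loss(pose_2d):
    """Reprojection error between the input 2D pose and the projection
    of its lifted 3D pose. Minimizing this loss closes the loop and
    trains the lifter without paired 2D-3D annotations."""
    reprojected = project_3d_to_2d(lift_2d_to_3d(pose_2d))
    return float(np.mean((reprojected - pose_2d) ** 2))

pose = np.array([[0.1, 0.2], [0.3, -0.4]])
loss = loop_consistency_loss(pose)  # nonzero: this toy lifter does not close the loop
```

In the actual training scheme, the unpaired motion capture data additionally constrains the lifted poses to lie on the manifold of plausible human configurations, which the reprojection loss alone cannot guarantee.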