
DetPoseNet: Improving Multi-Person Pose Estimation via Coarse-Pose Filtering.

Ke, Lipeng; Chang, Ming-Ching; Qi, Honggang; Lyu, Siwei.

IEEE Trans Image Process; 31: 2782-2795, 2022.

Article in English | MEDLINE | ID: mdl-35344493

ABSTRACT

Human detection and pose estimation are essential for understanding human activities in images and videos. Mainstream multi-human pose estimation methods take a top-down approach: human detection is performed first, then each detected person bounding box is fed into a pose estimation network. This top-down approach suffers from early commitment to the initial detections in crowded scenes and in other cases with ambiguities or occlusions, leading to pose estimation failures. We propose DetPoseNet, an end-to-end multi-human detection and pose estimation framework in a unified three-stage network. Our method consists of a coarse-pose proposal extraction sub-net, a coarse-pose based proposal filtering module, and a multi-scale pose refinement sub-net. The coarse-pose proposal sub-net extracts whole-body bounding boxes and body keypoint proposals in a single shot. The coarse-pose filtering step, based on the person and keypoint proposals, can effectively rule out unlikely detections, thus improving subsequent processing. The pose refinement sub-net performs cascaded pose estimation on each refined proposal region. Multi-scale supervision and multi-scale regression are used in the pose refinement sub-net to simultaneously strengthen context feature learning. Structure-aware loss and keypoint masking are applied to further improve the robustness of pose refinement. Our framework is flexible: most existing top-down pose estimators can serve as the pose refinement sub-net. Experiments on the COCO and OCHuman datasets demonstrate the effectiveness of the proposed framework. The proposed method is computationally efficient (5-6x speedup), estimating multi-person poses with refined bounding boxes in under a second.
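
The coarse-pose filtering idea can be illustrated with a minimal sketch. The abstract does not state the actual filtering criterion, so the rule used here (keep a person box only if enough confident keypoint proposals fall inside it) is an assumption, as are the names `Box`, `keypoints_inside`, `filter_proposals` and the thresholds `min_conf` and `min_keypoints`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """A whole-body bounding box proposal (hypothetical structure)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


# A keypoint proposal as (x, y, confidence); hypothetical layout.
Keypoint = Tuple[float, float, float]


def keypoints_inside(box: Box, keypoints: List[Keypoint], min_conf: float) -> int:
    """Count keypoint proposals with confidence >= min_conf that lie inside the box."""
    return sum(
        1
        for x, y, conf in keypoints
        if conf >= min_conf and box.x1 <= x <= box.x2 and box.y1 <= y <= box.y2
    )


def filter_proposals(
    boxes: List[Box],
    keypoints: List[Keypoint],
    min_conf: float = 0.3,      # assumed threshold, not from the paper
    min_keypoints: int = 3,     # assumed threshold, not from the paper
) -> List[Box]:
    """Keep only boxes supported by at least min_keypoints confident keypoints."""
    return [
        box for box in boxes
        if keypoints_inside(box, keypoints, min_conf) >= min_keypoints
    ]


boxes = [Box(0, 0, 10, 10, 0.9), Box(20, 20, 30, 30, 0.8)]
keypoints = [(1, 1, 0.9), (5, 5, 0.8), (9, 9, 0.7), (25, 25, 0.2)]
kept = filter_proposals(boxes, keypoints)
```

In this toy input the second box contains only one low-confidence keypoint, so it is discarded before any refinement stage would run. That is the sense in which filtering can reduce downstream work.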

Fast Online Video Pose Estimation by Dynamic Bayesian Modeling of Mode Transitions.

Chang, Ming-Ching; Ke, Lipeng; Qi, Honggang; Wen, Longyin; Lyu, Siwei.

IEEE Trans Cybern; 51(1): 2-15, 2021 Jan.

Article in English | MEDLINE | ID: mdl-31880574

ABSTRACT

We propose a fast online video pose estimation method that detects and tracks human upper-body poses using conditional dynamic Bayesian modeling of pose modes, without referring to future frames. The estimation of human body poses from videos is an important task with many applications. Our method extends fast image-based pose estimation to live video streams by leveraging the temporal correlation of articulated poses between frames. Video poses are inferred over a time window using a conditional dynamic Bayesian network (CDBN), which we term time-windowed CDBN. Specifically, latent pose modes and their transitions are modeled and co-determined from the combination of three modules: 1) inference based on current observations; 2) the modeling of mode-to-mode transitions as a probabilistic prior; and 3) the modeling of state-to-mode transitions using a multimode softmax regression. Given the predicted pose modes, the body poses in terms of arm joint locations can then be determined more accurately and robustly. Our method is well suited to high frame rate (HFR) scenarios, where pose mode transitions can effectively capture action-related temporal information to boost performance. We evaluate our method on a newly collected HFR-Pose dataset and four major video pose datasets (VideoPose2, TUM Kitchen, FLIC, and Penn_Action). Our method achieves improvements in both accuracy and efficiency over existing online video pose estimation methods.
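
The three modules named above can be combined in a single online update over discrete pose modes, sketched below. The abstract does not say how the modules are fused, so multiplying the three terms and renormalizing is an assumption made for illustration. It is not the paper's time-windowed CDBN. All names (`softmax`, `state_to_mode`, `update_mode_belief`) and the toy numbers are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def state_to_mode(weights: List[List[float]], state: List[float]) -> List[float]:
    """Softmax regression: one weight row per mode, applied to a state vector."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def update_mode_belief(
    prev_belief: List[float],        # belief over modes at the previous frame
    transition: List[List[float]],   # transition[i][j] = P(mode j | previous mode i)
    obs_likelihood: List[float],     # likelihood of the current observation per mode
    state_probs: List[float],        # state-to-mode probabilities from softmax regression
) -> List[float]:
    """One forward step using only past and current frames (no future frames)."""
    n = len(prev_belief)
    # Mode-to-mode transition prior: propagate the previous belief one step.
    predicted = [
        sum(prev_belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Assumed fusion rule: multiply the three terms, then renormalize.
    unnorm = [predicted[j] * obs_likelihood[j] * state_probs[j] for j in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("all modes have zero probability")
    return [u / total for u in unnorm]


prev_belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
obs_likelihood = [0.7, 0.3]
state_probs = state_to_mode([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
belief = update_mode_belief(prev_belief, transition, obs_likelihood, state_probs)
```

Because the update depends only on the previous belief and the current frame, it runs in constant time per frame, which is the property that matters for online use on live streams.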
