Results 1 - 2 of 2
1.
IEEE Trans Pattern Anal Mach Intell; 43(8): 2794-2808, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32086193

ABSTRACT

Reliable markerless motion tracking of people participating in a complex group activity from multiple moving cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. To solve this problem, reliable association of the same person across distant viewpoints and temporal instances is essential. We present a self-supervised framework to adapt a generic person appearance descriptor to unlabeled videos by exploiting motion tracking, mutual exclusion constraints, and multi-view geometry. The adapted discriminative descriptor is used in a tracking-by-clustering formulation. We validate the effectiveness of our descriptor learning on WILDTRACK [T. Chavdarova et al., "WILDTRACK: A multi-camera HD dataset for dense unscripted pedestrian detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 5030-5039] and on three new complex social scenes captured by multiple cameras with up to 60 people "in the wild". We report significant improvement in association accuracy (up to 18 percent) and stable, coherent 3D human skeleton tracking (5 to 10 times longer) over the baseline. Using the reconstructed 3D skeletons, we cut the input videos into a multi-angle video in which a specified person is shown from the best visible front-facing camera. Our algorithm detects inter-human occlusion to determine the camera switching moment while maintaining the flow of the action. Website: http://www.cs.cmu.edu/~ILIM/projects/IM/Association4Tracking.


Subjects
Algorithms, Interpersonal Relations, Humans, Motion (Physics)
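The tracking-by-clustering idea from the abstract above can be illustrated with a minimal sketch: detections are greedily merged by appearance similarity, subject to a mutual-exclusion constraint. All names, the threshold, and the greedy merge strategy here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cluster_detections(descriptors, cam_ids, sim_thresh=0.8):
    """Greedy tracking-by-clustering sketch: merge detections whose
    appearance descriptors are similar, subject to a mutual-exclusion
    constraint (no two detections from the same camera, assumed
    simultaneous here, may share a cluster)."""
    n = len(descriptors)
    # Normalize descriptors so the dot product equals cosine similarity.
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    labels = list(range(n))  # each detection starts in its own cluster
    # Candidate pairs, processed in order of descending similarity.
    pairs = [(d[i] @ d[j], i, j) for i in range(n) for j in range(i + 1, n)]
    for sim, i, j in sorted(pairs, reverse=True):
        if sim < sim_thresh:
            break
        a, b = labels[i], labels[j]
        if a == b:
            continue
        cams_a = {cam_ids[k] for k in range(n) if labels[k] == a}
        cams_b = {cam_ids[k] for k in range(n) if labels[k] == b}
        if cams_a & cams_b:  # mutual exclusion: same camera in both clusters
            continue
        for k in range(n):  # merge cluster b into cluster a
            if labels[k] == b:
                labels[k] = a
    return labels
```

With two people seen by two cameras, detections of the same person end up in the same cluster while detections of different people stay apart.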
2.
IEEE Trans Pattern Anal Mach Intell; 39(3): 546-560, 2017 Mar.
Article in English | MEDLINE | ID: mdl-27101598

ABSTRACT

Light-field cameras are quickly becoming commodity items, with consumer and industrial applications. They capture many nearby views simultaneously using a single image with a micro-lens array, thereby providing a wealth of cues for depth recovery: defocus, correspondence, and shading. In particular, apart from conventional image shading, one can refocus images after acquisition and shift one's viewpoint within the sub-apertures of the main lens, effectively obtaining multiple views. We present a principled algorithm for dense depth estimation that combines defocus and correspondence metrics. We then extend our analysis to the additional cue of shading, using it to refine fine details in the shape. By exploiting an all-in-focus image, in which pixels are expected to exhibit angular coherence, we define an optimization framework that integrates photo consistency, depth consistency, and shading consistency. We show that combining all three sources of information (defocus, correspondence, and shading) outperforms state-of-the-art light-field depth estimation algorithms in multiple scenarios.
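The cue-combination step described above can be sketched as a per-pixel winner-take-all over fused cost volumes. The function name, the linear weighting, and the toy cost volumes are hypothetical assumptions for illustration; the paper's actual formulation is an optimization framework, not this simple fusion:

```python
import numpy as np

def combined_depth(defocus_cost, corresp_cost, w_defocus=0.5):
    """Sketch of cue fusion for light-field depth estimation.
    Each cost volume has shape (D, H, W): for every candidate depth d,
    it scores how badly that cue is violated at each pixel. We take a
    weighted sum of the two cues (hypothetical linear weighting) and
    pick, per pixel, the depth index with the lowest combined cost."""
    cost = w_defocus * defocus_cost + (1.0 - w_defocus) * corresp_cost
    return np.argmin(cost, axis=0)  # (H, W) map of best depth indices
```

A shading term could be folded into the same sum before the argmin; the paper instead uses shading to refine fine shape details within an optimization over photo, depth, and shading consistency.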
