Results 1 - 12 of 12
1.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 1066-1080, 2022 02.
Article in English | MEDLINE | ID: mdl-32750836

ABSTRACT

Bundle adjustment jointly optimizes camera intrinsics and extrinsics and 3D point triangulation to reconstruct a static scene. The triangulation constraint, however, is invalid for moving points captured in multiple unsynchronized videos, and bundle adjustment is not designed to estimate the temporal alignment between cameras. We present a spatiotemporal bundle adjustment framework that jointly optimizes four coupled sub-problems: estimating camera intrinsics and extrinsics, triangulating static 3D points, estimating the sub-frame temporal alignment between cameras, and computing 3D trajectories of dynamic points. Key to our joint optimization is the careful integration of physics-based motion priors within the reconstruction pipeline, validated on a large motion capture corpus of human subjects. We devise an incremental reconstruction and alignment algorithm to strictly enforce the motion prior during the spatiotemporal bundle adjustment. A divide-and-conquer scheme makes this algorithm more efficient while still maintaining high accuracy. We apply this algorithm to reconstruct 3D motion trajectories of human bodies in dynamic events captured by multiple uncalibrated and unsynchronized video cameras in the wild. To make the reconstruction visually more interpretable, we fit a statistical 3D human body model to the asynchronous video streams. Compared to the baseline, the fitting significantly benefits from the proposed spatiotemporal bundle adjustment procedure. Because the videos are aligned with sub-frame precision, we reconstruct 3D motion at much higher temporal resolution than the input videos. Website: http://www.cs.cmu.edu/~ILIM/projects/IM/STBA.
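
The spatiotemporal objective couples reprojection error with a motion prior evaluated on the globally aligned timeline. The sketch below is a simplified stand-in, not the paper's pipeline: camera matrices and per-camera sub-frame offsets are held fixed, a generic second-difference smoothness term replaces the physics-based priors, and only the dynamic point's 3D samples are optimized. The function `fit_trajectory`, its arguments, and the weight `lam` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def project(P, X):
    """Project a 3D point X (3,) with a 3x4 projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def fit_trajectory(observations, cams, offsets, lam=10.0):
    """observations: list of (cam_id, frame_index, observed_xy);
    cams: {cam_id: 3x4 projection matrix};
    offsets: {cam_id: hypothesized sub-frame time offset}.
    Returns the recovered 3D samples and their global timestamps."""
    times = np.array([f + offsets[c] for c, f, _ in observations])
    order = np.argsort(times)
    obs = [observations[i] for i in order]

    def residuals(flat):
        X = flat.reshape(-1, 3)
        reproj = [project(cams[c], Xi) - np.asarray(xy, float)
                  for (c, _, xy), Xi in zip(obs, X)]
        accel = X[2:] - 2.0 * X[1:-1] + X[:-2]   # smoothness term standing in for the motion prior
        return np.concatenate([np.ravel(reproj), np.sqrt(lam) * accel.ravel()])

    X0 = np.ones((len(obs), 3))                  # crude initialization; use a real guess in practice
    sol = least_squares(residuals, X0.ravel())
    return sol.x.reshape(-1, 3), times[order]
```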


Subjects
Algorithms , Three-Dimensional Imaging , Humans , Three-Dimensional Imaging/methods , Motion
2.
IEEE Trans Pattern Anal Mach Intell ; 43(1): 172-186, 2021 01.
Article in English | MEDLINE | ID: mdl-31331883

ABSTRACT

Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
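
As a rough illustration of how a part affinity field supports bottom-up association (a sketch, not the released OpenPose code): the affinity between two candidate keypoints can be scored by sampling the predicted 2D field along the segment joining them and measuring its alignment with the limb direction; greedy bipartite matching on these scores then assembles full skeletons. The array shapes and the name `paf_score` are assumptions.

```python
import numpy as np

def paf_score(paf, p1, p2, num_samples=10):
    """paf: (H, W, 2) predicted field for one limb type; p1, p2: (x, y) candidate keypoints."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    length = np.linalg.norm(d)
    if length < 1e-8:
        return 0.0
    u = d / length                                   # unit vector of the candidate limb
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.round(p1 + t * d).astype(int)      # integer sample location on the segment
        if 0 <= y < paf.shape[0] and 0 <= x < paf.shape[1]:
            score += float(np.dot(paf[y, x], u))     # alignment of the field with the limb
    return score / num_samples
```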

3.
IEEE Trans Pattern Anal Mach Intell ; 43(8): 2794-2808, 2021 08.
Article in English | MEDLINE | ID: mdl-32086193

ABSTRACT

Reliable markerless motion tracking of people participating in a complex group activity from multiple moving cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. To solve this problem, reliable association of the same person across distant viewpoints and temporal instances is essential. We present a self-supervised framework to adapt a generic person appearance descriptor to the unlabeled videos by exploiting motion tracking, mutual exclusion constraints, and multi-view geometry. The adapted discriminative descriptor is used in a tracking-by-clustering formulation. We validate the effectiveness of our descriptor learning on WILDTRACK [T. Chavdarova et al., "WILDTRACK: A multi-camera HD dataset for dense unscripted pedestrian detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 5030-5039] and three new complex social scenes captured by multiple cameras with up to 60 people "in the wild". We report significant improvement in association accuracy (up to 18 percent) and stable and coherent 3D human skeleton tracking (5 to 10 times) over the baseline. Using the reconstructed 3D skeletons, we cut the input videos into a multi-angle video where the image of a specified person is shown from the best visible front-facing camera. Our algorithm detects inter-human occlusion to determine the camera switching moment while still maintaining the flow of the action well. Website: http://www.cs.cmu.edu/~ILIM/projects/IM/Association4Tracking.
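
One way the self-supervision signals described above could be wired together is sketched below, under the assumption that the appearance descriptor is adapted with a triplet-style margin loss (the loss form is an assumption, not stated in the abstract): positives come from the same short motion track, and negatives come from the mutual exclusion constraint that two detections co-occurring in one frame cannot be the same person.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on the gap between positive and negative descriptor distances."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def mine_triplets(tracks, frames):
    """tracks: {track_id: [(frame_id, descriptor), ...]} from motion tracking;
    frames: {frame_id: [(track_id, descriptor), ...]} listing co-detections."""
    triplets = []
    for tid, dets in tracks.items():
        for (fa, da), (_, dp) in zip(dets[:-1], dets[1:]):   # temporal positives from the same track
            for other_tid, dn in frames.get(fa, []):
                if other_tid != tid:                          # mutual exclusion: co-detections are negatives
                    triplets.append((da, dp, dn))
    return triplets
```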


Subjects
Algorithms , Interpersonal Relations , Humans , Motion
4.
IEEE Trans Pattern Anal Mach Intell ; 43(10): 3681-3694, 2021 10.
Article in English | MEDLINE | ID: mdl-32248096

ABSTRACT

We present supervision by registration and triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. Being able to utilize unlabeled data enables our detectors to learn from the massive amounts of unlabeled data freely available, without being limited by the quality and quantity of manual human annotations. To utilize unlabeled data, there are two key observations: (I) the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow; (II) the detections of the same landmark in multiple synchronized and geometrically calibrated views should correspond to a single 3D point, i.e., multi-view consistency. Registration and multi-view consistency are sources of supervision that do not require manual labeling and can therefore be leveraged to augment existing training data during detector training. End-to-end training is made possible by differentiable registration and 3D triangulation modules. Experiments with 11 datasets and a newly proposed metric to measure precision demonstrate accuracy and precision improvements in landmark detection on both images and video.
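
The multi-view consistency signal can be illustrated with a plain (non-differentiable) numpy sketch, assuming known projection matrices: detections of the same landmark in several calibrated views are triangulated by linear DLT, and the mean reprojection error acts as a training loss that needs no manual labels. In the paper this module is differentiable; the helper names here are assumptions.

```python
import numpy as np

def triangulate_dlt(points2d, cams):
    """points2d: (V, 2) detections of one landmark; cams: (V, 3, 4) projection matrices."""
    A = []
    for (x, y), P in zip(points2d, cams):
        A.append(x * P[2] - P[0])      # x * (P3 . X) - (P1 . X) = 0
        A.append(y * P[2] - P[1])      # y * (P3 . X) - (P2 . X) = 0
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]

def reprojection_loss(points2d, cams):
    """Average reprojection error of the triangulated landmark (label-free loss)."""
    X = triangulate_dlt(points2d, cams)
    Xh = np.append(X, 1.0)
    err = 0.0
    for (x, y), P in zip(points2d, cams):
        proj = P @ Xh
        err += np.hypot(proj[0] / proj[2] - x, proj[1] / proj[2] - y)
    return err / len(points2d)
```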

5.
IEEE Trans Pattern Anal Mach Intell ; 41(1): 190-204, 2019 01.
Article in English | MEDLINE | ID: mdl-29990012

ABSTRACT

We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime the nature of interactions. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the integration of perceptual analyses over a large variety of viewpoints. We present a modularized system designed around this principle, consisting of integrated structural, hardware, and software innovations. The system takes, as input, 480 synchronized video streams of multiple people engaged in social activities, and produces, as output, the labeled time-varying 3D structure of anatomical landmarks on individuals in the space. Our algorithm is designed to fuse the "weak" perceptual processes across the large number of views by progressively generating skeletal proposals from low-level appearance cues; we also present a temporal refinement framework that associates body parts with a reconstructed dense 3D trajectory stream. Our system and method are the first to reconstruct the full-body motion of more than five people engaged in social interactions without using markers. We also empirically demonstrate the impact of the number of views in achieving this goal.


Assuntos
Processamento de Imagem Assistida por Computador/métodos , Relações Interpessoais , Gravação em Vídeo , Algoritmos , Desenho de Equipamento , Humanos , Postura/fisiologia , Gravação em Vídeo/instrumentação , Gravação em Vídeo/métodos
6.
IEEE Trans Pattern Anal Mach Intell ; 39(11): 2201-2214, 2017 11.
Article in English | MEDLINE | ID: mdl-27992328

ABSTRACT

Recovering dynamic 3D structures from 2D image observations is highly under-constrained because of projection and missing data, motivating the use of strong priors to constrain shape deformation. In this paper, we empirically show that the spatiotemporal covariance of natural deformations is dominated by a Kronecker pattern. We demonstrate that this pattern arises as the limit of a spatiotemporal autoregressive process, and derive a Kronecker Markov Random Field as a prior distribution over dynamic structures. This distribution unifies shape and trajectory models of prior art and has the individual models as its marginals. The key assumption of the Kronecker MRF is that the spatiotemporal covariance is separable into the product of a temporal and a shape covariance, and can therefore be modeled using the matrix normal distribution. Analysis on motion capture data validates that this distribution is an accurate approximation with significantly fewer free parameters. Using the trace-norm, we present a convex method to estimate missing data from a single sequence when the marginal shape distribution is unknown. The Kronecker-Markov distribution, fit to a single sequence, outperforms state-of-the-art methods at inferring missing 3D data, and additionally provides covariance estimates of the uncertainty.
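
The separability assumption can be checked numerically: if the spatiotemporal covariance is the Kronecker product of a shape covariance and a temporal covariance, then a trajectory matrix drawn from the corresponding matrix normal distribution, vectorized column-wise, reproduces that Kronecker covariance. A small self-contained sketch follows; the sizes and covariances are arbitrary choices for illustration.

```python
import numpy as np

def sample_matrix_normal(M, Sigma_t, Sigma_s, rng):
    """M: (T, S) mean; Sigma_t: (T, T) temporal covariance; Sigma_s: (S, S) shape covariance."""
    A = np.linalg.cholesky(Sigma_t)
    B = np.linalg.cholesky(Sigma_s)
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ B.T            # vec(X) then has covariance kron(Sigma_s, Sigma_t)

rng = np.random.default_rng(0)
T, S = 4, 3
Sigma_t = np.eye(T) + 0.5 * (np.diag(np.ones(T - 1), 1) + np.diag(np.ones(T - 1), -1))
Sigma_s = np.diag([1.0, 2.0, 3.0])
samples = np.stack([sample_matrix_normal(np.zeros((T, S)), Sigma_t, Sigma_s, rng).ravel(order="F")
                    for _ in range(50000)])
emp_cov = np.cov(samples, rowvar=False)
print(np.abs(emp_cov - np.kron(Sigma_s, Sigma_t)).max())   # small sampling error
```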

7.
IEEE Comput Graph Appl ; 36(4): 34-45, 2016.
Article in English | MEDLINE | ID: mdl-27514031

ABSTRACT

Comic art consists of a sequence of panels of different shapes and sizes that visually communicate the narrative to the reader. The move-on-stills technique allows such still images to be retargeted for digital displays via camera moves. Today, moves-on-stills can be created by software applications given user-provided parameters for each desired camera move. The proposed algorithm uses viewer gaze as input to computationally predict camera move parameters. The authors demonstrate their algorithm on various comic book panels and evaluate its performance by comparing their results with a professional DVD.
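
Purely as an illustration of the move-on-stills output (this is not the authors' gaze-based algorithm): a camera move over a panel can be parameterized as an interpolation between a start and an end crop window, here simply fitted around the earliest and latest gaze fixations. All names and parameters are hypothetical.

```python
import numpy as np

def camera_move_from_gaze(fixations, aspect=16 / 9, pad=1.5, steps=30):
    """fixations: (N, 3) array of (x, y, timestamp), N >= 2; aspect = width / height.
    Returns per-frame crop windows as (center_x, center_y, width, height)."""
    fixations = fixations[np.argsort(fixations[:, 2])]
    half = max(1, len(fixations) // 2)

    def crop(points):
        cx, cy = points[:, :2].mean(axis=0)
        h = pad * max(np.ptp(points[:, 1]), np.ptp(points[:, 0]) / aspect, 1.0)
        return np.array([cx, cy, aspect * h, h])

    start, end = crop(fixations[:half]), crop(fixations[half:])
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - ts) * start + ts * end              # linear pan and zoom between the two crops
```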

8.
IEEE Trans Pattern Anal Mach Intell ; 38(2): 390-404, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26761742

ABSTRACT

Active illumination based methods involve a trade-off between acquisition time and the resolution of the estimated 3D shapes. Multi-shot approaches can generate dense reconstructions but require stationary scenes. Single-shot methods are applicable to dynamic objects but can only estimate sparse reconstructions and are sensitive to surface texture. We present a single-shot approach to produce dense shape reconstructions of highly textured objects illuminated by one or more projectors. The key to our approach is an image decomposition scheme that can recover the illumination image of different projectors and the texture images of the scene from their mixed appearances. We focus on three cases of mixed appearance: illumination from one projector onto a textured surface, illumination from multiple projectors onto a textureless surface, and their combined effect. Our method can accurately compute per-pixel warps from the illumination patterns and the texture template to the observed image. The texture template is obtained by interleaving the projection sequence with an all-white pattern. The estimated warps are reliable even with infrequent interleaved projection and strong object deformation. Thus, we obtain detailed shape reconstruction and dense motion tracking of the textured surfaces. The proposed method, implemented with a system of one camera and two projectors, is validated on synthetic and real data containing subtle non-rigid surface deformations.

9.
IEEE Trans Pattern Anal Mach Intell ; 33(4): 780-93, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21330689

ABSTRACT

In this paper, we describe the explicit application of articulation constraints for estimating the motion of a system of articulated planes. We relate articulations to the relative homography between planes and show that these articulations translate into linearized equality constraints on a linear least-squares system, which can be solved efficiently using a Karush-Kuhn-Tucker system. The articulation constraints can be applied for both gradient-based and feature-based motion estimation algorithms and to illustrate this, we describe a gradient-based motion estimation algorithm for an affine camera and a feature-based motion estimation algorithm for a projective camera that explicitly enforces articulation constraints. We show that explicit application of articulation constraints leads to numerically stable estimates of motion. The simultaneous computation of motion estimates for all of the articulated planes in a scene allows us to handle scene areas where there is limited texture information and areas that leave the field of view. Our results demonstrate the wide applicability of the algorithm in a variety of challenging real-world cases such as human body tracking, motion estimation of rigid, piecewise planar scenes, and motion estimation of triangulated meshes.
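
The numerical core described above (a linear least-squares motion estimate with linearized articulation constraints enforced as equalities) reduces to a standard Karush-Kuhn-Tucker system. A minimal sketch, with generic matrices standing in for the motion and constraint Jacobians:

```python
import numpy as np

def constrained_least_squares(A, b, C, d):
    """Minimize ||A x - b||^2 subject to C x = d via the KKT system
    [A^T A  C^T] [x     ]   [A^T b]
    [C      0  ] [lambda] = [d    ]"""
    n, m = A.shape[1], C.shape[0]
    K = np.block([[A.T @ A, C.T],
                  [C, np.zeros((m, m))]])
    rhs = np.concatenate([A.T @ b, d])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                     # motion parameters; sol[n:] holds the Lagrange multipliers
```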


Subjects
Algorithms , Computer-Assisted Image Processing/methods , Hand/anatomy & histology , Hand/physiology , Humans , Image Enhancement/methods , Three-Dimensional Imaging , Motion , Movement
10.
IEEE Trans Pattern Anal Mach Intell ; 33(7): 1442-56, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21079275

ABSTRACT

Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes. These bases are object dependent and therefore have to be estimated anew for each video sequence. In contrast, we propose a dual approach that describes the evolving 3D structure in trajectory space by a linear combination of basis trajectories. We describe the dual relationship between the two approaches, showing that they both have equal power for representing 3D structure. We further show that the temporal smoothness in 3D trajectories alone can be used for recovering nonrigid structure from a moving camera. The principal advantage of expressing deforming 3D structure in trajectory space is that we can define an object-independent basis. This results in a significant reduction in unknowns and a corresponding gain in the stability of estimation. We propose the use of the Discrete Cosine Transform (DCT) as the object-independent basis and empirically demonstrate that it approaches Principal Component Analysis (PCA) for natural motions. We report the performance of the proposed method quantitatively on motion capture data and qualitatively on several video sequences exhibiting nonrigid motions, including piecewise rigid motion, partially nonrigid motion (such as facial expressions), and highly nonrigid motion (such as a person walking or dancing).
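
The object-independent trajectory basis can be illustrated directly: a truncated DCT basis captures a smooth 3D point trajectory with a handful of coefficients per coordinate, which is the source of the reduction in unknowns mentioned above. A minimal sketch with a synthetic trajectory:

```python
import numpy as np
from scipy.fft import dct, idct

def truncate_to_dct_basis(traj, K):
    """traj: (F, 3) trajectory over F frames; keep the first K DCT coefficients per coordinate."""
    coeffs = dct(traj, axis=0, norm="ortho")   # DCT-II coefficients
    coeffs[K:] = 0.0                           # keep only the low-frequency basis
    return idct(coeffs, axis=0, norm="ortho")

F = 100
t = np.linspace(0.0, 1.0, F)
traj = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)
approx = truncate_to_dct_basis(traj, K=10)
print(np.abs(traj - approx).max())             # reconstruction error of the truncated basis
```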


Assuntos
Interpretação de Imagem Assistida por Computador/métodos , Imageamento Tridimensional/métodos , Movimento/fisiologia , Reconhecimento Automatizado de Padrão/métodos , Fotografação/métodos , Imagem Corporal Total/métodos , Algoritmos , Simulação por Computador , Humanos , Aumento da Imagem/métodos , Modelos Biológicos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade , Processamento de Sinais Assistido por Computador , Técnica de Subtração
11.
IEEE Trans Pattern Anal Mach Intell ; 30(2): 361-7, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18084066

ABSTRACT

A camera mounted on an aerial vehicle provides an excellent means for monitoring large areas of a scene. Utilizing several such cameras on different aerial vehicles allows further flexibility, in terms of increased visual scope and in the pursuit of multiple targets. In this paper, we address the problem of associating objects across multiple airborne cameras. Since the cameras are moving and often widely separated, direct appearance-based or proximity-based constraints cannot be used. Instead, we exploit geometric constraints on the relationship between the motion of each object across cameras to test multiple association hypotheses, without assuming any prior calibration information. Given our scene model, we propose a geometrically motivated likelihood function for evaluating a hypothesized association between observations in multiple cameras. Since multiple cameras exist, ensuring coherency in association is an essential requirement, e.g., that transitive closure is maintained across more than two cameras. To ensure such coherency, we pose the problem of maximizing the likelihood function as a k-dimensional matching and use an approximation to find the optimal assignment of associations. Using the proposed error function, canonical trajectories of each object and optimal estimates of inter-camera transformations (in a maximum likelihood sense) are computed. Finally, we show that as a result of associating objects across the cameras, a concurrent visualization of multiple aerial video streams is possible and that, under special conditions, trajectories interrupted due to occlusion or missed detections can be repaired. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models; quantitative performance is also reported through simulation.
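
For the two-camera special case, the association step described above reduces to bipartite matching on the (negative log) likelihoods; the k-dimensional matching over more cameras is what requires the approximation mentioned in the abstract. A minimal sketch with a hypothetical cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_two_cameras(cost):
    """cost[i, j]: negative log-likelihood that object i in camera A and
    object j in camera B are the same object; returns the optimal pairing."""
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Hypothetical 3x3 cost matrix with low cost on the intended correspondences.
cost = np.array([[0.1, 2.0, 3.0],
                 [2.5, 0.2, 1.8],
                 [3.0, 1.5, 0.3]])
print(associate_two_cameras(cost))   # [(0, 0), (1, 1), (2, 2)]
```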

12.
IEEE Trans Pattern Anal Mach Intell ; 27(11): 1778-92, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16285376

ABSTRACT

Accurate detection of moving objects is an important precursor to stable tracking or recognition. In this paper, we present an object detection scheme that has three innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged, and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic backgrounds. By using a nonparametric density estimation method over a joint domain-range representation of image pixels, multimodal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. We propose a model of the background as a single probability density. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches to object detection, which detect objects by building adaptive models of the background, the foreground is modeled to augment the detection of objects (without explicit tracking), since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition for detecting interesting objects, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the proposed method is performed and presented on a diverse set of dynamic scenes.
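
A minimal sketch of the background side of the model described above, leaving out the foreground model and the MAP-MRF graph-cut step: each pixel is scored by a kernel density estimate over joint domain-range samples (x, y, r, g, b), so spatially proximal background observations contribute to the likelihood. The Gaussian kernel and the bandwidth values are illustrative assumptions.

```python
import numpy as np

def background_likelihood(pixel, samples, bandwidth=(8.0, 8.0, 15.0, 15.0, 15.0)):
    """pixel: (x, y, r, g, b); samples: (N, 5) background observations drawn from
    the pixel's spatial neighborhood. Returns the kernel density estimate at the pixel."""
    h = np.asarray(bandwidth, float)
    diff = (np.asarray(samples, float) - np.asarray(pixel, float)) / h
    kernels = np.exp(-0.5 * np.sum(diff ** 2, axis=1))     # product Gaussian kernel per sample
    normalizer = (2.0 * np.pi) ** 2.5 * np.prod(h)         # 5-D Gaussian normalization
    return kernels.mean() / normalizer
```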


Assuntos
Algoritmos , Inteligência Artificial , Interpretação de Imagem Assistida por Computador/métodos , Armazenamento e Recuperação da Informação/métodos , Reconhecimento Automatizado de Padrão/métodos , Técnica de Subtração , Gravação em Vídeo/métodos , Teorema de Bayes , Simulação por Computador , Aumento da Imagem/métodos , Modelos Estatísticos