1.
Article in English | MEDLINE | ID: mdl-37669192

ABSTRACT

Structure from Motion (SfM) is a fundamental computer vision problem that has not been well handled by deep learning. One promising solution is to build explicit structural constraints, e.g., a 3D cost volume, into the neural network. Obtaining accurate camera poses from images alone can be challenging, especially under complicated environmental factors. Existing methods usually assume accurate camera poses from ground truth (GT) or other methods, which is unrealistic in practice and requires additional sensors. In this work, we design a physically driven architecture, namely DeepSFM, inspired by traditional Bundle Adjustment (BA), which consists of two cost-volume-based architectures that iteratively refine depth and pose. The explicit constraints on both depth and pose, when combined with the learning components, bring together the merits of traditional BA and emerging deep learning technology. To improve learning and inference efficiency, we apply Gated Recurrent Unit (GRU)-based depth and pose update modules with coarse-to-fine cost volumes to the iterative refinements. In addition, with the extended residual depth prediction module, our model can be adapted to dynamic scenes effectively. Extensive experiments on various datasets show that our model achieves state-of-the-art performance with superior robustness against challenging inputs.
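
To make the term "cost volume" concrete, the following is a minimal sketch under strong simplifying assumptions: two rectified 1D scanlines, an absolute-difference matching cost, and a winner-take-all readout. The function names and the toy data are hypothetical, and none of this reproduces DeepSFM's learned depth or pose cost volumes; it only shows the underlying data structure, one matching cost per pixel and per hypothesis.

```python
def build_cost_volume(ref, src, max_disp):
    """Return volume[d][x] = |ref[x] - src[x - d]| for d in 0..max_disp.

    Pixels whose match would fall outside `src` get an infinite cost.
    """
    width = len(ref)
    volume = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            xs = x - d  # where pixel x would land in `src` under hypothesis d
            row.append(abs(ref[x] - src[xs]) if 0 <= xs < width else float("inf"))
        volume.append(row)
    return volume


def winner_take_all(volume):
    """Pick, for every pixel, the hypothesis with the lowest cost."""
    width = len(volume[0])
    return [min(range(len(volume)), key=lambda d: volume[d][x]) for x in range(width)]


# Toy scanlines (assumed data): `src` is `ref` shifted left by two pixels.
ref = [0, 0, 10, 50, 90, 30, 0, 0]
src = [10, 50, 90, 30, 0, 0, 0, 0]
disparity = winner_take_all(build_cost_volume(ref, src, max_disp=3))
print(disparity[2:6])  # textured pixels recover the shift of 2
```

Textureless pixels (the zeros here) produce tied costs, which is one reason a learned or regularized readout is usually preferred over a plain argmin.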

2.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2166-2180, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35471867

ABSTRACT

We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses. While many previous works learn to hallucinate the shape directly from priors, we further improve the shape quality by leveraging cross-view information with a graph convolution network. Instead of building a direct mapping function from images to 3D shape, our model learns to predict a series of deformations that improve a coarse shape iteratively. Inspired by traditional multiple view geometry methods, our network samples the area around the initial mesh's vertex locations and infers an optimal deformation using perceptual feature statistics built from multiple input images. Extensive experiments show that our model produces accurate 3D shapes that are not only visually plausible from the input perspectives, but also well aligned to arbitrary viewpoints. With the help of its physically driven architecture, our model also generalizes across different semantic categories and numbers of input images. Model analysis experiments show that our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable renderer for test-time optimization.

3.
IEEE Trans Pattern Anal Mach Intell ; 43(1): 1-16, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31331880

ABSTRACT

Many real-world video sequences cannot be conveniently categorized as general or degenerate; in such cases, imposing a false dichotomy between the fundamental matrix and the homography model for motion segmentation on video sequences leads to difficulty. Even when we are confronted with a general scene-motion, the fundamental matrix approach as a model for motion segmentation still suffers from several defects, which we discuss in this paper. The full potential of the fundamental matrix approach can only be realized if we judiciously harness information from the simpler homography model. From these considerations, we propose a multi-model spectral clustering framework that synergistically combines multiple models (homography and fundamental matrix). We show that the performance can be substantially improved in this way. For general motion segmentation tasks, the number of independently moving objects is often unknown a priori and needs to be estimated from the observations. This is referred to as model selection and it is essentially still an open research problem. In this work, we propose a set of model selection criteria balancing data fidelity and model complexity. We perform extensive testing on existing motion segmentation datasets with both segmentation and model selection tasks, achieving state-of-the-art performance on all of them; we also put forth a more realistic and challenging dataset adapted from the KITTI benchmark, containing real-world effects such as strong perspectives and strong forward translations not seen in the traditional datasets.
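
The idea of balancing data fidelity against model complexity can be illustrated with a generic penalized score. This is only a sketch under stated assumptions: the additive form `residual + penalty * k`, the function name, and the residual values are all hypothetical, and the paper's actual criteria are not given in the abstract.

```python
def select_num_models(residuals_by_k, penalty):
    """Return the k minimising residual + penalty * k, plus all scores.

    residuals_by_k maps a candidate number of motions k to a fitting
    error (data fidelity); `penalty` prices each additional motion
    (model complexity). Both are assumptions made for illustration.
    """
    scores = {k: r + penalty * k for k, r in residuals_by_k.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fitting errors: large gains up to k = 3, little afterwards.
residuals = {1: 40.0, 2: 12.0, 3: 3.0, 4: 2.5, 5: 2.2}
best_k, scores = select_num_models(residuals, penalty=2.0)
print(best_k)  # 3
```

With a penalty of zero the largest k always wins, since adding motions never increases the fitting error; the penalty is what makes the estimate meaningful.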

4.
IEEE Trans Pattern Anal Mach Intell ; 43(10): 3600-3613, 2021 Oct.
Article in English | MEDLINE | ID: mdl-32248097

ABSTRACT

In this paper, we propose an end-to-end deep learning architecture that generates 3D triangular meshes from single color images. Restricted by the nature of prevalent deep learning techniques, the majority of previous works represent 3D shapes as volumes or point clouds. However, it is non-trivial to convert these representations to compact and ready-to-use mesh models. Unlike the existing methods, our network represents 3D shapes as meshes, which are essentially graphs and well suited for graph-based convolutional neural networks. Leveraging perceptual features extracted from an input image, our network produces the correct geometry by progressively deforming an ellipsoid. To make the whole deformation procedure stable, we adopt a coarse-to-fine strategy and define various mesh- and surface-related losses to capture different properties, which helps produce visually appealing and physically accurate 3D geometry. In addition, our model by nature can be adapted to objects in specific domains, e.g., human faces, and be easily extended to learn per-vertex properties, e.g., color. Extensive experiments show that our method not only produces mesh models with qualitatively better details, but also achieves higher 3D shape estimation accuracy than the state of the art.
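
Because a mesh is a graph, a convolution over it can be written as a per-vertex update that mixes a vertex's own features with those of its neighbors. The sketch below assumes one common form, `w_self * f_v + w_neigh * sum(f_u for u in N(v))`; the function names, the fixed weights, and the tetrahedron are illustrative assumptions, and the exact layer used by the authors is not specified in the abstract.

```python
def matvec(weights, vec):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def graph_conv(features, neighbors, w_self, w_neigh):
    """One graph-convolution step over mesh vertices.

    features[v]  : feature vector of vertex v
    neighbors[v] : indices of vertices sharing an edge with v
    """
    out = []
    for v, feat in enumerate(features):
        # Sum neighbor features component-wise.
        agg = [sum(features[u][i] for u in neighbors[v]) for i in range(len(feat))]
        own = matvec(w_self, feat)
        nbr = matvec(w_neigh, agg)
        out.append([a + b for a, b in zip(own, nbr)])
    return out


# Tetrahedron: every vertex is connected to the other three.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
neighbors = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
half = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
updated = graph_conv(features, neighbors, w_self=identity, w_neigh=half)
```

In a trained network the two weight matrices would be learned and followed by a nonlinearity; fixed matrices are used here only so the arithmetic is easy to check by hand.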

5.
Article in English | MEDLINE | ID: mdl-32310770

ABSTRACT

Physically based rendering has been widely used to generate photo-realistic images. It greatly impacts industry by providing appealing rendering, for example for entertainment and augmented reality, and academia by supplying large-scale, high-fidelity synthetic training data for data-hungry methods like deep learning. However, physically based rendering relies heavily on ray tracing, which can be computationally expensive in complicated environments and hard to parallelize. In this paper, we propose an end-to-end deep-learning-based approach to generate physically based rendering efficiently. Our system consists of two stacked neural networks, which effectively simulate the physical behavior of the rendering process and produce photo-realistic images. The first network, the shading network, is designed to predict the optimal shading image from surface normal, depth and illumination; the second network, the composition network, learns to combine the predicted shading image with the reflectance to generate the final result. Our approach is inspired by intrinsic image decomposition, and thus it is more physically reasonable to have shading as intermediate supervision. Extensive experiments show that our approach is robust to noise thanks to a modified perceptual loss and even outperforms physically based rendering systems in complex scenes given a reasonable time budget.
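
The classical intrinsic image model that motivates the two-stage design writes an image as the per-pixel product of reflectance and shading. The sketch below shows only that baseline composition on single-channel toy data; the function name and values are assumptions, and the paper's composition network is learned rather than a fixed product.

```python
def compose(reflectance, shading):
    """Element-wise product image = reflectance * shading for 2D grids."""
    return [
        [r * s for r, s in zip(r_row, s_row)]
        for r_row, s_row in zip(reflectance, shading)
    ]


# Toy 2x2 single-channel maps (assumed values in [0, 1]).
reflectance = [[0.5, 1.0], [0.2, 0.8]]
shading = [[1.0, 0.5], [0.0, 0.25]]
image = compose(reflectance, shading)
```

A pixel with zero shading is black regardless of its reflectance, which is why supervising the shading image separately gives the first network a physically meaningful target.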

6.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1964-1978, 2018 Aug.
Article in English | MEDLINE | ID: mdl-28809676

ABSTRACT

While clustering has been well studied in the past decade, model selection has drawn much less attention due to the difficulty of the problem. In this paper, we address both problems jointly by recovering an ideal affinity tensor from an imperfect input. By taking into account the relationship of the affinities induced by the cluster structures, we are able to significantly improve the affinity input, for example by repairing entries corrupted by gross outliers. More importantly, the recovered ideal affinity tensor also directly indicates the number of clusters and their membership, thus solving model selection and clustering jointly. To enforce the global consistency in the affinities demanded by the cluster structure, we impose a number of constraints: among others, the tensor should be low rank and sparse, and it should obey what we call the rank-1 sum constraint. To solve this highly non-smooth and non-convex problem, we exploit its mathematical structure and express the original problem in an equivalent form amenable to numerical optimization and convergence analysis. To scale to large problem sizes, we also propose an alternative formulation, so that those problems can be efficiently solved via stochastic optimization in an online fashion. We evaluate our algorithm on different applications to demonstrate its superiority, and show that it can adapt to a large variety of settings.
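
Why an ideal affinity directly reveals both the number of clusters and their membership is easiest to see in the pairwise (matrix) case. The sketch below is an illustration under that simplifying assumption only: it builds a noise-free 0/1 affinity from known labels and reads the clusters back from its rows. The function names are hypothetical, and the paper's tensor formulation, its recovery procedure, and its rank-1 sum constraint are not reproduced here.

```python
def ideal_affinity(labels):
    """A[i][j] = 1 if points i and j share a cluster, else 0."""
    return [[1 if a == b else 0 for b in labels] for a in labels]


def read_off_clusters(affinity):
    """Group points whose affinity rows are identical.

    In an ideal affinity every point in a cluster has the same row
    (the cluster's indicator vector), so distinct rows = clusters.
    """
    groups = {}
    for i, row in enumerate(affinity):
        groups.setdefault(tuple(row), []).append(i)
    return list(groups.values())


labels = [0, 1, 0, 2, 1]
clusters = read_off_clusters(ideal_affinity(labels))
print(len(clusters), clusters)  # 3 [[0, 2], [1, 4], [3]]
```

With real data the rows are corrupted and no longer identical, which is why the affinity has to be repaired before such a readout becomes possible.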

7.
IEEE Trans Vis Comput Graph ; 18(2): 177-87, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21282860

ABSTRACT

We present a two-level approach for height map estimation from single images, aiming at restoring brick and stone relief (BSR) from their rubbing images in a visually plausible manner. In our approach, the base relief, i.e., the low-frequency component, is estimated automatically with a partial differential equation (PDE)-based mesh deformation scheme. A few vertices near the central area of the object region are selected and assigned heights estimated from an erosion-based contour map. These vertices, together with object boundary vertices, boundary normals, and the partial differential properties of the mesh, are taken as constraints to deform the mesh by minimizing a least-squares error functional. The high-frequency detail is estimated directly from rubbing images, either automatically or optionally with minimal interactive processing. The final height map for a restored BSR is obtained by blending the height maps of the base relief and the high-frequency detail. We demonstrate that our method can not only successfully restore several BSR maps from their rubbing images, but also restore some relief-like surfaces from photographic images.
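
How a few height constraints can determine a smooth base surface can be sketched in one dimension. The example below swaps in a much simpler technique than the paper's: Gauss-Seidel relaxation of the discrete Laplace equation along a line of vertices, with boundary and center heights held fixed. The function name, the vertex count, and the constraint values are assumptions; the authors' least-squares functional and its normal and differential terms are not reproduced.

```python
def relax_heights(n, fixed, iterations=2000):
    """Spread fixed heights along a 1D chain of n vertices.

    fixed maps vertex index -> prescribed height; both end vertices
    are expected to be fixed. Each free vertex is repeatedly replaced
    by the average of its two neighbors (discrete Laplace equation).
    """
    heights = [fixed.get(i, 0.0) for i in range(n)]
    for _ in range(iterations):
        for i in range(1, n - 1):
            if i not in fixed:
                heights[i] = 0.5 * (heights[i - 1] + heights[i + 1])
    return heights


# Boundary vertices at height 0, one central vertex raised to 1.
profile = relax_heights(9, {0: 0.0, 4: 1.0, 8: 0.0})
```

The result is a tent-shaped profile rising linearly from the boundary to the raised center; a higher-order smoothness term would round it off.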

...