Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency.

Tulsiani, Shubham; Zhou, Tinghui; Efros, Alexei A; Malik, Jitendra.

IEEE Trans Pattern Anal Mach Intell ; 44(12): 8754-8765, 2022 Dec.

Article in English | MEDLINE | ID: mdl-30762530

ABSTRACT

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view. We do so by reformulating view consistency using a differentiable ray consistency (DRC) term. We show that this formulation can be incorporated in a learning framework to leverage different types of multi-view observations (e.g., foreground masks, depth, color images, semantics) as supervision for learning single-view 3D prediction. We present empirical analysis of our technique in a controlled setting. We also show that this approach allows us to improve over existing techniques for single-view reconstruction of objects from the PASCAL VOC dataset.

Learning Category-Specific Deformable 3D Models for Object Reconstruction.

Tulsiani, Shubham; Kar, Abhishek; Carreira, Joao; Malik, Jitendra.

IEEE Trans Pattern Anal Mach Intell ; 39(4): 719-731, 2017 Apr.

Article in English | MEDLINE | ID: mdl-27254860

ABSTRACT

We address the problem of fully automatic object localization and reconstruction from a single image. This is both a very challenging and very important problem which has, until recently, received limited attention due to difficulties in segmenting objects and predicting their poses. Here we leverage recent advances in learning convolutional networks for object detection and segmentation and introduce a complementary network for the task of camera viewpoint prediction. These predictors are very powerful, but still not perfect given the stringent requirements of shape reconstruction. Our main contribution is a new class of deformable 3D models that can be robustly fitted to images based on noisy pose and silhouette estimates computed upstream and that can be learned directly from 2D annotations available in object detection datasets. Our models capture top-down information about the main global modes of shape variation within a class providing a "low-frequency" shape. In order to capture fine instance-specific shape details, we fuse it with a high-frequency component recovered from shading cues. A comprehensive quantitative analysis and ablation study on the PASCAL 3D+ dataset validates the approach as we show fully automatic reconstructions on PASCAL VOC as well as large improvements on the task of viewpoint prediction.
