Results 1 - 6 of 6
1.
Article in English | MEDLINE | ID: mdl-38039167

ABSTRACT

We present CRefNet, a hybrid transformer-convolutional deep neural network for consistent reflectance estimation in intrinsic image decomposition. Estimating consistent reflectance is particularly challenging when the same material appears differently due to changes in illumination. Our method achieves enhanced global reflectance consistency via a novel transformer module that converts image features to reflectance features while exploiting long-range interactions in the data. We introduce reflectance reconstruction as a novel auxiliary task that shares a common decoder with the reflectance estimation task and substantially improves the quality of reconstructed reflectance maps. Finally, we improve local reflectance consistency via a new rectified gradient filter that effectively suppresses small variations in predictions without any overhead at inference time. Our experiments show that these contributions enable CRefNet to predict highly consistent reflectance maps and to outperform the state of the art by 10% in WHDR.
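The abstract leaves the rectified gradient filter unspecified beyond its goal: suppressing small variations in the predicted reflectance at no inference-time cost, which points to a training-time penalty. The sketch below is one plausible reading, not the paper's formulation; the threshold `tau` and the (B, C, H, W) tensor layout are assumptions.

```python
import torch

def rectified_gradient_loss(reflectance: torch.Tensor, tau: float = 0.02) -> torch.Tensor:
    """Penalize only small reflectance gradients, nudging predictions toward
    piecewise-constant reflectance while leaving strong (likely genuine)
    material edges untouched. `tau` is a hypothetical noise threshold."""
    # Horizontal and vertical finite differences of the prediction.
    dx = (reflectance[..., :, 1:] - reflectance[..., :, :-1]).abs()
    dy = (reflectance[..., 1:, :] - reflectance[..., :-1, :]).abs()
    # "Rectify": gradients at or above tau are treated as real edges and
    # receive no penalty at all.
    small_dx = torch.where(dx < tau, dx, torch.zeros_like(dx))
    small_dy = torch.where(dy < tau, dy, torch.zeros_like(dy))
    return small_dx.mean() + small_dy.mean()
```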

2.
IEEE Trans Vis Comput Graph; 26(12): 3457-3466, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32941145

ABSTRACT

Video portraits are common in a variety of applications, such as videoconferencing, news broadcasting, and virtual education and training. We present a novel method to synthesize photorealistic video portraits for an input portrait video, automatically driven by a person's voice. The main challenge in this task is the hallucination of plausible, photorealistic facial expressions from input speech audio. To address this challenge, we employ a parametric 3D face model represented by geometry, facial expression, illumination, etc., and learn a mapping from audio features to model parameters. The input source audio is first represented as a high-dimensional feature, which is used to predict facial expression parameters of the 3D face model. We then replace the expression parameters computed from the original target video with the predicted ones, and re-render the reenacted face. Finally, we generate a photorealistic video portrait from the reenacted synthetic face sequence via a neural face renderer. One appealing feature of our approach is its generalization capability to various input speech audio, including synthetic speech from text-to-speech software. Extensive experimental results show that our approach outperforms previous general-purpose audio-driven video portrait methods; this includes a user study demonstrating that our results are rated as more realistic than those of previous methods.
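As a rough illustration of the audio-to-parameter mapping the abstract describes, the hypothetical module below turns per-frame audio features into expression parameters of a parametric face model. All dimensions and the LSTM choice are assumptions for the sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AudioToExpression(nn.Module):
    """Map a sequence of high-dimensional audio features to per-frame
    expression parameters of a parametric 3D face model (illustrative sizes)."""

    def __init__(self, audio_dim: int = 256, expr_dim: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, 128, batch_first=True)  # temporal context
        self.head = nn.Linear(128, expr_dim)                  # per-frame output

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        # audio_feats: (batch, frames, audio_dim)
        hidden, _ = self.rnn(audio_feats)
        return self.head(hidden)  # (batch, frames, expr_dim)

# Reenactment step from the abstract: keep the target video's geometry, pose,
# and illumination parameters, but swap in the predicted expressions before
# re-rendering, e.g. target_params["expression"] = model(audio_feats).
```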

3.
IEEE Trans Vis Comput Graph; 25(5): 1828-1835, 2019 May.
Article in English | MEDLINE | ID: mdl-30802864

ABSTRACT

The ubiquity of smart mobile devices, such as phones and tablets, enables users to casually capture 360° panoramas with a single camera sweep to share and relive experiences. However, panoramas lack motion parallax, as they do not provide different views for different viewpoints. The motion parallax induced by translational head motion is a crucial depth cue in daily life. Alternatives, such as omnidirectional stereo panoramas, provide different views for each eye (binocular disparity), but they also lack motion parallax, as the left- and right-eye panoramas are stitched statically. Methods based on explicit scene geometry reconstruct textured 3D geometry, which provides motion parallax but suffers from visible reconstruction artefacts. The core of our method is a novel multi-perspective panorama representation, which can be casually captured and rendered with motion parallax for each eye on the fly. This provides a more realistic perception of panoramic environments, which is particularly useful for virtual reality applications. Our approach uses a single consumer video camera to acquire 200-400 views of a real 360° environment with a single sweep. Using novel-view synthesis with flow-based blending, we show how to turn these input views into an enriched 360° panoramic experience that can be explored in real time, without relying on potentially unreliable reconstruction of scene geometry. We compare our results with existing omnidirectional stereo and image-based rendering methods to demonstrate the benefit of our approach, which is the first to enable casual consumers to capture and view high-quality 360° panoramas with motion parallax.
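To make the flow-based blending step concrete, here is a minimal sketch of synthesizing an in-between view from two neighboring captures. It is a crude linear approximation that assumes a precomputed flow field, not the paper's multi-perspective representation.

```python
import numpy as np
import cv2

def flow_blend(img_a: np.ndarray, img_b: np.ndarray,
               flow_ab: np.ndarray, alpha: float) -> np.ndarray:
    """Warp two neighboring views along their optical flow and cross-fade.

    flow_ab maps pixels of img_a toward img_b (e.g. estimated with
    cv2.calcOpticalFlowFarneback); alpha in [0, 1] selects the in-between
    viewpoint, with 0 = img_a and 1 = img_b.
    """
    h, w = flow_ab.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    # Backward-sampling approximation: content arriving at pixel q started
    # near q - alpha*flow in A and continues to q + (1 - alpha)*flow in B.
    warp_a = cv2.remap(img_a, grid_x - alpha * flow_ab[..., 0],
                       grid_y - alpha * flow_ab[..., 1], cv2.INTER_LINEAR)
    warp_b = cv2.remap(img_b, grid_x + (1 - alpha) * flow_ab[..., 0],
                       grid_y + (1 - alpha) * flow_ab[..., 1], cv2.INTER_LINEAR)
    # Linear cross-fade favors the nearer source view.
    return ((1 - alpha) * warp_a + alpha * warp_b).astype(img_a.dtype)
```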

4.
IEEE Trans Vis Comput Graph; 24(4): 1545-1553, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29543172

ABSTRACT

We propose a novel 360° scene representation for converting real scenes into stereoscopic 3D virtual reality content with head-motion parallax. Our image-based scene representation enables efficient synthesis of novel views with six degrees of freedom (6-DoF) by fusing motion fields at two scales: (1) disparity motion fields carry implicit depth information and are robustly estimated from multiple laterally displaced auxiliary viewpoints, and (2) pairwise motion fields enable real-time flow-based blending, which improves the visual fidelity of results by minimizing ghosting and view transition artifacts. Based on our scene representation, we present an end-to-end system that captures real scenes with a robotic camera arm, processes the recorded data, and finally renders the scene in a head-mounted display in real time (more than 40 Hz). Our approach is the first to support head-motion parallax when viewing real 360° scenes. We demonstrate compelling results that illustrate the enhanced visual experience, and hence sense of immersion, achieved with our approach compared to widely used stereoscopic panoramas.


Subjects
Depth Perception/physiology , Head Movements/physiology , Imaging, Three-Dimensional/methods , User-Computer Interface , Virtual Reality , Humans , Video Recording
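As a toy illustration of how a disparity motion field can carry the implicit depth needed for head-motion parallax, the sketch below shifts each pixel in proportion to its disparity for a small lateral head offset. The paper's system additionally fuses pairwise motion fields for ghost-free flow-based blending, which this sketch omits; `head_offset_px` and the sign convention are assumptions.

```python
import numpy as np
import cv2

def parallax_warp(image: np.ndarray, disparity: np.ndarray,
                  head_offset_px: float) -> np.ndarray:
    """Re-render a view for a small lateral head offset by shifting each
    pixel in proportion to its disparity: near content (large disparity)
    moves more than far content, producing motion parallax.
    """
    h, w = disparity.shape
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    # Backward sampling: fetch each output pixel from where it would have
    # been seen before the head moved.
    map_x = grid_x + head_offset_px * disparity.astype(np.float32)
    return cv2.remap(image, map_x, grid_y, cv2.INTER_LINEAR)
```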
5.
IEEE Trans Vis Comput Graph; 23(11): 2447-2454, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28809688

ABSTRACT

We present a novel real-time approach for user-guided intrinsic decomposition of static scenes captured by an RGB-D sensor. In the first step, we acquire a three-dimensional representation of the scene using a dense volumetric reconstruction framework. The obtained reconstruction serves as a proxy to densely fuse reflectance estimates and to store user-provided constraints in three-dimensional space. User constraints, in the form of constant-shading and constant-reflectance strokes, can be placed directly on the real-world geometry using an intuitive touch-based interaction metaphor, or using interactive mouse strokes. Fusing the decomposition results and constraints in three-dimensional space allows for robust propagation of this information to novel views by re-projection. We leverage this information to improve the decomposition quality of existing intrinsic video decomposition techniques by further constraining the ill-posed decomposition problem. In addition to improved decomposition quality, we show a variety of live augmented reality applications, such as recoloring of objects, relighting of scenes, and editing of material appearance.


Subjects
Imaging, Three-Dimensional/methods , Video Recording/methods , Algorithms , Humans
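To ground the user-constraint idea, the sketch below applies one 'constant reflectance' stroke to a log-domain decomposition (log I = log R + log S): pixels under the stroke are forced to a shared reflectance and the residual moves into shading. This is a heavy simplification of the paper's optimization, which also propagates strokes through the fused 3D reconstruction; the array shapes are assumptions.

```python
import numpy as np

def apply_reflectance_stroke(log_image: np.ndarray,
                             log_reflectance: np.ndarray,
                             stroke_mask: np.ndarray):
    """Enforce a user 'constant reflectance' stroke on a log-domain
    decomposition. Pixels under the stroke receive their mean reflectance,
    and the residual is reassigned to shading, so the reconstruction
    I = R * S is preserved exactly.

    log_image, log_reflectance: (H, W, 3); stroke_mask: (H, W) boolean.
    """
    constrained = log_reflectance.copy()
    # Shared reflectance for every pixel the stroke touches.
    constrained[stroke_mask] = log_reflectance[stroke_mask].mean(axis=0)
    log_shading = log_image - constrained  # keep the product consistent
    return constrained, log_shading
```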
6.
ACM Trans Graph; 32(4), 2013 Jul 04.
Article in English | MEDLINE | ID: mdl-24273376

ABSTRACT

Image-based rendering (IBR) creates realistic images by enriching simple geometries with photographs, e.g., mapping the photograph of a building façade onto a plane. However, as soon as the viewer moves away from the correct viewpoint, the image on the retina becomes distorted, sometimes leading to gross misperceptions of the original geometry. Two hypotheses from vision science state how viewers perceive such image distortions: one claims that they can compensate for them (and therefore perceive scene geometry reasonably correctly), and one claims that they cannot compensate (and therefore perceive rather significant distortions). We modified the latter hypothesis so that it extends to street-level IBR. We then conducted a rigorous experiment that measured the magnitude of perceptual distortions that occur with IBR for façade viewing, as well as a rating experiment that assessed the acceptability of the distortions. The results of the two experiments were consistent with one another: viewers' percepts are indeed distorted, but not as severely as predicted by the modified vision science hypothesis. From our experimental results, we develop a predictive model of distortion for street-level IBR, which we use to provide guidelines for the acceptability of virtual views and for capture camera density. We perform a confirmatory study to validate our predictions, and illustrate their use with an application that guides users in IBR navigation to stay in regions where virtual views yield acceptable perceptual distortions.
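The 'no compensation' hypothesis has a simple geometric form that may help fix intuitions: an image rendered for one viewpoint but viewed from another sends each scene point to the eye along the wrong ray. The sketch below measures that angular error under an assumed setup (camera looking down +z, picture plane at z = screen_z); it illustrates the geometric baseline only, not the paper's fitted predictive model.

```python
import numpy as np

def angular_distortion(scene_pt: np.ndarray, eye_render: np.ndarray,
                       eye_view: np.ndarray, screen_z: float = 1.0) -> float:
    """Angular error (radians) under the simple 'no compensation' model:
    an image rendered for eye_render but viewed from eye_view shows each
    scene point along the wrong ray.

    Assumed setup: points are 3-vectors, the camera looks down +z, and the
    picture plane sits at z = screen_z.
    """
    # Where the renderer projects the point onto the picture plane.
    d = scene_pt - eye_render
    screen_pt = eye_render + d * (screen_z - eye_render[2]) / d[2]
    # Ray the displaced viewer actually sees vs. the true direction.
    seen_dir = screen_pt - eye_view
    true_dir = scene_pt - eye_view
    cos = np.dot(seen_dir, true_dir) / (
        np.linalg.norm(seen_dir) * np.linalg.norm(true_dir))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```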
