Results 1 - 9 of 9

1.
IEEE Trans Pattern Anal Mach Intell; 46(2): 793-804, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37844002

ABSTRACT

This paper presents a method to reconstruct high-quality textured 3D models from single images. Current methods rely on datasets with expensive annotations: multi-view images and their camera parameters. Our method instead relies on GAN-generated multi-view image datasets, which have a negligible annotation cost. However, these images are not strictly multi-view consistent, and GANs sometimes output distorted images, which degrades reconstruction quality. To overcome these limitations of generated datasets, we make two main contributions that lead to state-of-the-art results on challenging objects: 1) a robust multi-stage learning scheme that gradually relies more on the model's own predictions when calculating losses, and 2) a novel adversarial learning pipeline with online pseudo-ground-truth generation to achieve fine details. Our work provides a bridge from the 2D supervision of GAN models to 3D reconstruction models and removes expensive annotation efforts. We show significant improvements over previous methods, whether they were trained on GAN-generated multi-view images or on real images with expensive annotations.
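
As a rough illustration of the multi-stage scheme described above (the schedule shape, cap, and loss choice are assumptions, not the paper's exact recipe), a PyTorch sketch of a loss whose target gradually blends the noisy GAN-generated supervision with the model's own detached prediction might look like:

    import torch.nn.functional as F

    def blended_reconstruction_loss(pred, gan_target, step, total_steps):
        # Hypothetical schedule: alpha grows with training progress, so the
        # loss gradually trusts the model's own (detached) prediction more
        # than the inconsistent GAN-generated target. The 0.5 cap keeps the
        # dataset supervision from vanishing entirely (an assumption).
        alpha = min(0.5, step / total_steps)
        target = (1.0 - alpha) * gan_target + alpha * pred.detach()
        return F.l1_loss(pred, target)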

2.
IEEE Trans Pattern Anal Mach Intell; 45(12): 14563-14574, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751344

ABSTRACT

This paper presents a method for fine-detailed texture learning for 3D models reconstructed from both multi-view and single-view images. The framework is posed as an adaptation problem and proceeds progressively: in the first stage we focus on learning accurate geometry, and in the second stage we focus on learning the texture with a generative adversarial network. The contributions of the paper lie in the generative learning pipeline, where we propose two improvements. First, since the learned textures should be spatially aligned, we propose an attention mechanism that relies on learnable pixel positions. Second, since the discriminator receives aligned texture maps, we augment its input with a learnable embedding, which improves the feedback to the generator. We achieve significant improvements on multi-view sequences from the Tripod dataset as well as on the single-view image datasets Pascal 3D+ and CUB, demonstrating that our method produces superior 3D textured models compared to previous works.
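
A minimal sketch of the second improvement, assuming a small convolutional discriminator (the layer sizes, embedding width, and resolution below are illustrative assumptions):

    import torch
    import torch.nn as nn

    class AugmentedDiscriminator(nn.Module):
        # Since the texture maps arrive spatially aligned, a learnable
        # per-position embedding is concatenated to the input so that the
        # discriminator can give position-aware feedback to the generator.
        def __init__(self, in_ch=3, emb_ch=8, size=256):
            super().__init__()
            self.pos_emb = nn.Parameter(0.02 * torch.randn(1, emb_ch, size, size))
            self.net = nn.Sequential(
                nn.Conv2d(in_ch + emb_ch, 64, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(64, 1, 4, stride=2, padding=1),
            )

        def forward(self, texture):                     # texture: (N, 3, 256, 256)
            emb = self.pos_emb.expand(texture.size(0), -1, -1, -1)
            return self.net(torch.cat([texture, emb], dim=1))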

3.
Article in English | MEDLINE | ID: mdl-37721888

ABSTRACT

This article presents a comprehensive evaluation of instance segmentation models with respect to real-world image corruptions as well as out-of-domain image collections, e.g., images captured by a different setup than the one used for the training dataset. The out-of-domain evaluation measures the generalization capability of models, an essential aspect of real-world applications and an extensively studied topic in domain adaptation. Both robustness and generalization evaluations matter when designing instance segmentation models for real-world applications and when picking an off-the-shelf pretrained model for the task at hand. Specifically, this benchmark study covers state-of-the-art network architectures, network backbones, normalization layers, models trained from scratch versus pretrained networks, and the effect of multitask training on robustness and generalization. It yields several insights. For example, we find that group normalization (GN) enhances the robustness of networks to corruptions where the image content stays the same but corruptions are added on top, whereas batch normalization (BN) improves generalization across datasets where the statistics of image features change. We also find that single-stage detectors do not generalize well to image resolutions larger than their training size, while multistage detectors can easily be applied to images of different sizes. We hope that our comprehensive study will motivate the development of more robust and reliable instance segmentation models.
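
For readers who want to probe the GN-versus-BN finding on their own models, a small helper along these lines swaps BatchNorm2d layers for GroupNorm (the group count of 32 is an assumption and must divide each layer's channel count, as it does for standard ResNets):

    import torch.nn as nn

    def bn_to_gn(module, num_groups=32):
        # Recursively replace BatchNorm2d with GroupNorm. Per the findings
        # above, GN tends to be more robust to added image corruptions,
        # while BN generalizes better when feature statistics shift.
        for name, child in module.named_children():
            if isinstance(child, nn.BatchNorm2d):
                setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
            else:
                bn_to_gn(child, num_groups)
        return module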

4.
IEEE Trans Pattern Anal Mach Intell; 45(12): 14777-14788, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37616132

ABSTRACT

We propose an image-to-image translation framework for facial attribute editing with disentangled, interpretable latent directions. Facial attribute editing poses two challenges: editing a target attribute with controllable strength, and disentangling the attribute representations so that other attributes are preserved during edits. To this end, inspired by latent-space factorization works on fixed pretrained GANs, we design the attribute editing via latent-space factorization and, for each attribute, learn a linear direction that is orthogonal to the others. We train these directions with orthogonality constraints and disentanglement losses. To project images into semantically organized latent spaces, we set up an encoder-decoder architecture with attention-based skip connections. We compare extensively with previous image translation algorithms and with editing methods based on pretrained GANs, and our experiments show that our method significantly improves over the state of the art.
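
A minimal sketch of the orthogonality constraint on the per-attribute directions (the dimensions and the squared-penalty form are assumptions, not the paper's exact loss):

    import torch
    import torch.nn.functional as F

    num_attrs, latent_dim = 5, 512                     # illustrative sizes
    directions = torch.nn.Parameter(torch.randn(num_attrs, latent_dim))

    def orthogonality_loss(D):
        # Penalize pairwise cosine similarity between attribute directions
        # so that editing one attribute does not move along the others.
        D = F.normalize(D, dim=1)                      # unit-length directions
        gram = D @ D.t()                               # (num_attrs, num_attrs)
        off_diag = gram - torch.eye(D.size(0), device=D.device)
        return (off_diag ** 2).sum()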

5.
IEEE Trans Neural Netw Learn Syst; 34(12): 10737-10746, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35536806

ABSTRACT

We analyze why the orthogonality penalty improves quantization in deep neural networks. Using results from perturbation theory as well as extensive experiments with ResNet50, ResNet101, and VGG19 models, we show mathematically and experimentally that the improved quantization accuracy resulting from the orthogonality constraint stems primarily from reduced condition numbers (the ratio of the largest to the smallest singular value of the weight matrices) rather than from reduced spectral norms, in contrast to the explanations in previous literature. We also show that the orthogonality penalty improves quantization even in the presence of a state-of-the-art quantized retraining method. Our results show that, when the orthogonality penalty is used with quantized retraining, the ImageNet Top-5 accuracy loss from 4- to 8-bit quantization is reduced by up to 7% for ResNet50 and up to 10% for ResNet101, compared to quantized retraining with no orthogonality penalty.
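
The quantities at the heart of this analysis are straightforward to compute; a sketch follows (the soft-orthogonality form below is one common choice, assumed rather than taken from the paper):

    import torch

    def condition_number(weight):
        # Ratio of largest to smallest singular value of the flattened weight.
        s = torch.linalg.svdvals(weight.flatten(1))
        return (s.max() / s.min()).item()

    def orthogonality_penalty(weight):
        # Soft orthogonality: pushing W @ W^T toward the identity drives all
        # singular values toward 1, and hence the condition number toward 1.
        w = weight.flatten(1)
        gram = w @ w.t()
        eye = torch.eye(gram.size(0), device=w.device)
        return ((gram - eye) ** 2).sum()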

6.
IEEE Trans Pattern Anal Mach Intell; 45(5): 6096-6110, 2023 May.
Article in English | MEDLINE | ID: mdl-36155473

ABSTRACT

Partial convolution weights convolutions with binary masks and renormalizes on valid pixels. It was originally proposed for image inpainting task because a corrupted image processed by a standard convolutional often leads to artifacts. Therefore, binary masks are constructed that define the valid and corrupted pixels, so that partial convolution results are only calculated based on valid pixels. It has been also used for conditional image synthesis task, so that when a scene is generated, convolution results of an instance depend only on the feature values that belong to the same instance. One of the unexplored applications for partial convolution is padding which is a critical component of modern convolutional networks. Common padding schemes make strong assumptions about how the padded data should be extrapolated. We show that these padding schemes impair model accuracy, whereas partial convolution based padding provides consistent improvements across a range of tasks. In this article, we review partial convolution applications under one framework. We conduct a comprehensive study of the partial convolution based padding on a variety of computer vision tasks, including image classification, 3D-convolution-based action recognition, and semantic segmentation. Our results suggest that partial convolution-based padding shows promising improvements over strong baselines.
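
A minimal sketch of the renormalization at the core of partial convolution (single-channel mask, stride 1, odd square kernel; bias handling is simplified relative to the original formulation):

    import torch
    import torch.nn.functional as F

    def partial_conv2d(x, mask, weight, bias=None):
        # x: (N, C, H, W); mask: (N, 1, H, W) with 1 = valid, 0 = invalid.
        k = weight.shape[2]
        pad = k // 2
        out = F.conv2d(x * mask, weight, padding=pad)  # sum over valid pixels only
        with torch.no_grad():                          # count valid pixels per window
            ones = torch.ones(1, 1, k, k, device=x.device)
            valid = F.conv2d(mask, ones, padding=pad)
        out = out * (k * k / valid.clamp(min=1.0))     # renormalize by valid fraction
        out = out * (valid > 0)                        # zero windows with no valid input
        if bias is not None:
            out = out + bias.view(1, -1, 1, 1)
        return out, (valid > 0).float()                # updated mask for the next layer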

7.
IEEE Trans Pattern Anal Mach Intell; 44(7): 3883-3894, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33513098

ABSTRACT

Unsupervised landmark learning is the task of learning semantic keypoint-like representations without expensive keypoint annotations. A popular approach is to factorize an image into pose and appearance streams and then reconstruct the image from the factorized components. The pose representation should capture a set of consistent, tightly localized landmarks in order to facilitate reconstruction of the input image. Ultimately, we want the learned landmarks to focus on the foreground object of interest; however, reconstructing the entire image forces the model to allocate landmarks to the background as well. Using a motion-based foreground assumption, this work explores factorizing the reconstruction task into separate foreground and background reconstructions in an unsupervised way, so that only the foreground reconstruction is conditioned on the unsupervised landmarks. Our experiments demonstrate that the proposed factorization yields landmarks that focus on the foreground object of interest, as measured against ground-truth foreground masks. The rendered background quality also improves, since ill-suited landmarks are no longer forced to model this content; we demonstrate this via improved image fidelity in a video-prediction task. Code is available at https://github.com/NVIDIA/UnsupervisedLandmarkLearning.
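
The compositing step implied by this factorization can be sketched in a few lines (the decoders producing the inputs are hypothetical placeholders; only the foreground stream would be conditioned on the landmarks):

    import torch

    def composite(fg_rgb, bg_rgb, fg_mask):
        # fg_rgb: foreground render, conditioned on the unsupervised landmarks;
        # bg_rgb: background render, independent of the landmarks;
        # fg_mask: soft mask in [0, 1] from the motion-based foreground assumption.
        return fg_mask * fg_rgb + (1.0 - fg_mask) * bg_rgb

    def reconstruction_loss(image, fg_rgb, bg_rgb, fg_mask):
        return torch.mean(torch.abs(image - composite(fg_rgb, bg_rgb, fg_mask)))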

8.
IEEE Trans Pattern Anal Mach Intell; 43(7): 2360-2372, 2021 Jul.
Article in English | MEDLINE | ID: mdl-31995476

ABSTRACT

Computer-graphics-rendered synthetic images are widely used to create simulation environments for robotics and autonomous driving and to generate labeled data. Yet training models purely on synthetic data remains challenging due to the considerable domain gap caused by current limitations of rendering. In this paper, we propose a simple yet effective domain adaptation framework for closing this gap at the image level. Unlike many GAN-based approaches, our method matches the covariance of universal feature embeddings across domains, making adaptation a fast, convenient step and avoiding potentially difficult GAN training. To align domains more precisely, we further propose a conditional covariance matching framework that iteratively estimates semantic segmentation regions and matches the class-wise feature covariance conditioned on those regions. We demonstrate that the two tasks mutually refine and considerably improve each other, leading to state-of-the-art domain adaptation results. Extensive experiments under multiple synthetic-to-real settings show that our approach exceeds the performance of the latest domain adaptation approaches. In addition, a quantitative analysis shows that our framework considerably reduces the Fréchet Inception Distance between source and target domains, demonstrating its effectiveness in bridging the synthetic-to-real domain gap.
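
A minimal sketch of the unconditional covariance-matching objective, in the spirit of CORAL-style feature alignment (the conditional variant would apply the same loss per estimated segmentation region); feature tensors are assumed to be pre-flattened to (samples, channels):

    import torch

    def covariance_matching_loss(feat_src, feat_tgt):
        # feat_*: (num_samples, dim) feature embeddings from each domain.
        def cov(f):
            f = f - f.mean(dim=0, keepdim=True)
            return f.t() @ f / (f.size(0) - 1)
        return ((cov(feat_src) - cov(feat_tgt)) ** 2).mean()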

9.
IEEE Trans Neural Netw Learn Syst; 28(7): 1572-1583, 2017 Jul.
Article in English | MEDLINE | ID: mdl-27071200

ABSTRACT

Deep convolutional neural networks (DCNNs) have become a very powerful tool in visual perception, with applications in autonomous robots, security systems, mobile phones, and automobiles, where high throughput in the feedforward evaluation phase and power efficiency are important. Because of this increased usage, many field-programmable gate array (FPGA)-based accelerators have been proposed. In this paper, we present an optimized streaming method for a DCNN hardware accelerator on an embedded platform. The streaming method acts as a compiler, transforming a high-level representation of DCNNs into operation codes that execute applications on the hardware accelerator. The proposed method makes maximal use of the available computational resources through a novel scheduled routing topology that combines data reuse and data concatenation. It is tested with a hardware accelerator implemented on the Xilinx Kintex-7 XC7K325T FPGA. The system fully exploits weight-level and node-level parallelism in DCNNs and achieves a peak performance of 247 G-ops while consuming less than 4 W of power. We test our system on object classification and object detection applications in real-world scenarios; the results indicate high performance efficiency, outperforming all other presented platforms while running these applications.
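
Purely as an illustration of the compiler idea (every opcode and field below is hypothetical, not the paper's instruction set), the transformation from a high-level layer list into a streamed operation sequence with weight reuse might look like:

    def compile_stream(layers):
        # layers: e.g. [{"id": "conv1", "type": "conv"}, {"id": "conv2", ...}]
        ops, resident = [], set()
        for layer in layers:
            if layer["id"] not in resident:            # data reuse: load weights once
                ops.append(("LOAD_WEIGHTS", layer["id"]))
                resident.add(layer["id"])
            ops.append(("EXEC", layer["type"], layer["id"]))
        ops.append(("STORE_OUTPUT",))
        return ops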
