Results 1 - 4 of 4
1.
IEEE Trans Image Process; 32: 5465-5477, 2023.
Article in English | MEDLINE | ID: mdl-37773909

ABSTRACT

Context modeling and multi-level feature fusion methods have proven effective for improving semantic segmentation performance. However, they are not specialized to deal with pixel-context mismatch and spatial feature misalignment, and their high computational complexity hinders widespread application in real-time scenarios. In this work, we propose a lightweight Context and Spatial Feature Calibration Network (CSFCN) that addresses these issues with pooling-based and sampling-based attention mechanisms. CSFCN contains two core modules: a Context Feature Calibration (CFC) module and a Spatial Feature Calibration (SFC) module. CFC adopts a cascaded pyramid pooling module to efficiently capture nested contexts, then aggregates a private context for each pixel based on pixel-context similarity to realize context feature calibration. SFC splits features into multiple groups of sub-features along the channel dimension and propagates sub-features within each group via learnable sampling to achieve spatial feature calibration. Extensive experiments on the Cityscapes and CamVid datasets show that our method achieves a state-of-the-art trade-off between speed and accuracy. Concretely, it reaches 78.7% mIoU at 70.0 FPS on the Cityscapes test set and 77.8% mIoU at 179.2 FPS on the CamVid test set. The code is available at https://nave.vr3i.com/ and https://github.com/kaigelee/CSFCN.
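
A minimal PyTorch sketch of the pooling-based attention idea described above: a cascaded pooling pyramid produces nested context vectors, and each pixel aggregates a private context by similarity. The pyramid levels, the single 1x1-conv query projection, and all names are illustrative assumptions, not the authors' CFC implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContextCalibration(nn.Module):
        """Pooling-based attention: each pixel gathers a private context
        from a cascaded pooling pyramid via pixel-context similarity."""
        def __init__(self, channels, pool_sizes=(4, 2, 1)):  # assumed levels
            super().__init__()
            self.pool_sizes = pool_sizes
            self.query = nn.Conv2d(channels, channels, 1)

        def forward(self, x):
            b, c, h, w = x.shape
            # Cascaded pyramid pooling: each level pools the previous one,
            # yielding nested context vectors.
            ctx, feat = [], x
            for s in self.pool_sizes:
                feat = F.adaptive_avg_pool2d(feat, s)
                ctx.append(feat.flatten(2))                   # (b, c, s*s)
            ctx = torch.cat(ctx, dim=2)                       # (b, c, n_ctx)

            # Pixel-context similarity: softmax attention of every pixel
            # over the pooled contexts gives a per-pixel private context.
            q = self.query(x).flatten(2).transpose(1, 2)      # (b, h*w, c)
            attn = torch.softmax(q @ ctx / c ** 0.5, dim=-1)  # (b, h*w, n_ctx)
            private = attn @ ctx.transpose(1, 2)              # (b, h*w, c)
            return x + private.transpose(1, 2).reshape(b, c, h, w)

    feats = torch.randn(2, 128, 64, 64)
    calibrated = ContextCalibration(128)(feats)  # same shape as the input

Because attention runs over only a handful of pooled context vectors rather than all pixel pairs, the cost stays linear in image size, which is what makes this style of attention viable for real-time use.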

2.
IEEE Trans Pattern Anal Mach Intell; 44(8): 4291-4305, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33687835

ABSTRACT

Part information has been proven resistant to occlusions and viewpoint changes, which are the main difficulties in car parsing and reconstruction. However, in the absence of datasets and approaches incorporating car parts, few works have benefited from it. In this paper, we propose the first part-aware approach for joint part-level car parsing and reconstruction from single street-view images. Without labor-intensive part annotations on real images, our approach simultaneously estimates the pose, shape, and semantic parts of cars. This paper makes two contributions. First, our network introduces dense part information to facilitate pose and shape estimation, which is further optimized with a novel 3D loss. To obtain part information in real images, a class-consistent method is introduced to implicitly transfer part knowledge from synthesized images. Second, we construct the first high-quality dataset containing 348 car models with physical dimensions and part annotations. Given these models, 60K synthesized images with randomized configurations are generated. Experimental results demonstrate that part knowledge can be effectively transferred with our class-consistent method, which significantly improves part segmentation performance on real street views. By fusing dense part information, our pose and shape estimates achieve state-of-the-art performance on ApolloCar3D and outperform previous approaches by large margins in terms of both A3DP-Abs and A3DP-Rel.


Subjects
Algorithms, Automobiles
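
A minimal PyTorch sketch of the fusion idea in the abstract above: dense part probabilities are concatenated with backbone features before pose and shape regression. The layer structure, part count, and output dimensions are assumptions for illustration, not the paper's network.

    import torch
    import torch.nn as nn

    class PartAwareHead(nn.Module):
        """Concatenates dense part probabilities with image features,
        then regresses pose and shape from the fused representation."""
        def __init__(self, feat_ch=256, num_parts=32,   # part count assumed
                     pose_dim=6, shape_dim=10):         # dims assumed
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(feat_ch + num_parts, feat_ch, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1))
            self.pose = nn.Linear(feat_ch, pose_dim)    # e.g. rotation + translation
            self.shape = nn.Linear(feat_ch, shape_dim)  # shape-basis coefficients

        def forward(self, feats, part_logits):
            part_prob = part_logits.softmax(dim=1)      # dense part information
            fused = self.fuse(torch.cat([feats, part_prob], dim=1)).flatten(1)
            return self.pose(fused), self.shape(fused)

    head = PartAwareHead()
    pose, shape = head(torch.randn(1, 256, 32, 32), torch.randn(1, 32, 32, 32))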
3.
IEEE Trans Image Process; 30: 2436-2449, 2021.
Article in English | MEDLINE | ID: mdl-33417546

ABSTRACT

Semantic segmentation is a challenging task that must handle large variations in scale, deformations, and differing viewpoints. In this paper, we develop a novel network named the Gated Path Selection Network (GPSNet), which aims to adaptively select receptive fields while maintaining dense sampling capability. In GPSNet, we first design a two-dimensional SuperNet, which densely incorporates features from growing receptive fields. A Comparative Feature Aggregation (CFA) module is then introduced to dynamically aggregate discriminative semantic context. In contrast to previous works that focus on optimizing sparse sampling locations on regular grids, GPSNet adaptively harvests free-form, dense semantic context. The derived adaptive receptive fields and dense sampling locations are data-dependent and flexible, enabling the network to model the varied contexts of objects. On two representative semantic segmentation datasets, i.e., Cityscapes and ADE20K, we show that the proposed approach consistently outperforms previous methods without bells and whistles.
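
A minimal PyTorch sketch of gated selection over branches with growing receptive fields, here realized as parallel dilated convolutions blended by a per-pixel softmax gate. This illustrates the adaptive receptive-field idea only; the branch design and gate are assumptions, not the authors' SuperNet or CFA.

    import torch
    import torch.nn as nn

    class GatedReceptiveFields(nn.Module):
        """Parallel dilated convs give growing receptive fields; a
        per-pixel gate softly selects among them, making the effective
        receptive field data-dependent."""
        def __init__(self, channels, dilations=(1, 2, 4, 8)):  # assumed rates
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
                for d in dilations)
            self.gate = nn.Conv2d(channels, len(dilations), 1)

        def forward(self, x):
            gates = self.gate(x).softmax(dim=1)        # (b, n_branches, h, w)
            return sum(g.unsqueeze(1) * branch(x)
                       for g, branch in zip(gates.unbind(dim=1), self.branches))

    out = GatedReceptiveFields(64)(torch.randn(2, 64, 32, 32))  # (2, 64, 32, 32)

Because the gate is predicted per pixel, each location can emphasize a small receptive field on thin structures and a large one on big objects, rather than committing to one dilation rate network-wide.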

4.
IEEE Trans Pattern Anal Mach Intell; 42(10): 2702-2719, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31283496

ABSTRACT

Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks such as 3D map construction, self-localization, parsing the driving road, and understanding objects, which enable vehicles to reason and act. However, the lack of large-scale datasets for training and system evaluation remains a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with existing public datasets from real scenes, e.g., KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labeling, including a holistic semantic dense point cloud for each site, stereo, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instances, and highly accurate locations for every frame in driving videos from multiple sites, cities, and times of day. For each task, it contains at least 15x more images than state-of-the-art datasets. To label such a complete dataset, we develop various tools and algorithms tailored to each task to accelerate the labeling process, such as joint 3D-2D segment labeling and active labeling in videos. Building on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial for achieving a more robust and accurate system. We expect our dataset and the proposed algorithms to support and motivate researchers toward further development of multi-sensor fusion and multi-task learning in the field of computer vision.
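
A toy Python sketch of the sensor-fusion idea: blending a camera-based position estimate with a GPS/IMU prior. The constant-gain blend and the coordinate values are assumptions for illustration only, not the paper's fusion scheme, which additionally uses a 3D semantic map.

    import numpy as np

    def fuse_position(gps_imu_xyz, camera_xyz, gain=0.8):
        """Constant-gain blend: weight the camera estimate by `gain` and
        use the GPS/IMU prior (weight 1 - gain) to bound visual drift."""
        return gain * np.asarray(camera_xyz) + (1.0 - gain) * np.asarray(gps_imu_xyz)

    prior = [442311.2, 4427045.6, 33.1]   # hypothetical GPS/IMU position (m)
    visual = [442311.9, 4427046.1, 33.0]  # hypothetical camera-based estimate (m)
    print(fuse_position(prior, visual))   # fused position between the two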
