Results 1 - 6 of 6
1.
IEEE Trans Neural Netw Learn Syst ; 34(12): 9604-9624, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35482692

ABSTRACT

Autonomous systems can infer their own state, understand their surroundings, and navigate autonomously. With the application of learning systems, such as deep learning and reinforcement learning, the vision-based self-state estimation, environment perception, and navigation capabilities of autonomous systems have improved markedly, and many new learning-based algorithms have emerged for autonomous visual perception and navigation. In this review, we focus on the applications of learning-based monocular approaches to ego-motion perception, environment perception, and navigation in autonomous systems, which distinguishes it from previous reviews that discussed traditional methods. First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the need to integrate deep learning techniques. Second, we review deep learning-based methods for visual environmental perception and understanding, including monocular depth estimation, monocular ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional vSLAM frameworks. Then, we focus on visual navigation based on learning systems, mainly reinforcement learning and deep reinforcement learning. Finally, we discuss several challenges and promising directions for learning systems in computer science and robotics.

2.
IEEE Trans Neural Netw Learn Syst ; 33(5): 2023-2033, 2022 May.
Article in English | MEDLINE | ID: mdl-34347607

ABSTRACT

Deep learning-based methods have achieved remarkable performance in 3-D sensing since they perceive environments in a biologically inspired manner. Nevertheless, existing approaches trained on monocular sequences are still prone to fail in dynamic environments. In this work, we mitigate the negative influence of dynamic environments on the joint estimation of depth and visual odometry (VO) through hybrid masks. Since both the VO estimation and the view reconstruction process in the joint estimation framework are vulnerable to dynamic environments, we propose the cover mask and the filter mask to alleviate their adverse effects, respectively. As depth and VO estimation are tightly coupled during training, the improved VO estimation promotes depth estimation as well. In addition, a depth-pose consistency loss is proposed to overcome the scale inconsistency between different training samples of monocular sequences. Experimental results show that both our depth prediction and our globally consistent VO estimation are state of the art when evaluated on the KITTI benchmark. We also evaluate our depth prediction model on the Make3D dataset to demonstrate the transferability of our method.
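The abstract does not give the exact form of the depth-pose consistency loss; one plausible minimal sketch ties the scale of the predicted depth maps to the scale of the predicted translation, so that two training samples cannot drift to different scales (the function name and formulation here are hypothetical, not the paper's):

```python
import numpy as np

def depth_pose_consistency_loss(depth_a, depth_b, trans_a, trans_b):
    """Hedged sketch of a scale-consistency penalty: the ratio of mean
    predicted depths between two samples should match the ratio of the
    predicted translation magnitudes; any mismatch is penalised."""
    scale_depth = depth_a.mean() / depth_b.mean()
    scale_pose = np.linalg.norm(trans_a) / np.linalg.norm(trans_b)
    return abs(scale_depth - scale_pose)
```

If both predictions are rescaled by the same factor, the loss is zero, which is exactly the invariance a monocular (scale-ambiguous) pipeline needs.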

3.
IEEE Trans Neural Netw Learn Syst ; 32(12): 5404-5415, 2021 Dec.
Article in English | MEDLINE | ID: mdl-33979291

ABSTRACT

Semantic segmentation and depth completion are two challenging tasks in scene understanding, widely used in robotics and autonomous driving. Although several studies have jointly trained these two tasks with small modifications, such as changing the last layer, the result of one task is not used to improve the performance of the other, despite the similarities between the two tasks. In this article, we propose multitask generative adversarial networks (Multitask GANs), which are not only competent at semantic segmentation and depth completion but also improve the accuracy of depth completion through generated semantic images. In addition, we improve the detail of the generated semantic images, building on CycleGAN, by introducing multiscale spatial pooling blocks and a structural similarity reconstruction loss. Furthermore, considering the inner consistency between semantic and geometric structures, we develop a semantic-guided smoothness loss to improve the depth completion results. Extensive experiments on the Cityscapes dataset and the KITTI depth completion benchmark show that Multitask GANs achieve competitive performance on both semantic segmentation and depth completion.
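The paper's semantic-guided smoothness loss is not spelled out in the abstract; a common way to realise the idea is to penalise depth gradients everywhere except where the semantic labels change, so depth is allowed to jump only at object boundaries. The following is a minimal sketch under that assumption (names and exact weighting are illustrative):

```python
import numpy as np

def semantic_guided_smoothness(depth, semantic):
    """Hedged sketch: edge-aware smoothness in which depth gradients are
    penalised only where the semantic label image does NOT change, so
    depth discontinuities at semantic boundaries go unpunished."""
    d_dx = np.abs(np.diff(depth, axis=1))
    d_dy = np.abs(np.diff(depth, axis=0))
    # Weight 1 inside a semantic region, 0 across a label boundary.
    w_dx = (np.diff(semantic, axis=1) == 0).astype(float)
    w_dy = (np.diff(semantic, axis=0) == 0).astype(float)
    return (d_dx * w_dx).mean() + (d_dy * w_dy).mean()
```

A depth step that coincides with a semantic boundary incurs no cost, while the same step inside a uniform region is penalised in full.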

4.
IEEE Trans Neural Netw Learn Syst ; 32(12): 5392-5403, 2021 Dec.
Article in English | MEDLINE | ID: mdl-33361009

ABSTRACT

Previous work has shown that adversarial learning can be used for unsupervised monocular depth and visual odometry (VO) estimation, in which the adversarial loss and the geometric image reconstruction loss serve as the main supervisory signals for training the whole unsupervised framework. However, the performance of the adversarial framework and of the image reconstruction is usually limited by occlusions and by visual field changes between frames. This article proposes a masked generative adversarial network (GAN) for unsupervised monocular depth and ego-motion estimation. The MaskNet and a Boolean mask scheme are designed in this framework to eliminate the effects of occlusions and of visual field changes on the reconstruction loss and the adversarial loss, respectively. Furthermore, we enforce the scale consistency of our pose network through a new scale-consistency loss, so the pose network can provide the full camera trajectory over a long monocular sequence. Extensive experiments on the KITTI dataset show that each component proposed in this article contributes to the performance, and both our depth and trajectory predictions achieve competitive results on the KITTI and Make3D datasets.
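The core mechanism described above, masking the reconstruction loss so that occluded or out-of-view pixels do not contribute, can be sketched in a few lines (this is a generic masked photometric L1 loss, not the paper's exact formulation):

```python
import numpy as np

def masked_reconstruction_loss(target, warped, valid_mask):
    """Hedged sketch: per-pixel L1 photometric error between the target
    frame and the view-synthesised (warped) frame, averaged only over
    pixels flagged valid (visible in both views, not occluded)."""
    err = np.abs(target - warped)
    m = valid_mask.astype(bool)
    return err[m].mean() if m.any() else 0.0
```

Pixels outside the mask, where view synthesis cannot be trusted, contribute nothing, which is what keeps occlusions and field-of-view changes from corrupting the supervisory signal.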

5.
Patterns (N Y) ; 1(4): 100050, 2020 Jul 10.
Article in English | MEDLINE | ID: mdl-33205114

ABSTRACT

With the widespread application of artificial intelligence (AI), the perception, understanding, decision-making, and control capabilities of autonomous systems have improved significantly in recent years. When autonomous systems must balance accuracy and transferability, several AI methods, such as adversarial learning, reinforcement learning (RL), and meta-learning, show powerful performance. Here, we review learning-based approaches in autonomous systems from the perspectives of accuracy and transferability. Accuracy means that a well-trained model performs well during the testing phase, in which the testing set shares the same task or data distribution as the training set. Transferability means that when a well-trained model is transferred to other testing domains, its accuracy remains good. First, we introduce basic concepts of transfer learning and then present preliminaries of adversarial learning, RL, and meta-learning. Second, we review the accuracy and/or transferability of these approaches to show the advantages of adversarial learning, such as generative adversarial networks, in typical computer vision tasks in autonomous systems, including image style transfer, image super-resolution, image deblurring/dehazing/rain removal, semantic segmentation, depth estimation, pedestrian detection, and person re-identification. We further review the accuracy and/or transferability of RL and meta-learning in autonomous systems, involving pedestrian tracking, robot navigation, and robotic manipulation. Finally, we discuss several challenges and future topics for the use of adversarial learning, RL, and meta-learning in autonomous systems.

6.
IEEE Trans Image Process ; 11(11): 1249-59, 2002.
Article in English | MEDLINE | ID: mdl-18249695

ABSTRACT

We describe a novel approach for creating a three-dimensional (3-D) face structure from multiple image views of a human face taken at a priori unknown poses by appropriately morphing a generic 3-D face. A cubic explicit polynomial in 3-D is used to morph the generic face into the specific face structure. The 3-D face structure allows accurate pose estimation as well as the synthesis of virtual images to be matched with a test image for face identification. The estimation of a person's 3-D face structure and pose is achieved through a distance map metric. The distance map residual error (a geometric face classifier) and the image intensity residual error are fused to identify a person in the database from one or more arbitrary image views. Experimental results are shown on simulated data in the presence of noise, as well as on images of real faces, and promising results are obtained.
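The fusion of the geometric (distance map) residual with the intensity residual is the identification step; the abstract leaves the combination rule unspecified. A plausible minimal sketch is a normalised weighted sum over the gallery, picking the subject with the lowest fused residual (the mixing weight `alpha` and the normalisation are assumptions for illustration):

```python
import numpy as np

def identify(geom_residuals, intensity_residuals, alpha=0.5):
    """Hedged sketch: fuse per-subject geometric and intensity residual
    errors (each normalised to sum to 1 across the gallery) and return
    the index of the best-matching database subject."""
    g = np.asarray(geom_residuals, dtype=float)
    i = np.asarray(intensity_residuals, dtype=float)
    g = g / g.sum()
    i = i / i.sum()
    fused = alpha * g + (1.0 - alpha) * i
    return int(np.argmin(fused))
```

Normalising each residual type before mixing keeps one error scale (e.g. millimetres vs. grey levels) from dominating the decision.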
