1.
Article in English | MEDLINE | ID: mdl-37738191

ABSTRACT

Deep-learning-based localization and mapping approaches have recently emerged as a new research direction and are receiving significant attention from both industry and academia. Instead of creating hand-designed algorithms based on physical models or geometric theories, deep learning solutions provide an alternative that solves the problem in a data-driven way. Benefiting from ever-increasing volumes of data and computational power on devices, these learning methods are fast evolving into a new area that shows potential to track self-motion and estimate environmental models accurately and robustly for mobile agents. In this work, we provide a comprehensive survey and propose a taxonomy of localization and mapping methods that use deep learning. The survey addresses two basic questions: whether deep learning is promising for localization and mapping, and how deep learning should be applied to solve this problem. To this end, a series of localization and mapping topics is investigated, from learning-based visual odometry and global relocalization to mapping and simultaneous localization and mapping (SLAM). It is our hope that this survey organically weaves together recent work in this vein from the robotics, computer vision, and machine learning communities and serves as a guideline for future researchers applying deep learning to the problem of visual localization and mapping.

2.
Article in English | MEDLINE | ID: mdl-35657847

ABSTRACT

Autonomous vehicles and mobile robotic systems are typically equipped with multiple sensors to provide redundancy. By integrating the observations from different sensors, these mobile agents are able to perceive the environment and estimate system states, e.g., locations and orientations. Although deep learning (DL) approaches for multimodal odometry estimation and localization have gained traction, they rarely address robust sensor fusion, a necessary consideration for dealing with noisy or incomplete sensor observations in the real world. Moreover, current deep odometry models suffer from a lack of interpretability. To this end, we propose SelectFusion, an end-to-end selective sensor fusion module that can be applied to useful pairs of sensor modalities, such as monocular images and inertial measurements, depth images, and light detection and ranging (LIDAR) point clouds. Our model is a uniform framework that is not restricted to a specific modality or task. During prediction, the network is able to assess the reliability of the latent features from different sensor modalities and to estimate the trajectory with both accurate scale and global pose. In particular, we propose two fusion modules, a deterministic soft fusion and a stochastic hard fusion, and offer a comprehensive study of the new strategies compared with trivial direct fusion. We extensively evaluate all fusion strategies both on public datasets and on progressively degraded datasets that present synthetic occlusions, noisy and missing data, and time misalignment between sensors, and we investigate the effectiveness of the different fusion strategies in attending to the most reliable features, which in itself provides insights into the operation of the various models.
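
The two gating mechanisms named above are not spelled out in the abstract. The following is a minimal PyTorch sketch of what a deterministic soft fusion and a stochastic hard fusion of two feature streams could look like; the module names, shapes, and the Gumbel-softmax parameterisation are assumptions for illustration, not SelectFusion's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftFusion(nn.Module):
    """Deterministic gating (sketch): every channel of the concatenated
    features is continuously reweighted by a sigmoid mask conditioned
    on both modalities."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2 * dim)

    def forward(self, feat_a, feat_b):
        joint = torch.cat([feat_a, feat_b], dim=-1)
        mask = torch.sigmoid(self.gate(joint))       # values in (0, 1)
        return joint * mask                          # reweighted features

class HardFusion(nn.Module):
    """Stochastic gating (sketch): each channel is kept or dropped by
    sampling a binary mask via Gumbel-softmax, which keeps the discrete
    choice differentiable for end-to-end training."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2 * dim * 2)  # keep/drop logits per channel

    def forward(self, feat_a, feat_b):
        joint = torch.cat([feat_a, feat_b], dim=-1)
        logits = self.gate(joint).view(*joint.shape, 2)
        mask = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 0]
        return joint * mask                          # binary-masked features
```

In the soft variant degraded modalities are attenuated gradually; in the hard variant unreliable channels are excluded outright, which is the distinction the abstract's occlusion and noise experiments probe.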

3.
Article in English | MEDLINE | ID: mdl-35622807

ABSTRACT

Deep convolutional neural networks have driven large improvements in video understanding and human activity recognition over the past decade. However, most existing methods focus on activities that unfold on similar time scales, leaving action recognition on multiscale human behaviors less explored. In this study, a two-stream multiscale human activity recognition and anticipation (MS-HARA) network is proposed and jointly optimized using a multitask learning method. The MS-HARA network fuses the two streams of the network using an efficient temporal-channel attention (TCA)-based fusion approach to improve the model's representational ability for both temporal and spatial features. We investigate multiscale human activities in two basic categories: midterm activities and long-term activities. The network is designed to function as part of a real-time processing framework to support interaction and mutual understanding between humans and intelligent machines. It achieves state-of-the-art results on several datasets across different tasks and application domains. The midterm and long-term action recognition and anticipation performance, as well as the network fusion, are extensively tested to show the efficiency of the proposed network. The results show that the MS-HARA network can easily be extended to different application domains.
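
As a rough illustration of the fusion idea, the sketch below applies squeeze-and-excitation-style attention along both the channel and the temporal axes before merging two streams. The class, shapes, and fusion-by-addition are assumptions for illustration; this is not the MS-HARA implementation.

```python
import torch
import torch.nn as nn

class TemporalChannelAttention(nn.Module):
    """Sketch of temporal-channel attention: features shaped
    (batch, channels, time) are reweighted along the channel axis and
    the temporal axis, so the fusion can emphasise informative channels
    and informative time steps independently."""
    def __init__(self, channels, time_steps, reduction=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.temporal_mlp = nn.Sequential(
            nn.Linear(time_steps, time_steps // reduction), nn.ReLU(),
            nn.Linear(time_steps // reduction, time_steps), nn.Sigmoid())

    def forward(self, x):                             # x: (batch, C, T)
        c_att = self.channel_mlp(x.mean(dim=2))       # (batch, C)
        t_att = self.temporal_mlp(x.mean(dim=1))      # (batch, T)
        return x * c_att.unsqueeze(2) * t_att.unsqueeze(1)

def fuse_streams(tca, spatial_feat, temporal_feat):
    """Merge the two network streams after attention reweighting."""
    return tca(spatial_feat) + tca(temporal_feat)
```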

4.
Neural Netw ; 150: 119-136, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35313245

ABSTRACT

In the last decade, numerous supervised deep learning approaches have been proposed for visual-inertial odometry (VIO) and depth map estimation, all of which require large amounts of labelled data. To overcome this data limitation, self-supervised learning has emerged as a promising alternative that exploits constraints such as geometric and photometric consistency in the scene. In this study, we present SelfVIO, a novel self-supervised deep-learning-based VIO and depth map recovery approach that uses adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns the joint estimation of 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabelled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach performs VIO without requiring IMU intrinsic parameters or extrinsic calibration between the IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework and compare its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC, and Cityscapes datasets. Detailed comparisons show that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising candidate among existing methods in the literature.


Subject(s)
Motion; Calibration; Vision, Monocular
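
The self-supervised signal in the record above rests on photometric consistency: the predicted depth and 6-DoF pose should allow a source frame to be warped onto the target frame. Below is a minimal PyTorch sketch of such a loss; it is illustrative only, with assumed shapes, and omits SelfVIO's adversarial training and IMU fusion.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """Warp `source` into the `target` view using predicted depth and
    relative pose, then penalise the photometric difference.
    target, source: (B, 3, H, W) images; depth: (B, 1, H, W);
    pose: (B, 4, 4) target->source transform; K: (3, 3) intrinsics."""
    B, _, H, W = target.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(3, -1)  # (3, HW)
    rays = torch.linalg.inv(K) @ pix                  # back-project pixels
    pts = rays.unsqueeze(0) * depth.reshape(B, 1, -1)                # (B, 3, HW)
    pts_h = torch.cat([pts, torch.ones(B, 1, H * W)], dim=1)         # homogeneous
    proj = K @ (pose @ pts_h)[:, :3]                  # reproject into source view
    px = proj[:, 0] / proj[:, 2].clamp(min=1e-6)
    py = proj[:, 1] / proj[:, 2].clamp(min=1e-6)
    grid = torch.stack([2 * px / (W - 1) - 1,         # normalise to [-1, 1]
                        2 * py / (H - 1) - 1], dim=-1).reshape(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()             # L1 photometric error
```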
5.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8338-8354, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34033533

ABSTRACT

We study the problem of efficient semantic segmentation of large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained on and applied to small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture that directly infers per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module that progressively increases the receptive field for each 3D point, thereby effectively preserving geometric details. Comparative experiments show that our RandLA-Net can process 1 million points in a single pass up to 200× faster than existing approaches. Moreover, extensive experiments on five large-scale point cloud datasets, including Semantic3D, SemanticKITTI, Toronto3D, NPM3D, and S3DIS, demonstrate the state-of-the-art semantic segmentation performance of RandLA-Net.
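
The efficiency argument hinges on the sampling step: picking indices uniformly at random is trivially cheap, whereas farthest-point sampling is substantially more expensive, and the detail lost to random decimation is recovered by pooling features over each retained point's neighbourhood. A small sketch of both steps, with assumed shapes and names and a brute-force neighbour search that would not scale to millions of points; it shows the idea, not the paper's code.

```python
import torch

def random_downsample(points, feats, ratio=4):
    """Randomly keep N/ratio points. Unlike farthest-point sampling,
    this costs almost nothing, which is why the approach scales to
    large clouds. points: (N, 3), feats: (N, C)."""
    n_keep = points.shape[0] // ratio
    idx = torch.randperm(points.shape[0])[:n_keep]
    return points[idx], feats[idx]

def aggregate_neighbourhood(points, feats, k=16):
    """Sketch of local feature aggregation: pool features over each
    point's k nearest neighbours so that geometry discarded by random
    sampling survives in the pooled features. The O(N^2) distance
    matrix here is for illustration only."""
    dist = torch.cdist(points, points)            # (N, N) pairwise distances
    knn = dist.topk(k, largest=False).indices     # (N, k) neighbour indices
    return feats[knn].max(dim=1).values           # max-pool per neighbourhood
```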

6.
Nat Mach Intell ; 4(9): 749-760, 2022.
Article in English | MEDLINE | ID: mdl-37790900

ABSTRACT

Interest in autonomous vehicles (AVs) is growing at a rapid pace due to increased convenience, safety benefits, and potential environmental gains. Although several leading AV companies predicted that AVs would be on the road by 2020, they are still limited to relatively small-scale trials. The ability to know their precise location on the map is a challenging prerequisite for safe and reliable AVs, because sensor imperfections under adverse environmental and weather conditions pose a formidable obstacle to their widespread use. Here we propose a deep-learning-based, self-supervised approach for ego-motion estimation that provides a robust and complementary localization solution under inclement weather conditions. The proposed approach is a geometry-aware method that attentively fuses the rich representation capability of visual sensors with the weather-immune features provided by radars, using an attention-based learning technique. Our method predicts reliability masks for the sensor measurements, eliminating deficiencies in the multimodal data. In various experiments we demonstrate robust all-weather performance and effective cross-domain generalizability under harsh weather conditions such as rain, fog, and snow, as well as day and night conditions. Furthermore, we employ a game-theoretic approach to analyse the interpretability of the model predictions, illustrating the independent and uncorrelated failure modes of the multimodal system. We anticipate our work will bring AVs one step closer to safe and reliable all-weather autonomous driving.
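
For a two-sensor system, the game-theoretic attribution mentioned above has a closed form: with players {camera, radar}, each modality's Shapley value averages its marginal contribution over both join orders. A sketch under that assumption, where the `evaluate` callable and its score are hypothetical stand-ins for running the model with only a subset of modalities enabled; the paper's actual analysis is not reproduced here.

```python
def shapley_two_players(evaluate):
    """Exact Shapley values for a two-player game. `evaluate(subset)`
    is assumed to return a performance score (e.g. negative pose error)
    for the model with only that subset of modalities enabled."""
    v_none = evaluate(frozenset())
    v_cam = evaluate(frozenset({"camera"}))
    v_rad = evaluate(frozenset({"radar"}))
    v_both = evaluate(frozenset({"camera", "radar"}))
    # Average each player's marginal contribution over both join orders.
    phi_cam = 0.5 * (v_cam - v_none) + 0.5 * (v_both - v_rad)
    phi_rad = 0.5 * (v_rad - v_none) + 0.5 * (v_both - v_cam)
    return {"camera": phi_cam, "radar": phi_rad}
```

By the efficiency property, the two values sum to the gain of the full multimodal system over no sensors at all, so the attribution exactly decomposes where the performance comes from, and uncorrelated failure modes show up as one modality's value collapsing while the other's holds.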

7.
IEEE Trans Neural Netw Learn Syst ; 32(12): 5479-5491, 2021 12.
Article in English | MEDLINE | ID: mdl-34559667

ABSTRACT

Dynamical models estimate and predict the temporal evolution of physical systems. State-space models (SSMs) in particular represent the system dynamics with many desirable properties, such as the ability to model uncertainty in both the model and the measurements, and optimal (in the Bayesian sense) recursive formulations, e.g., the Kalman filter. However, they require significant domain knowledge to derive the parametric form and considerable hand tuning to correctly set all the parameters. Data-driven techniques, e.g., recurrent neural networks, have emerged as compelling alternatives to SSMs, with wide success across a number of challenging tasks, in part due to their impressive capability to extract relevant features from rich inputs. They lack, however, interpretability and robustness to unseen conditions, making data-driven models hard to apply in safety-critical applications such as self-driving vehicles. In this work, we present DynaNet, a hybrid deep learning and time-varying state-space model that can be trained end-to-end. Our neural Kalman dynamical model allows us to exploit the relative merits of both SSMs and deep neural networks. We demonstrate its effectiveness in estimation and prediction on a number of physically challenging tasks, including visual odometry, sensor fusion for visual-inertial navigation, and motion prediction. In addition, we show how DynaNet can indicate failures through investigation of properties such as the rate of innovation (Kalman gain).
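
The "neural Kalman" idea keeps the classical predict/update recursion but lets a network emit the time-varying quantities. Below is a minimal linear-Gaussian step in NumPy, with the Kalman gain exposed as the failure indicator mentioned above; in a DynaNet-style hybrid the matrices A, Q, and R would be produced by a neural network per step rather than hand-set, which is an assumption of this sketch.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of a (possibly time-varying) linear
    Kalman filter. x: state mean, P: state covariance, z: measurement,
    A: transition, H: observation, Q/R: process/measurement noise.
    Returns the new state, covariance, and the Kalman gain K, whose
    magnitude reflects how much the filter trusts the measurement."""
    # Predict
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update
    innovation = z - H @ x_pred              # measurement surprise
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, K
```

Monitoring K (or the innovation sequence) over time is what lets such a model flag failures: a persistently large innovation with a gain that no longer compensates suggests the learned dynamics or a sensor has degraded.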

8.
IEEE Trans Neural Netw Learn Syst ; 32(1): 166-176, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32203029

ABSTRACT

Due to sparse rewards and a high degree of environmental variation, reinforcement learning approaches such as deep deterministic policy gradient (DDPG) are plagued by issues of high variance when applied in complex real-world environments. We present a new framework that overcomes these issues by incorporating a stochastic switch, allowing an agent to choose between high- and low-variance policies. The stochastic switch can be jointly trained with the original DDPG in the same framework. In this article, we demonstrate the power of the framework in a navigation task, where the robot can dynamically choose to learn through exploration or to use the output of a heuristic controller as guidance. Instead of starting from completely random actions, the robot's navigation capability can be quickly bootstrapped by several simple independent controllers. The experimental results show that with the aid of stochastic guidance, we are able to effectively and efficiently train DDPG navigation policies and achieve significantly better performance than state-of-the-art baseline models.
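
Mechanically, the switch is a learned stochastic gate between the exploratory DDPG policy and a heuristic controller. A sketch of the action-selection step follows; the names and the Bernoulli parameterisation are assumptions, and the joint training of the switch with DDPG is not shown.

```python
import torch

def select_action(state, ddpg_policy, heuristic_controller, switch_net):
    """Choose between the learned DDPG action (high variance, explores)
    and a heuristic controller's action (low variance, safe guidance).
    `switch_net` is assumed to map the state to a logit for trusting
    the learned policy."""
    p_learned = torch.sigmoid(switch_net(state))       # (B, 1), in (0, 1)
    use_learned = torch.bernoulli(p_learned).bool()    # stochastic switch
    action = torch.where(use_learned,
                         ddpg_policy(state),           # exploratory action
                         heuristic_controller(state))  # guided action
    return action, use_learned
```

Early in training the switch can lean on the controllers to bootstrap useful behaviour, then shift probability mass toward the learned policy as its returns improve.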

9.
IEEE Trans Pattern Anal Mach Intell ; 41(12): 2820-2834, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30183619

ABSTRACT

In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. Unlike existing work, which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN++ takes only the voxel grid representation of a depth view of the object as input and is able to generate the complete 3D occupancy grid at a high resolution of 256³ by recovering the occluded/missing regions. The key idea is to combine the generative capabilities of a 3D encoder-decoder with the conditional adversarial networks framework to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets and real-world Kinect datasets show that the proposed 3D-RecGAN++ significantly outperforms the state of the art in single-view 3D object reconstruction and is able to reconstruct unseen types of objects.
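
The training objective pairs a per-voxel reconstruction term, which supervises visible and occluded regions directly, with a conditional adversarial term that pushes completions toward plausible shapes. A sketch of a generator-side loss along those lines; the weighting and names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_occupancy, true_occupancy, d_fake, lam=0.9):
    """pred_occupancy, true_occupancy: (B, 1, D, H, W) voxel grids with
    values in [0, 1]; d_fake: discriminator logits for generated grids.
    The reconstruction term fills in missing regions voxel by voxel;
    the adversarial term rewards completions the discriminator accepts
    as realistic."""
    recon = F.binary_cross_entropy(pred_occupancy, true_occupancy)
    adv = F.binary_cross_entropy_with_logits(
        d_fake, torch.ones_like(d_fake))      # try to fool the discriminator
    return lam * recon + (1.0 - lam) * adv    # weighted combination
```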

10.
PLoS One ; 9(1): e83156, 2014.
Article in English | MEDLINE | ID: mdl-24465376

ABSTRACT

We establish intra-individual and inter-annual variability in European badger (Meles meles) autumnal nightly activity in relation to fine-scale climatic variables, using tri-axial accelerometry. This contributes further to understanding of causality in the established interaction between weather conditions and population dynamics in this species. Modelling found that measures of daylight, rain/humidity, and soil temperature were the best-supported predictors of ACTIVITY in both years studied. In 2010, the drier year, the most supported model included the SOLAR*RH interaction, RAIN, and 30cmTEMP (w = 0.557), while in 2012, a wetter year, the most supported model included the SOLAR*RH and RAIN*10cmTEMP interactions (w = 0.999). ACTIVITY also differed significantly between individuals. In the 2012 autumn study period, badgers with the longest per noctem activity subsequently exhibited higher Body Condition Indices (BCI) when recaptured. In contrast, under drier 2010 conditions, badgers in good BCI engaged in less per noctem activity, while badgers with poor BCI were the most active. When compared on the same calendar dates, to control for night length, mean nightly badger activity was longer in 2010 (9.5 hrs ±3.3 SE) than in 2012 (8.3 hrs ±1.9 SE). In the wetter year, increasing nightly activity was associated with net-positive energetic gains (inferred from BCI), likely due to better foraging conditions. In the drier year, with greater potential for net-negative energy returns, individual nutritional state proved crucial in modifying activity regimes; we therefore emphasise that a 'one size fits all' approach should not be applied to ecological responses.


Subject(s)
Mustelidae/physiology; Seasons; Animal Nutritional Physiological Phenomena; Animals; Europe; Humidity; Temperature
11.
Philos Trans A Math Phys Eng Sci ; 370(1958): 5-10, 2012 Jan 13.
Article in English | MEDLINE | ID: mdl-22124078

ABSTRACT

A sensor network is a collection of nodes with processing, communication and sensing capabilities deployed in an area of interest to perform a monitoring task. There has now been about a decade of very active research in the area of sensor networks, with significant accomplishments made in terms of both designing novel algorithms and building exciting new sensing applications. This Theme Issue provides a broad sampling of the central challenges and the contributions that have been made towards addressing these challenges in the field, and illustrates the pervasive and central role of sensor networks in monitoring human activities and the environment.
