Search | Portal Regional da BVS

1.

Reaching the limit in autonomous racing: Optimal control versus reinforcement learning.

Song, Yunlong; Romero, Angel; Müller, Matthias; Koltun, Vladlen; Scaramuzza, Davide.

Sci Robot ; 8(82): eadg1462, 2023 Sep 27.

Article in English | MEDLINE | ID: mdl-37703383

ABSTRACT

A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.

2.

Champion-level drone racing using deep reinforcement learning.

Kaufmann, Elia; Bauersfeld, Leonard; Loquercio, Antonio; Müller, Matthias; Koltun, Vladlen; Scaramuzza, Davide.

Nature ; 620(7976): 982-987, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37648758

ABSTRACT

First-person view (FPV) drone racing is a televised sport in which professional competitors pilot high-speed aircraft through a 3D circuit. Each pilot sees the environment from the perspective of their drone by means of video streamed from an onboard camera. Reaching the level of professional pilots with an autonomous drone is challenging because the robot needs to fly at its physical limits while estimating its speed and location in the circuit exclusively from onboard sensors [1]. Here we introduce Swift, an autonomous system that can race physical vehicles at the level of the human world champions. The system combines deep reinforcement learning (RL) in simulation with data collected in the physical world. Swift competed against three human champions, including the world champions of two international leagues, in real-world head-to-head races. Swift won several races against each of the human champions and demonstrated the fastest recorded race time. This work represents a milestone for mobile robotics and machine intelligence [2], which may inspire the deployment of hybrid learning-based solutions in other physical systems.

3.

Enhancing Photorealism Enhancement.

Richter, Stephan R; Alhaija, Hassan Abu; Koltun, Vladlen.

IEEE Trans Pattern Anal Mach Intell ; 45(2): 1700-1715, 2023 Feb.

Article in English | MEDLINE | ID: mdl-35412970

ABSTRACT

We present an approach to enhancing the realism of synthetic images. The images are enhanced by a convolutional network that leverages intermediate representations produced by conventional rendering pipelines. The network is trained via a novel adversarial objective, which provides strong supervision at multiple perceptual levels. We analyze scene layout distributions in commonly used datasets and find that they differ in important ways. We hypothesize that this is one of the causes of strong artifacts that can be observed in the results of many prior methods. To address this we propose a new strategy for sampling image patches during training. We also introduce multiple architectural improvements in the deep network modules used for photorealism enhancement. We confirm the benefits of our contributions in controlled experiments and report substantial gains in stability and realism in comparison to recent image-to-image translation methods and a variety of other baselines.

4.

MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation.

Lambert, John; Liu, Zhuang; Sener, Ozan; Hays, James; Koltun, Vladlen.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 796-810, 2023 Jan.

Article in English | MEDLINE | ID: mdl-35157579

ABSTRACT

We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. A naive merge of the constituent datasets yields poor performance due to inconsistent taxonomies and annotation practices. We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images, requiring more than 1.34 years of collective annotator effort. The resulting composite dataset enables training a single semantic segmentation model that functions effectively across domains and generalizes to datasets that were not seen during training. We adopt zero-shot cross-dataset transfer as a benchmark to systematically evaluate a model's robustness and show that MSeg training yields substantially more robust models in comparison to training on individual datasets or naive mixing of datasets without the presented contributions. A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training. We evaluate our models in the 2020 Robust Vision Challenge (RVC) as an extreme generalization experiment. MSeg training sets include only three of the seven datasets in the RVC; more importantly, the evaluation taxonomy of RVC is different and more detailed. Surprisingly, our model shows competitive performance and ranks second. To evaluate how close we are to the grand aim of robust, efficient, and complete scene understanding, we go beyond semantic segmentation by training instance segmentation and panoptic segmentation models using our dataset. Moreover, we also evaluate various engineering design decisions and metrics, including resolution and computational efficiency. Although our models are far from this grand aim, our comprehensive evaluation is crucial for progress. We share all the models and code with the community.

5.

ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception.

Dong, Wei; Lao, Yixing; Kaess, Michael; Koltun, Vladlen.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 5417-5435, 2023 May.

Article in English | MEDLINE | ID: mdl-36227823

ABSTRACT

We present ASH, a modern and high-performance framework for parallel spatial hashing on GPU. Compared to existing GPU hash map implementations, ASH achieves higher performance, supports richer functionality, and requires fewer lines of code (LoC) when used for implementing spatially varying operations from volumetric geometry reconstruction to differentiable appearance reconstruction. Unlike existing GPU hash maps, the ASH framework provides a versatile tensor interface, hiding low-level details from the users. In addition, by decoupling the internal hashing data structures and key-value data in buffers, we offer direct access to spatially varying data via indices, enabling seamless integration to modern libraries such as PyTorch. To achieve this, we 1) detach stored key-value data from the low-level hash map implementation; 2) bridge the pointer-first low level data structures to index-first high-level tensor interfaces via an index heap; 3) adapt both generic and non-generic integer-only hash map implementations as backends to operate on multi-dimensional keys. We first profile our hash map against state-of-the-art hash maps on synthetic data to show the performance gain from this architecture. We then show that ASH can consistently achieve higher performance on various large-scale 3D perception tasks with fewer LoC by showcasing several applications, including 1) point cloud voxelization, 2) retargetable volumetric scene reconstruction, 3) non-rigid point cloud registration and volumetric deformation, and 4) spatially varying geometry and appearance refinement. ASH and its example applications are open sourced in Open3D (http://www.open3d.org).
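As a rough illustration of the spatial hashing idea behind the point cloud voxelization application mentioned above, the following is a minimal CPU-only sketch. It is not ASH's API and not the authors' implementation: the function name `voxelize` and its parameters are hypothetical, and a plain Python dictionary stands in for the GPU hash map, with integer voxel coordinates serving as multi-dimensional keys.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group 3D points by voxel and return one centroid per occupied voxel.

    Hypothetical helper for illustration only; a dict keyed on integer
    (i, j, k) tuples plays the role of the spatial hash map.
    """
    grid = defaultdict(list)
    for x, y, z in points:
        # Integer voxel coordinates act as the multi-dimensional hash key.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append((x, y, z))
    # Reduce each voxel's points to their centroid.
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in grid.items()
    }


cloud = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.0, -0.1)]
voxels = voxelize(cloud, voxel_size=1.0)
```

Only occupied voxels consume memory here, which is the usual motivation for hashing sparse 3D data instead of allocating a dense grid.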

6.

Drinking From a Firehose: Continual Learning With Web-Scale Natural Language.

Hu, Hexiang; Sener, Ozan; Sha, Fei; Koltun, Vladlen.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 5684-5696, 2023 May.

Article in English | MEDLINE | ID: mdl-36315549

ABSTRACT

Continual learning systems will interact with humans, with each other, and with the physical world through time, and continue to learn and adapt as they do. An important open problem for continual learning is a large-scale benchmark which enables realistic evaluation of algorithms. In this paper, we study a natural setting for continual learning on a massive scale. We introduce the problem of personalized online language learning (POLL), which involves fitting personalized language models to a population of users that evolves over time. To facilitate research on POLL, we collect massive datasets of Twitter posts. These datasets, Firehose10M and Firehose100M, comprise 100 million tweets, posted by one million users over six years. Enabled by the Firehose datasets, we present a rigorous evaluation of continual learning algorithms on an unprecedented scale. Based on this analysis, we develop a simple algorithm for continual gradient descent (ConGraD) that outperforms prior continual learning methods on the Firehose datasets as well as earlier benchmarks. Collectively, the POLL problem setting, the Firehose datasets, and the ConGraD algorithm enable a complete benchmark for reproducible research on web-scale continual learning.

7.

Learning robust perceptive locomotion for quadrupedal robots in the wild.

Miki, Takahiro; Lee, Joonho; Hwangbo, Jemin; Wellhausen, Lorenz; Koltun, Vladlen; Hutter, Marco.

Sci Robot ; 7(62): eabk2822, 2022 Jan 19.

Article in English | MEDLINE | ID: mdl-35044798

ABSTRACT

Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into underexplored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: Perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, using exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step or are missing altogether due to high reflectance. In addition, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed because the robot has to physically feel out the terrain before adapting its gait accordingly. Here, we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end to end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed. The controller was tested in a variety of challenging natural and urban environments over multiple seasons and completed an hour-long hike in the Alps in the time recommended for human hikers.

Subjects

Locomotion/physiology; Robotics/instrumentation; Biomimetic Materials; Biomimetics; Computer Simulation; Environment; Gait/physiology; Humans; Machine Learning; Models, Biological; Neural Networks, Computer; Proprioception/physiology; Robotics/statistics & numerical data; Seasons; Walking/physiology

8.

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer.

Ranftl, Rene; Lasinger, Katrin; Hafner, David; Schindler, Konrad; Koltun, Vladlen.

IEEE Trans Pattern Anal Mach Intell ; 44(3): 1623-1637, 2022 Mar.

Article in English | MEDLINE | ID: mdl-32853149

ABSTRACT

The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation.

Subjects

Algorithms
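To make concrete what an objective "invariant to changes in depth range and scale" can look like, here is a minimal sketch under stated assumptions. It aligns a prediction to the target with a closed-form least-squares scale and shift before measuring the error. This is an illustrative variant only, not necessarily the objective used in the paper, and the function names are hypothetical.

```python
def align_scale_shift(pred, target):
    """Closed-form least-squares scale s and shift t minimizing
    sum((s * pred_i + t - target_i) ** 2). Hypothetical helper."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p if var_p > 0 else 0.0
    shift = mean_t - scale * mean_p
    return scale, shift


def scale_shift_invariant_mse(pred, target):
    """Mean squared error after optimally aligning pred to target."""
    scale, shift = align_scale_shift(pred, target)
    return sum((scale * p + shift - t) ** 2 for p, t in zip(pred, target)) / len(pred)


pred = [1.0, 2.0, 3.0, 4.0]
target = [2.1, 3.9, 6.2, 7.8]
# The same prediction expressed with a different (unknown) scale and shift.
rescaled = [3.0 * p + 5.0 for p in pred]

loss_a = scale_shift_invariant_mse(pred, target)
loss_b = scale_shift_invariant_mse(rescaled, target)
```

Because the alignment absorbs any affine change of the prediction, `loss_a` and `loss_b` agree up to floating-point error, which is the property that lets datasets with incompatible depth annotations be mixed.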

9.

Learning high-speed flight in the wild.

Loquercio, Antonio; Kaufmann, Elia; Ranftl, René; Müller, Matthias; Koltun, Vladlen; Scaramuzza, Davide.

Sci Robot ; 6(59): eabg5810, 2021 Oct 06.

Article in English | MEDLINE | ID: mdl-34613820

ABSTRACT

Quadrotors are agile. Unlike most other machines, they can traverse extremely complex environments at high speeds. To date, only expert human pilots have been able to fully exploit their capabilities. Autonomous operation with onboard sensing and computation has been limited to low speeds. State-of-the-art methods generally separate the navigation problem into subtasks: sensing, mapping, and planning. Although this approach has proven successful at low speeds, the separation it builds upon can be problematic for high-speed navigation in cluttered environments. The subtasks are executed sequentially, leading to increased processing latency and a compounding of errors through the pipeline. Here, we propose an end-to-end approach that can autonomously fly quadrotors through complex natural and human-made environments at high speeds with purely onboard sensing and computation. The key principle is to directly map noisy sensory observations to collision-free trajectories in a receding-horizon fashion. This direct mapping drastically reduces processing latency and increases robustness to noisy and incomplete perception. The sensorimotor mapping is performed by a convolutional network that is trained exclusively in simulation via privileged learning: imitating an expert with access to privileged information. By simulating realistic sensor noise, our approach achieves zero-shot transfer from simulation to challenging real-world environments that were never experienced during training: dense forests, snow-covered terrain, derailed trains, and collapsed buildings. Our work demonstrates that end-to-end policies trained in simulation enable high-speed autonomous flight through challenging environments, outperforming traditional obstacle avoidance pipelines.

10.

The h-index is no longer an effective correlate of scientific reputation.

Koltun, Vladlen; Hafner, David.

PLoS One ; 16(6): e0253397, 2021.

Article in English | MEDLINE | ID: mdl-34181681

ABSTRACT

The impact of individual scientists is commonly quantified using citation-based measures. The most common such measure is the h-index. A scientist's h-index affects hiring, promotion, and funding decisions, and thus shapes the progress of science. Here we report a large-scale study of scientometric measures, analyzing millions of articles and hundreds of millions of citations across four scientific fields and two data platforms. We find that the correlation of the h-index with awards that indicate recognition by the scientific community has substantially declined. These trends are associated with changing authorship patterns. We show that these declines can be mitigated by fractional allocation of citations among authors, which has been discussed in the literature but not implemented at scale. We find that a fractional analogue of the h-index outperforms other measures as a correlate and predictor of scientific awards. Our results suggest that the use of the h-index in ranking scientists should be reconsidered, and that fractional allocation measures such as h-frac provide more robust alternatives.

Subjects

Authorship; Awards and Prizes; Research Personnel; Science; Humans
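The difference between the h-index and a fractional analogue can be sketched in a few lines. The example below assumes that fractional allocation means dividing each paper's citations equally among its authors before applying the usual h-index rule. That reading, the function names, and the sample data are assumptions made for illustration, not the authors' code.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_frac(papers):
    """h-index over citations split equally among each paper's authors.

    `papers` is a list of (citations, number_of_authors) pairs.
    """
    return h_index([cites / authors for cites, authors in papers])


# Made-up (citations, number_of_authors) pairs for one scientist.
papers = [(100, 10), (50, 2), (30, 1), (8, 4), (6, 1)]

plain = h_index([cites for cites, _ in papers])
fractional = h_frac(papers)
print(plain, fractional)  # prints "5 4"
```

In this made-up record, the heavily co-authored papers contribute less after fractional allocation, so the fractional value is lower than the plain h-index.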

11.

High Speed and High Dynamic Range Video with an Event Camera.

Rebecq, Henri; Ranftl, Rene; Koltun, Vladlen; Scaramuzza, Davide.

IEEE Trans Pattern Anal Mach Intell ; 43(6): 1964-1980, 2021 Jun.

Article in English | MEDLINE | ID: mdl-31902754

ABSTRACT

Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality, while comfortably running in real-time. We show that the network is able to synthesize high framerate videos of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model and the datasets to enable further research.

12.

Learning quadrupedal locomotion over challenging terrain.

Lee, Joonho; Hwangbo, Jemin; Wellhausen, Lorenz; Koltun, Vladlen; Hutter, Marco.

Sci Robot ; 5(47), 2020 Oct 21.

Article in English | MEDLINE | ID: mdl-33087482

ABSTRACT

Legged locomotion can extend the operational domain of robots to some of the most challenging environments on Earth. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have increased in complexity but fallen short of the generality and robustness of animal locomotion. Here, we present a robust controller for blind quadrupedal locomotion in challenging natural environments. Our approach incorporates proprioceptive feedback in locomotion control and demonstrates zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. The controller is driven by a neural network policy that acts on a stream of proprioceptive signals. The controller retains its robustness under conditions that were never encountered during training: deformable terrains such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work indicates that robust locomotion in natural environments can be achieved by training in simple domains.

13.

Learning agile and dynamic motor skills for legged robots.

Hwangbo, Jemin; Lee, Joonho; Dosovitskiy, Alexey; Bellicoso, Dario; Tsounis, Vassilios; Koltun, Vladlen; Hutter, Marco.

Sci Robot ; 4(26), 2019 Jan 16.

Article in English | MEDLINE | ID: mdl-33137755

ABSTRACT

Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, so far, reinforcement learning research for legged robots is mainly limited to simulation, and only few and comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog-sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.

14.

Does computer vision matter for action?

Zhou, Brady; Krähenbühl, Philipp; Koltun, Vladlen.

Sci Robot ; 4(30), 2019 May 22.

Article in English | MEDLINE | ID: mdl-33137779

ABSTRACT

Controlled experiments indicate that explicit intermediate representations help action.

15.

Direct Sparse Odometry.

Engel, Jakob; Koltun, Vladlen; Cremers, Daniel.

IEEE Trans Pattern Anal Mach Intell ; 40(3): 611-625, 2018 Mar.

Article in English | MEDLINE | ID: mdl-28422651

ABSTRACT

Direct Sparse Odometry (DSO) is a visual odometry method based on a novel, highly accurate sparse and direct structure and motion formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry (represented as inverse depth in a reference frame) and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on essentially featureless walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.

16.

Robust continuous clustering.

Shah, Sohil Atul; Koltun, Vladlen.

Proc Natl Acad Sci U S A ; 114(37): 9814-9819, 2017 Sep 12.

Article in English | MEDLINE | ID: mdl-28851838

ABSTRACT

Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank.

17.

Optimizing Locomotion Controllers Using Biologically-Based Actuators and Objectives.

Wang, Jack M; Hamner, Samuel R; Delp, Scott L; Koltun, Vladlen.

ACM Trans Graph ; 31(4), 2012 Jul.

Article in English | MEDLINE | ID: mdl-26251560

ABSTRACT

We present a technique for automatically synthesizing walking and running controllers for physically-simulated 3D humanoid characters. The sagittal hip, knee, and ankle degrees-of-freedom are actuated using a set of eight Hill-type musculotendon models in each leg, with biologically-motivated control laws. The parameters of these control laws are set by an optimization procedure that satisfies a number of locomotion task terms while minimizing a biological model of metabolic energy expenditure. We show that the use of biologically-based actuators and objectives measurably increases the realism of gaits generated by locomotion controllers that operate without the use of motion capture data, and that metabolic energy expenditure provides a simple and unifying measurement of effort that can be used for both walking and running control optimization.
