Results 1 - 4 of 4
1.
IEEE Trans Cybern ; 54(5): 2784-2797, 2024 May.
Article in English | MEDLINE | ID: mdl-37713227

ABSTRACT

Robotic contact-rich manipulation of rigid parts in unstructured, dynamic environments demands effective solutions for smart manufacturing. As one of the most common industrial use cases, single peg-in-hole assembly has been studied extensively with reinforcement learning (RL) algorithms to improve performance. However, existing RL methods are difficult to apply to multiple peg-in-hole problems because of their more complicated geometric and physical constraints, and the few existing solutions for multiple peg-in-hole assembly are hard to transfer flexibly to real industrial scenarios. To address these issues, this work designs a novel and more challenging multiple peg-in-hole assembly setup that leverages the Industrial Metaverse, and we propose a detailed solution scheme for this task. Specifically, multiple modalities, including vision, proprioception, and force/torque, are learned as compact representations to account for complexity and uncertainty and to improve sample efficiency. RL is then used in simulation to train the policy, and the learned policy is transferred to the real world without extra exploration; domain randomization and impedance control are embedded into the policy to narrow the gap between simulation and reality. Evaluation results demonstrate the effectiveness of the proposed solution, showcasing successful multiple peg-in-hole assembly and generalization across different object shapes in real-world scenarios.
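
As a concrete illustration of the multimodal-representation idea described above, here is a minimal sketch of an encoder that fuses vision, proprioception, and force/torque into one compact state vector. PyTorch is assumed, and all layer sizes, input dimensions, and module names are illustrative placeholders rather than the authors' actual architecture.

```python
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Fuses vision, proprioception, and force/torque into one compact state."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Vision branch: small CNN over a 64x64 RGB image (size assumed).
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent_dim),
        )
        # Proprioception branch: e.g., 14-D joint positions and velocities.
        self.proprio = nn.Sequential(
            nn.Linear(14, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # Force/torque branch: 6-D wrench from a wrist-mounted sensor.
        self.wrench = nn.Sequential(
            nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        # Fusion layer mixes the three modality embeddings.
        self.fusion = nn.Sequential(
            nn.Linear(3 * latent_dim, latent_dim), nn.ReLU())

    def forward(self, image, joints, wrench):
        parts = [self.vision(image), self.proprio(joints), self.wrench(wrench)]
        return self.fusion(torch.cat(parts, dim=-1))

# The fused state would be the observation a simulation-trained RL policy
# consumes; domain randomization would perturb the simulated inputs.
enc = MultimodalEncoder()
state = enc(torch.randn(1, 3, 64, 64), torch.randn(1, 14), torch.randn(1, 6))
print(state.shape)  # torch.Size([1, 64])
```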

2.
IEEE Trans Cybern ; PP, 2022 Sep 30.
Article in English | MEDLINE | ID: mdl-36179009

ABSTRACT

Markerless vision-based teleoperation, which leverages innovations in computer vision, offers the advantage of natural and noninvasive finger motions for multifingered robot hands. However, current pose estimation methods still suffer from inaccuracy due to self-occlusion of the fingers. Herein, we develop a novel vision-based hand-arm teleoperation system that captures the human hand from the best viewpoint and at a suitable distance. The system consists of an end-to-end hand pose regression network and a controlled active vision system. The end-to-end pose regression network (Transteleop), combined with an auxiliary reconstruction loss function, observes the human hand through a low-cost depth camera and predicts joint commands for the robot based on an image-to-image translation method. To obtain the optimal observation of the human hand, an active vision system is implemented by a robot arm at the local site, which ensures the high accuracy of the proposed neural network. Human arm motions are simultaneously mapped to the slave robot arm under relative control. Quantitative network evaluation and a variety of complex manipulation tasks, for example, tower building, pouring, and multi-table cup stacking, demonstrate the practicality and stability of the proposed teleoperation system.
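
To make the auxiliary-reconstruction idea concrete, here is a hedged sketch of a pose-regression network with a second decoder head whose reconstruction loss shapes the shared embedding, in the spirit of Transteleop. PyTorch is assumed, and the image size, joint count, and loss weighting are invented for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Depth image -> (joint commands, reconstructed image)."""

    def __init__(self, n_joints: int = 20):
        super().__init__()
        # Encoder over a single-channel 64x64 depth image.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
        )
        # Head 1: joint commands for the robot hand.
        self.joints = nn.Linear(128, n_joints)
        # Head 2: decoder whose reconstruction provides the auxiliary loss.
        self.decoder = nn.Sequential(
            nn.Linear(128, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, depth):
        z = self.encoder(depth)
        return self.joints(z), self.decoder(z)

model = PoseRegressor()
depth = torch.randn(8, 1, 64, 64)
pred_joints, recon = model(depth)
target_joints, target_img = torch.randn(8, 20), torch.randn(8, 1, 64, 64)
# Total loss = joint regression + weighted auxiliary reconstruction term.
loss = (nn.functional.mse_loss(pred_joints, target_joints)
        + 0.1 * nn.functional.mse_loss(recon, target_img))
loss.backward()
```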

3.
Front Neurorobot ; 16: 829437, 2022.
Article in English | MEDLINE | ID: mdl-35308311

ABSTRACT

We propose a vision-proprioception model for planar object pushing that efficiently integrates all necessary information from the environment. A Variational Autoencoder (VAE) extracts compact representations from the task-relevant part of the image. With the real-time robot state obtained directly from the hardware, we fuse the latent representation from the VAE with the robot end-effector position to form the state of a Markov Decision Process. We use Soft Actor-Critic (SAC) to train the robot in simulation to push different objects from random initial poses to target positions, applying Hindsight Experience Replay (HER) during training to improve sample efficiency. Experiments demonstrate that our algorithm outperforms a state-based baseline model that cannot generalize to different objects, and also outperforms state-of-the-art policies that operate on raw image observations. Finally, we verify that our trained model generalizes well to unseen objects in the real world.
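
The state construction and hindsight relabeling described above can be sketched as follows. NumPy is assumed; `vae_encode`, the 16-D latent size, the sparse reward, and the 2 cm success threshold are hypothetical stand-ins for the paper's trained VAE and reward definition.

```python
import numpy as np

def vae_encode(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained VAE encoder's mean latent."""
    rng = np.random.default_rng(0)
    return rng.standard_normal(16)

def build_state(image, ee_position, goal):
    # Fuse the compact image latent with the end-effector position and the
    # goal position to form the state of the Markov Decision Process.
    return np.concatenate([vae_encode(image), ee_position, goal])

def her_relabel(transition, achieved_goal, threshold=0.02):
    # Hindsight Experience Replay: pretend the object position actually
    # reached was the goal, so a failed push becomes a useful success.
    # States are rebuilt with the new goal when the batch is sampled.
    obs, action, obs_next, goal = transition
    reward = 0.0 if np.linalg.norm(achieved_goal - goal) < threshold else -1.0
    return obs, action, reward, obs_next, achieved_goal

# Example: a push that stopped 1 cm short of the intended goal.
obs = {"image": np.zeros((64, 64, 3)), "ee": np.zeros(3)}
t = her_relabel((obs, np.zeros(2), obs, np.array([0.3, 0.1])),
                achieved_goal=np.array([0.29, 0.1]))
print(t[2])  # 0.0 -> relabeled as a success
```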

4.
Front Neurorobot ; 14: 26, 2020.
Article in English | MEDLINE | ID: mdl-32477091

ABSTRACT

Similar to specific natural language instructions, intention-related natural language queries play an essential role in everyday communication. Inspired by the psychology term "affordance" and its applications in human-robot interaction, we propose an object-affordance-based natural language visual grounding architecture to ground intention-related natural language queries. Formally, we first present an attention-based multi-visual feature fusion network to detect object affordances from RGB images. While fusing deep visual features extracted from a pre-trained CNN model with deep texture features encoded by a deep texture encoding network, the affordance detection network accounts for the interaction of the multiple visual features and preserves their complementary nature by integrating attention weights learned from sparse representations of the features. We train and validate the attention-based object affordance recognition network on a self-built dataset in which a large number of images originate from MSCOCO and ImageNet. Moreover, we introduce an intention semantic extraction module to extract intention semantics from intention-related natural language queries. Finally, we ground intention-related natural language queries by integrating the detected object affordances with the extracted intention semantics. Extensive experiments validate the performance of both the object affordance detection network and the full grounding architecture.
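
As a rough sketch of attention-weighted fusion of two visual feature streams, the snippet below gates CNN features against texture features with learned soft weights. PyTorch is assumed, and this generic soft gating is only a stand-in for the paper's attention weights learned from sparse representations; feature dimensions and the affordance count are invented.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Attention-weighted fusion of CNN and texture feature streams."""

    def __init__(self, dim: int = 256, n_affordances: int = 10):
        super().__init__()
        # Learn one weight per stream from the concatenated features.
        self.attn = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(dim, n_affordances)

    def forward(self, cnn_feat, texture_feat):
        w = self.attn(torch.cat([cnn_feat, texture_feat], dim=-1))
        # Weighted sum keeps the two streams' complementary information.
        fused = w[:, :1] * cnn_feat + w[:, 1:] * texture_feat
        return self.classifier(fused)

fusion = AttentionFusion()
logits = fusion(torch.randn(4, 256), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 10])
```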
