Crystallization-Inspired Design and Modeling of Self-Assembly Lattice-Formation Swarm Robotics.

Pan, Zebang; Wen, Guilin; Yin, Hanfeng; Yin, Shan; Tan, Zhao.

Sensors (Basel) ; 24(10), 2024 May 12.

Article in English | MEDLINE | ID: mdl-38793934

ABSTRACT

Self-assembly formation is a key research topic for realizing practical applications of swarm robotics. Because of its inherent complexity, designing high-performance self-assembly formation strategies and deriving corresponding macroscopic models remain formidable challenges and an open research frontier. Taking inspiration from crystallization, this paper introduces a distributed self-assembly formation strategy that defines free, moving, growing, and solid states for robots. Robots in these states can spontaneously organize into user-specified two-dimensional shape formations with lattice structures through local interactions and communications. To address the difficulty that complex spatial structures pose for macroscopic modeling, this work introduces a structural-feature estimation method. A corresponding non-spatial macroscopic model is then developed to predict and analyze the self-assembly behavior, using the proposed estimation method and a stock and flow diagram. Real-robot experiments and simulations validate the flexibility, scalability, and high efficiency of the proposed self-assembly formation strategy. Moreover, extensive experimental and simulation results demonstrate the model's accuracy in predicting the self-assembly process under different conditions. Model-based analysis indicates that the proposed self-assembly formation strategy can fully utilize the performance of individual robots and exhibits strong self-stability.
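The abstract does not give the model's equations, so the following is only a minimal sketch of what a non-spatial stock-and-flow model over the four named states could look like. The linear chain free -> moving -> growing -> solid, the rate constants `k_move`, `k_grow`, `k_solid`, and the function name `simulate_stocks` are all assumptions made for illustration; they are not taken from the paper.

```python
def simulate_stocks(n_robots=100.0, k_move=0.5, k_grow=0.3, k_solid=0.2,
                    dt=0.1, steps=500):
    """Euler-integrate a hypothetical four-stock chain of robot states.

    Each stock is the (real-valued) number of robots in a state; each flow
    is proportional to its source stock. All rates are illustrative only.
    """
    free, moving, growing, solid = float(n_robots), 0.0, 0.0, 0.0
    history = [(free, moving, growing, solid)]
    for _ in range(steps):
        # Flows between adjacent stocks during one time step.
        flow_move = k_move * free * dt      # free    -> moving
        flow_grow = k_grow * moving * dt    # moving  -> growing
        flow_solid = k_solid * growing * dt  # growing -> solid
        free -= flow_move
        moving += flow_move - flow_grow
        growing += flow_grow - flow_solid
        solid += flow_solid
        history.append((free, moving, growing, solid))
    return history


if __name__ == "__main__":
    final = simulate_stocks()[-1]
    print("free, moving, growing, solid =", [round(x, 3) for x in final])
```

Because every outflow of one stock is the inflow of the next, the total number of robots is conserved, which is the basic sanity check for any model of this kind.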

An immediate-return reinforcement learning for the atypical Markov decision processes.

Pan, Zebang; Wen, Guilin; Tan, Zhao; Yin, Shan; Hu, Xiaoyan.

Front Neurorobot ; 16: 1012427, 2022.

Article in English | MEDLINE | ID: mdl-36582302

ABSTRACT

Atypical Markov decision processes (MDPs) are decision-making problems in which the goal is to maximize the immediate return of a single state transition. Many complex dynamic problems can be regarded as atypical MDPs, e.g., football trajectory control, approximations of compound Poincaré maps, and parameter identification. However, existing deep reinforcement learning (RL) algorithms are designed to maximize long-term returns, which wastes computing resources when they are applied to atypical MDPs. These algorithms are also limited by the estimation error of the value function, which leads to a poor policy. To overcome these limitations, this paper proposes an immediate-return algorithm for atypical MDPs with continuous action spaces by designing an unbiased, low-variance target Q-value and a simplified network framework. Two examples of atypical MDPs under uncertainty are then presented to illustrate the performance of the proposed algorithm: passing the football to a moving player and chipping the football over the human wall. Compared with existing deep RL algorithms, such as deep deterministic policy gradient and proximal policy optimization, the proposed algorithm shows significant advantages in learning efficiency, the effective rate of control, and computing resource usage.
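The abstract does not specify the proposed target Q-value or network, so the sketch below only illustrates the general idea of an immediate-return target: when an episode is a single transition, the learning target can be the observed reward itself, with no bootstrapped `gamma * Q(s', a')` term. The toy reward function, the discretized action grid (the paper works with continuous actions), and the names `one_step_reward` and `learn_immediate_q` are assumptions made for illustration.

```python
import random


def one_step_reward(action, goal=0.3, noise=0.01, rng=random):
    """Hypothetical noisy reward for a single-transition task."""
    return -(action - goal) ** 2 + rng.gauss(0.0, noise)


def learn_immediate_q(actions, episodes=2000, seed=0):
    """Estimate Q(a) as the running mean of the immediate reward.

    The update target is the reward alone; no next-state value is used,
    so value-function estimation error cannot propagate into the target.
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(episodes):
        a = rng.choice(actions)           # uniform exploration
        r = one_step_reward(a, rng=rng)   # one transition, then the episode ends
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]    # incremental mean toward target r
    return q


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    q_values = learn_immediate_q(grid)
    print("greedy action:", max(q_values, key=q_values.get))
```

A running mean of observed rewards is an unbiased estimate of the expected immediate return for each action, which is the property this toy version shares with the kind of target the abstract describes.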
