Results 1 - 4 of 4
1.
Neural Netw ; 170: 610-621, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38056408

ABSTRACT

Multi-agent reinforcement learning (MARL) algorithms based on trust regions (TR) have achieved significant success in numerous cooperative multi-agent tasks. These algorithms restrain the Kullback-Leibler (KL) divergence (i.e., TR constraint) between the current and new policies to avoid aggressive update steps and improve learning performance. However, the majority of existing TR-based MARL algorithms are on-policy, meaning that they require new data sampled by current policies for training and cannot utilize off-policy (or historical) data, leading to low sample efficiency. This study aims to enhance the data efficiency of TR-based learning methods. To achieve this, an approximation of the original objective function is designed. In addition, it is proven that as long as the update size of the policy (measured by the KL divergence) is restricted, optimizing the designed objective function using historical data can guarantee the monotonic improvement of the original target. Building on the designed objective, a practical off-policy multi-agent stochastic policy gradient algorithm is proposed within the framework of centralized training with decentralized execution (CTDE). Additionally, policy entropy is integrated into the reward to promote exploration, and consequently, improve stability. Comprehensive experiments are conducted on a representative benchmark for multi-agent MuJoCo (MAMuJoCo), which offers a range of challenging tasks in cooperative continuous multi-agent control. The results demonstrate that the proposed algorithm outperforms all other existing algorithms by a significant margin.
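The trust-region idea described in this abstract, limiting the KL divergence between the current and the updated policy, can be illustrated with a minimal sketch. It assumes diagonal Gaussian policies (a common choice for continuous control, but not stated in the abstract); the function names and the threshold `delta` are hypothetical and are not taken from the paper.

```python
import math


def kl_diag_gaussian(mu_old, std_old, mu_new, std_new):
    """KL(old || new) between two diagonal Gaussians, summed over action dimensions.

    Assumption: each policy outputs a mean and a standard deviation per
    action dimension. The abstract does not specify the policy class.
    """
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        # Closed-form KL divergence for one-dimensional Gaussians.
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def within_trust_region(kl, delta=0.01):
    """Return True if the update size stays inside the (hypothetical) bound delta."""
    return kl <= delta


# Usage: a small shift of the mean stays inside the region, a large one does not.
small = kl_diag_gaussian([0.0, 0.0], [1.0, 1.0], [0.05, 0.0], [1.0, 1.0])
large = kl_diag_gaussian([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
print(within_trust_region(small), within_trust_region(large))
```

A check like this only measures the update size; the approximate objective and the monotonic-improvement guarantee mentioned in the abstract are not reproduced here.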


Subjects
Algorithms, Learning, Benchmarking, Entropy, Policy
2.
Sensors (Basel) ; 22(17)2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36081107

ABSTRACT

The recognition of warheads in the target cloud of the ballistic midcourse phase remains a challenging issue for missile defense systems. Considering factors such as the differing dimensions of the features between sensors and the different recognition credibility of each sensor, this paper proposes a weighted decision-level fusion architecture to take advantage of data from multiple radar sensors, and an online feature reliability evaluation method is also used to comprehensively generate sensor weight coefficients. The weighted decision-level fusion method can overcome the deficiency of a single sensor and enhance the recognition rate for warheads in the midcourse phase by considering the changes in the reliability of the sensor's performance caused by the influence of the environment, location, and other factors during observation. Based on the simulation dataset, the experiment was carried out with multiple sensors and multiple bandwidths, and the results showed that the proposed model could work well with various classifiers involving traditional learning algorithms and ensemble learning algorithms.
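As a rough illustration of weighted decision-level fusion, the sketch below combines per-sensor class probabilities using normalized reliability weights and picks the class with the highest fused score. It assumes each sensor's classifier already outputs a probability vector over the same classes; the reliability scores are supplied by hand here, whereas the paper derives them from an online feature reliability evaluation that is not reproduced. All names are hypothetical.

```python
def fuse_decisions(sensor_probs, reliabilities):
    """Fuse per-sensor class probabilities with reliability-based weights.

    sensor_probs:  one probability vector per sensor (same class order).
    reliabilities: one non-negative reliability score per sensor
                   (assumed given; hypothetical stand-in for the paper's
                   online reliability evaluation).
    Returns (fused probability vector, index of the winning class).
    """
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("at least one sensor must have positive reliability")
    # Normalize the reliabilities so the weights sum to one.
    weights = [r / total for r in reliabilities]
    n_classes = len(sensor_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, sensor_probs))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=fused.__getitem__)
    return fused, label


# Usage: the second sensor is considered three times as reliable as the first.
fused, label = fuse_decisions([[0.6, 0.4], [0.2, 0.8]], [1.0, 3.0])
print(fused, label)
```

A weighted sum is only one possible fusion rule; the abstract does not say which combination rule the authors use.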


Subjects
Algorithms, Computer Simulation, Reproducibility of Results
3.
Sensors (Basel) ; 18(10)2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30360418

ABSTRACT

In the last decade, fingerprinting localization using wireless local area networks (WLAN) has received considerable attention. However, this method requires building a database called a radio map in the off-line stage, which is a labor-intensive and time-consuming process. To reduce the cost of establishing the radio map and improve localization performance, in this paper we first propose a radio map interpolation method based on a Voronoi diagram and crowdsourcing. The interpolation method optimizes propagation model parameters for each Voronoi cell using the received signal strength (RSS) and location coordinates of crowdsourcing points, and then estimates the RSS samples of interpolation points with the optimized parameters to establish a new radio map. Then a general regression neural network (GRNN) is employed to fuse the new and original radio maps, established through interpolation and manual operation respectively, and is also used as the fingerprinting localization algorithm to compute localization coordinates. The experimental results demonstrate that the proposed GRNN fingerprinting localization system with the fused radio map considerably improves localization performance.
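To make the GRNN localization step concrete, here is a minimal sketch of a general regression neural network used as a fingerprinting estimator: the predicted position is a Gaussian-kernel-weighted average of the radio map's reference coordinates. The radio map layout, the function name, and the smoothing parameter `sigma` are assumptions for illustration; the Voronoi-based interpolation and the radio map fusion described in the abstract are not reproduced.

```python
import math


def grnn_locate(rss_query, radio_map, sigma=2.0):
    """Estimate (x, y) from an RSS vector with a GRNN (kernel-weighted average).

    radio_map: list of (rss_vector, (x, y)) pairs, one per reference point
               (hypothetical layout; RSS in dBm, one entry per access point).
    sigma:     kernel spread (smoothing parameter), assumed hand-tuned.
    """
    # Squared Euclidean distance in signal space to every fingerprint.
    d2 = [
        sum((q - r) ** 2 for q, r in zip(rss_query, rss))
        for rss, _ in radio_map
    ]
    # Subtract the smallest distance before exponentiating so that at least
    # one weight equals 1.0 and the denominator cannot underflow to zero.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, radio_map)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, radio_map)) / total
    return x, y


# Usage: two reference points, query exactly halfway between them in signal space.
radio_map = [([-40.0, -70.0], (0.0, 0.0)), ([-70.0, -40.0], (10.0, 0.0))]
print(grnn_locate([-55.0, -55.0], radio_map))
```

A smaller `sigma` makes the estimate snap to the nearest fingerprint, while a larger one averages over more reference points.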

4.
Sensors (Basel) ; 18(2)2018 Feb 03.
Article in English | MEDLINE | ID: mdl-29401680

ABSTRACT

Gaofen-3 (GF-3) is China's first C-band multi-polarization synthetic aperture radar (SAR) satellite, which also provides a sliding spotlight mode for the first time. Sliding spotlight mode is a novel mode that achieves imaging with both high resolution and wide swath. This paper investigates several key technologies for sliding spotlight mode in high-resolution spaceborne SAR, mainly the imaging parameters, the methods of velocity estimation and ambiguity elimination, and the imaging algorithms. Based on the chosen Convolution Back-Projection (CBP) and Polar Format Algorithm (PFA) imaging algorithms, a fast implementation of CBP and a modified PFA suitable for sliding spotlight mode are proposed, and the processing flows are derived in detail. Finally, the algorithms are validated with simulations and measured data.
