Search | VHL Regional Portal

Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning.

Wang, Xiaoqiang; Ke, Liangjun; Qiao, Zhimin; Chai, Xinghua.

IEEE Trans Cybern ; 51(1): 174-187, 2021 Jan.

Article in English | MEDLINE | ID: mdl-32881705

ABSTRACT

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this article, a new MARL, called cooperative double Q -learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q -learning method based on double estimators and the upper confidence bound (UCB) policy, which can eliminate the over-estimation problem existing in traditional independent Q -learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios of TSC simulators. The results show that Co-DQL outperforms the state-of-the-art decentralized MARL algorithms in terms of multiple traffic metrics.

Optimal Energy-Delay in Energy Harvesting Wireless Sensor Networks with Interference Channels.

Jiao, Dongbin; Ke, Liangjun; Liu, Shengbo; Chan, Felix T S.

Sensors (Basel) ; 19(4)2019 Feb 14.

Article in English | MEDLINE | ID: mdl-30769900

ABSTRACT

In this work, we investigate the capacity allocation problem in the energy harvesting wireless sensor networks (WSNs) with interference channels. For the fixed topologies of data and energy, we formulate the optimization problem when the data flow remains constant on all data links and each sensor node harvests energy only once in a time slot. We focus on the optimal data rates, power allocations and energy transfers between sensor nodes in a time slot. Our goal is to minimize the total delay in the network under two scenarios, i.e., no energy transfer and energy transfer. Furthermore, since the optimization problem is non-convex and difficult to solve directly, by considering the network with the relatively high signal-to-interference-plus-noise ratio (SINR), the non-convex optimization problem can be transformed into a convex optimization problem by convex approximation. We attain the properties of the optimal solution by Lagrange duality and solve the convex optimization problem by the CVX solver. The experimental results demonstrate that the total delay of the energy harvesting WSNs with interference channels is more than that in the orthogonal channel; the total network delay increases with the increasing data flow for the fixed energy arrival rate; and the energy transfer can help to decrease the total delay.

Hybridization of decomposition and local search for multiobjective optimization.

Ke, Liangjun; Zhang, Qingfu; Battiti, Roberto.

IEEE Trans Cybern ; 44(10): 1808-20, 2014 Oct.

Article in English | MEDLINE | ID: mdl-25222724

ABSTRACT

Combining ideas from evolutionary algorithms, decomposition approaches, and Pareto local search, this paper suggests a simple yet efficient memetic algorithm for combinatorial multiobjective optimization problems: memetic algorithm based on decomposition (MOMAD). It decomposes a combinatorial multiobjective problem into a number of single objective optimization problems using an aggregation method. MOMAD evolves three populations: 1) population P(L) for recording the current solution to each subproblem; 2) population P(P) for storing starting solutions for Pareto local search; and 3) an external population P(E) for maintaining all the nondominated solutions found so far during the search. A problem-specific single objective heuristic can be applied to these subproblems to initialize the three populations. At each generation, a Pareto local search method is first applied to search a neighborhood of each solution in P(P) to update P(L) and P(E). Then a single objective local search is applied to each perturbed solution in P(L) for improving P(L) and P(E), and reinitializing P(P). The procedure is repeated until a stopping condition is met. MOMAD provides a generic hybrid multiobjective algorithmic framework in which problem specific knowledge, well developed single objective local search and heuristics and Pareto local search methods can be hybridized. It is a population based iterative method and thus an anytime algorithm. Extensive experiments have been conducted in this paper to study MOMAD and compare it with some other state-of-the-art algorithms on the multiobjective traveling salesman problem and the multiobjective knapsack problem. The experimental results show that our proposed algorithm outperforms or performs similarly to the best so far heuristics on these two problems.

MOEA/D-ACO: a multiobjective evolutionary algorithm using decomposition and AntColony.

Ke, Liangjun; Zhang, Qingfu; Battiti, Roberto.

IEEE Trans Cybern ; 43(6): 1845-59, 2013 Dec.

Article in English | MEDLINE | ID: mdl-23757576

ABSTRACT

Combining ant colony optimization (ACO) and the multiobjective evolutionary algorithm (EA) based on decomposition (MOEA/D), this paper proposes a multiobjective EA, i.e., MOEA/D-ACO. Following other MOEA/D-like algorithms, MOEA/D-ACO decomposes a multiobjective optimization problem into a number of single-objective optimization problems. Each ant (i.e., agent) is responsible for solving one subproblem. All the ants are divided into a few groups, and each ant has several neighboring ants. An ant group maintains a pheromone matrix, and an individual ant has a heuristic information matrix. During the search, each ant also records the best solution found so far for its subproblem. To construct a new solution, an ant combines information from its group's pheromone matrix, its own heuristic information matrix, and its current solution. An ant checks the new solutions constructed by itself and its neighbors, and updates its current solution if it has found a better one in terms of its own objective. Extensive experiments have been conducted in this paper to study and compare MOEA/D-ACO with other algorithms on two sets of test problems. On the multiobjective 0-1 knapsack problem,MOEA/D-ACO outperforms the MOEA/D with conventional genetic operators and local search on all the nine test instances. We also demonstrate that the heuristic information matrices in MOEA/D-ACO are crucial to the good performance of MOEA/D-ACO for the knapsack problem. On the biobjective traveling salesman problem, MOEA/D-ACO performs much better than the BicriterionAnt on all the 12 test instances. We also evaluate the effects of grouping, neighborhood, and the location information of current solutions on the performance of MOEA/D-ACO. The work in this paper shows that reactive search optimization scheme, i.e., the "learning while optimizing" principle, is effective in improving multiobjective optimization algorithms.

Subject(s)

Algorithms , Ants/physiology , Artificial Intelligence , Biomimetics/methods , Decision Support Techniques , Pattern Recognition, Automated/methods , Animals , Biological Evolution

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL