Results 1 - 15 of 15
1.
Article in English | MEDLINE | ID: mdl-38889021

ABSTRACT

This article proposes a data-driven, model-free inverse Q-learning algorithm for continuous-time linear quadratic regulators (LQRs). Using an agent's trajectories of states and optimal control inputs, the algorithm reconstructs the cost function that yields those same trajectories. The article first poses a model-based inverse value iteration scheme using the agent's system dynamics. Then, an online model-free inverse Q-learning algorithm is developed to recover the agent's cost function using only the demonstrated trajectories. It is more efficient than existing inverse reinforcement learning (RL) algorithms because it avoids repeated RL in inner loops. The proposed algorithms do not need initial stabilizing control policies and solve for unbiased solutions. Asymptotic stability, convergence, and robustness of the proposed algorithm are guaranteed. Theoretical analysis and simulation examples show the effectiveness and advantages of the proposed algorithms.
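
A minimal sketch of the forward direction behind this kind of inverse LQR, using a toy system and weights that are assumed here rather than taken from the paper: positively scaling a quadratic cost leaves the optimal gain unchanged, which is one way a recovered cost can reproduce the expert's trajectories without being unique.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy continuous-time system (assumed values, not from the paper).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])

def lqr_gain(Q, R):
    """Continuous-time LQR gain K = R^{-1} B^T P from the ARE solution P."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# "Expert" cost (unknown to the learner) and the gain it induces.
Q_expert, R_expert = np.diag([2.0, 1.0]), np.array([[1.0]])
K_expert = lqr_gain(Q_expert, R_expert)

# A learner that reconstructs a scaled version of the expert's cost still
# matches the expert's feedback gain exactly, hence the same trajectories.
K_learner = lqr_gain(3.0 * Q_expert, 3.0 * R_expert)
print(np.allclose(K_expert, K_learner))  # True: the recovered cost is not unique
```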

2.
IEEE Trans Cybern ; 54(3): 1695-1707, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37027769

ABSTRACT

This article studies the trajectory imitation control problem of linear systems subject to external disturbances and develops a data-driven, static output-feedback (OPFB) control-based inverse reinforcement learning (RL) approach. An expert-learner structure is considered, in which the learner aims to imitate the expert's trajectory. Using only the measured input and output data of the expert and of the learner itself, the learner computes the expert's policy by reconstructing its unknown value-function weights and thereby imitates its optimally operating trajectory. Three static OPFB inverse RL algorithms are proposed. The first is a model-based scheme that serves as the basis for the other two. The second is a data-driven method using input-state data. The third is a data-driven method using only input-output data. Stability, convergence, optimality, and robustness are analyzed. Finally, simulation experiments are conducted to verify the proposed algorithms.
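
For context, a small sketch of the static OPFB structure u = -F y that such a learner imitates; the matrices A, B, C and the gain F below are assumed toy values, not the paper's.

```python
import numpy as np

# Toy plant with a single measured output (assumed values).
A = np.array([[0.0, 1.0], [-1.0, -0.2]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 1.0]])      # only the output y = C x is measured
F = np.array([[0.8]])           # candidate static output-feedback gain

# Closed loop under the static OPFB law u = -F y = -F C x.
A_cl = A - B @ F @ C
print(np.linalg.eigvals(A_cl))  # negative real parts -> stable imitation loop
```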

3.
IEEE Trans Cybern ; 54(3): 1391-1402, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37906478

ABSTRACT

This article proposes a data-efficient, model-free reinforcement learning (RL) algorithm using Koopman operators for complex nonlinear systems. A high-dimensional, data-driven optimal control of the nonlinear system is developed by lifting it into a linear system model. We use a data-driven, model-based RL framework to derive an off-policy Bellman equation. Building upon this equation, we deduce a data-efficient RL algorithm that does not need a Koopman-built linear system model. The algorithm preserves dynamic information while reducing the data required for optimal control learning. Numerical and theoretical analyses of the Koopman eigenfunctions used for dataset truncation in the proposed model-free, data-efficient RL algorithm are discussed. We validate our framework on the excitation control of a power system.
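
As a rough illustration of the lifting step, the following EDMD-style sketch (with assumed observables, a toy next-state map, and synthetic data, none of which come from the paper) fits a linear predictor in the lifted space by least squares.

```python
import numpy as np

def lift(x):
    """Assumed polynomial observables that lift a 2-D state."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, x2**2])

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))               # sampled states
X_next = np.array([[x[1], -np.sin(x[0])] for x in X])   # toy nonlinear map

Z = np.array([lift(x) for x in X])
Z_next = np.array([lift(x) for x in X_next])

# Least squares: Z_next ~ Z @ G, so G.T approximates the finite Koopman matrix.
G, *_ = np.linalg.lstsq(Z, Z_next, rcond=None)
print(G.shape)  # (5, 5) linear model acting on the lifted state
```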

4.
IEEE Trans Cybern ; 54(2): 728-738, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38133983

ABSTRACT

This article addresses the problem of learning the objective function of linear discrete-time systems that use static output-feedback (OPFB) control by designing inverse reinforcement learning (RL) algorithms. Most existing inverse RL methods require the availability of states and state-feedback control from the expert or demonstrated system. In contrast, this article considers inverse RL in a more general case where the demonstrated system uses static OPFB control and only input-output measurements are available. We first develop a model-based inverse RL algorithm to reconstruct an input-output objective function of a demonstrated discrete-time system using its system dynamics and the OPFB gain. This objective function explains the demonstrations and the OPFB gain of the demonstrated system. Then, an input-output Q-function is built for the inverse RL problem using a state-reconstruction technique. Given the demonstrated inputs and outputs, a data-driven inverse Q-learning algorithm reconstructs the objective function without knowledge of the demonstrated system's dynamics or OPFB gain. This algorithm yields unbiased solutions even in the presence of exploration noise. Convergence properties and the nonuniqueness of the solutions of the proposed algorithms are studied. Numerical simulation examples verify the effectiveness of the proposed methods.

5.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4596-4609, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34623278

ABSTRACT

This article proposes new inverse reinforcement learning (RL) algorithms to solve the Adversarial Apprentice Games defined here for nonlinear learner and expert systems. The games are solved by a learner extracting the unknown cost function of an expert from the expert's demonstrated behaviors. We first develop a model-based inverse RL algorithm that consists of two learning stages: an optimal-control learning stage and a second stage based on inverse optimal control. This algorithm also clarifies the relationships between inverse RL and inverse optimal control. Then, we propose a new model-free integral inverse RL algorithm to reconstruct the unknown expert cost function. The model-free algorithm needs only online demonstrations of the expert's and the learner's trajectory data, without knowledge of the system dynamics of either agent. These two algorithms are further implemented using neural networks (NNs). In the Adversarial Apprentice Games, the learner and the expert are allowed to suffer different adversarial attacks during the learning process. A two-player zero-sum game is formulated for each of these two agents and is solved as a subproblem for the learner in inverse RL. Furthermore, it is shown that the cost functions the learner learns in order to mimic the expert's behavior are stabilizing and not unique. Finally, simulations and comparisons show the effectiveness and superiority of the proposed algorithms.

6.
IEEE Trans Neural Netw Learn Syst ; 34(5): 2386-2399, 2023 May.
Article in English | MEDLINE | ID: mdl-34520364

ABSTRACT

In inverse reinforcement learning (RL), there are two agents. An expert target agent has a performance cost function and exhibits its control and state behaviors to a learner. The learner agent does not know the expert's performance cost function but seeks to reconstruct it by observing the expert's behaviors, and it tries to imitate these behaviors optimally with its own response. In this article, we formulate an imitation problem in which the optimal performance intent of a discrete-time (DT) expert target agent is unknown to a DT learner agent. Using only the observed trajectory of the expert's behavior, the learner seeks to determine a cost function that yields the same optimal feedback gain as the expert's, and thus imitates the expert's optimal response. We develop an inverse RL approach with a new scheme to solve this behavior imitation problem. The approach consists of a cost function update based on an extension of RL policy iteration and inverse optimal control, and a control policy update based on optimal control. Under this scheme, we then develop an inverse reinforcement Q-learning algorithm, which is an extension of RL Q-learning. This algorithm does not require any knowledge of the agent dynamics. Proofs of stability, convergence, and optimality are given. A key property concerning the nonuniqueness of the solution is also shown. Finally, simulation experiments are presented to show the effectiveness of the new approach.
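
A hedged forward-direction check of the imitation criterion described above, with a toy discrete-time system and assumed weights (this is not the paper's inverse Q-learning loop): a reconstructed cost imitates the expert when it induces the same optimal feedback gain.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Toy discrete-time system (assumed values, not from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])

def dlqr_gain(Q, R):
    """Discrete-time LQR gain K such that u = -K x is optimal for (Q, R)."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K_expert = dlqr_gain(np.diag([1.0, 2.0]), np.array([[1.0]]))
K_learner = dlqr_gain(np.diag([0.5, 1.0]), np.array([[0.5]]))  # scaled cost
print(np.allclose(K_expert, K_learner))  # True: same gain, same optimal response
```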

7.
IEEE Trans Cybern ; 53(7): 4555-4566, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36264741

ABSTRACT

This article considers autonomous systems whose behaviors seek to optimize an objective function. This goes beyond standard applications of condition-based maintenance, which seek to detect faults or failures in nonoptimizing systems. Normal agents optimize a known, accepted objective function, whereas abnormal or misbehaving agents may optimize a renegade objective that does not conform to the accepted one. We provide a unified framework for anomaly detection and correction in optimizing autonomous systems described by differential equations, using inverse reinforcement learning (RL). We first define several types of anomalies and false alarms, including noise anomalies, objective-function anomalies, intention (control gain) anomalies, abnormal behaviors, noise-anomaly false alarms, and objective false alarms. We then propose model-free inverse RL algorithms to reconstruct the objective functions and intentions for given system behaviors. The inverse RL procedure for anomaly detection and correction has three phases: training, detection, and correction. First, in the training phase, inverse RL infers the objective function and intention of the normally behaving system using offline stored data. Second, in the detection phase, inverse RL infers the objective function and intention of the observed test system from online observation data. These are then compared with those of the nominal system to identify anomalies. Third, correction is executed for the anomalous system so that it learns the normal objective and intention. Simulations and experiments on a quadrotor unmanned aerial vehicle (UAV) verify the proposed methods.
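
As a simple illustration of the detection phase, the following sketch (weights and threshold are assumed, not taken from the paper) flags an objective-function anomaly when the weights inferred by inverse RL deviate too far from the nominal ones.

```python
import numpy as np

def objective_anomaly(w_nominal, w_inferred, tol=0.1):
    """Relative deviation test between nominal and inferred objective weights."""
    deviation = np.linalg.norm(w_inferred - w_nominal) / np.linalg.norm(w_nominal)
    return deviation > tol

w_nominal = np.array([1.0, 0.5, 2.0])   # weights of the accepted objective
w_inferred = np.array([1.0, 1.4, 2.0])  # weights recovered from test behavior
print(objective_anomaly(w_nominal, w_inferred))  # True -> renegade objective
```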


Subjects
Learning, Reinforcement (Psychology), Algorithms
8.
Article in English | MEDLINE | ID: mdl-36315539

ABSTRACT

This article studies a distributed minmax strategy for multiplayer games and develops reinforcement learning (RL) algorithms to solve it. The proposed minmax strategy is distributed in the sense that it finds each player's optimal control policy without knowing all the other players' policies. Each player obtains its distributed control policy by solving a distributed algebraic Riccati equation in a multiplayer noncooperative game; this policy is found against the worst-case policies of all the other players. We guarantee the existence of distributed minmax solutions and study their L2 and asymptotic stability. Under mild conditions, the resulting minmax control policies are shown to improve the robust gain and phase margins of multiplayer systems compared with the standard linear quadratic regulator controller. Distributed minmax solutions are found using both a model-based policy iteration algorithm and a data-driven off-policy RL algorithm. Simulation examples verify the proposed formulation and its computational efficiency over nondistributed Nash solutions.

9.
Article in English | MEDLINE | ID: mdl-35786561

ABSTRACT

This article proposes a data-driven inverse reinforcement learning (RL) control algorithm for nonzero-sum multiplayer games in linear continuous-time differential dynamical systems. The inverse RL problem in the games is solved by a learner reconstructing the unknown expert players' cost functions from the experts' demonstrated optimal state and control-input trajectories. The learner thus obtains the same feedback control gains and trajectories as the expert, using only data along system trajectories and without knowing the system dynamics. The article first proposes a model-based inverse RL policy iteration framework with: 1) a policy evaluation step that reconstructs cost matrices using Lyapunov functions; 2) a state-reward weight improvement step using inverse optimal control (IOC); and 3) a policy improvement step using optimal control. Based on this model-based policy iteration algorithm, the article further develops an online, data-driven, off-policy inverse RL algorithm that requires no knowledge of the system dynamics or the expert control gains. Rigorous convergence and stability analyses of the algorithms are provided, showing that the off-policy inverse RL algorithm yields unbiased solutions even when probing noise is added to satisfy the persistence of excitation (PE) condition. Finally, two different simulation examples validate the effectiveness of the proposed algorithms.

10.
IEEE Trans Cybern ; 52(6): 5242-5254, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33175689

ABSTRACT

Consensus-based distributed Kalman filters for target estimation have attracted considerable attention. Most existing filters use the average consensus approach, which tends to have a low convergence speed, and they rarely consider the impact of limited sensing range and target mobility on the information-flow topology. In this article, we address these issues by designing a novel distributed Kalman consensus filter (DKCF) with an information-weighted consensus structure for random mobile target estimation in continuous time. A new moving-target information-flow topology for target measurement is developed based on the sensors' sensing ranges, the targets' random mobility, and local information-weighted neighbors. Novel necessary and sufficient conditions for the convergence of the proposed DKCF are developed; under these conditions, the estimates of all sensors converge to the consensus values. Simulation and comparative studies show the effectiveness and superiority of the new DKCF.
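
A generic information-weighted consensus sketch in the spirit of such filters (an ICF-style update with assumed values, not the paper's exact DKCF): each sensor repeatedly averages information matrices and vectors with its neighbors, so all local estimates converge to a common information-weighted value.

```python
import numpy as np

def consensus_step(Y, y, neighbors, eps=0.2):
    """One consensus iteration on information matrices Y and vectors y."""
    Y_new = [Y[i] + eps * sum(Y[j] - Y[i] for j in neighbors[i]) for i in range(len(Y))]
    y_new = [y[i] + eps * sum(y[j] - y[i] for j in neighbors[i]) for i in range(len(y))]
    return Y_new, y_new

# Three sensors on a line graph with different local estimates of a 2-D target.
Y = [np.eye(2) * w for w in (1.0, 3.0, 2.0)]                    # local information
x_local = [np.array(v) for v in ([1.0, -1.0], [1.2, -0.9], [0.8, -1.1])]
y = [Yi @ xi for Yi, xi in zip(Y, x_local)]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

for _ in range(100):
    Y, y = consensus_step(Y, y, neighbors)
print(np.linalg.solve(Y[0], y[0]))  # every sensor recovers the fused estimate
```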

11.
IEEE Trans Cybern ; 52(10): 10570-10581, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33877993

ABSTRACT

This article provides a novel inverse reinforcement learning (RL) algorithm that learns an unknown performance objective function for tracking control. The algorithm combines three steps: 1) an optimal control update; 2) a gradient descent correction step; and 3) an inverse optimal control (IOC) update. The new algorithm clarifies the relation between inverse RL and IOC. It is shown that the reward weight of an unknown performance objective that generates a target control policy may not be unique. We characterize the set of all weights that generate the same target control policy. We develop a model-based algorithm and, further, two model-free algorithms for systems with unknown model information. Finally, simulation experiments are presented to show the effectiveness of the proposed algorithms.


Subjects
Learning, Reinforcement (Psychology), Algorithms, Computer Simulation, Reward
12.
IEEE Trans Cybern ; 52(12): 13083-13095, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34403352

ABSTRACT

This article proposes robust inverse Q-learning algorithms for a learner to mimic an expert's states and control inputs in the imitation learning problem. The two agents are subject to different adversarial disturbances. To perform the imitation, the learner must reconstruct the unknown expert cost function; it observes only the expert's control inputs and uses inverse Q-learning algorithms to reconstruct that cost function. The inverse Q-learning algorithms are robust in that they are independent of the system model and allow for different cost-function parameters and disturbances between the two agents. We first propose an offline inverse Q-learning algorithm consisting of two iterative learning loops: 1) an inner Q-learning iteration loop and 2) an outer iteration loop based on inverse optimal control. Based on this offline algorithm, we then develop an online inverse Q-learning algorithm such that the learner mimics the expert's behaviors online from real-time observation of the expert's control inputs. This online computational method has four function approximators: a critic approximator, two actor approximators, and a state-reward neural network (NN). It simultaneously approximates the Q-function parameters and the learner's state reward online. Convergence and stability proofs are rigorously studied to guarantee the algorithm's performance.


Subjects
Algorithms, Neural Networks (Computer), Reward
13.
IEEE Trans Cybern ; 52(11): 12479-12490, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34143750

ABSTRACT

Motivated by the guaranteed stability margins of linear quadratic regulators (LQRs) and the standard Kalman filter (KF) in the frequency domain, this article extends these results to the distributed Kalman consensus filter (DKCF) for distributed estimation in sensor networks. In particular, we study the robustness margins of the DKCF in two cases: one based on direct target observation and the other using estimates from neighboring sensors in the network. The loop transfer functions of the two cases are established, and gain- and phase-margin robustness results are derived for both. The robustness margins of the DKCF improve upon those of the single-agent KF. Furthermore, as the communication topology of a sensor network varies, the overall graph coupling strength changes; we therefore also analyze the correlation between the overall coupling strength and the robustness margins of the DKCF.
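
For reference, a small numerical check of the classical single-loop LQR phase margin that motivates this kind of analysis, using a toy double-integrator plant and assumed weights rather than the paper's DKCF loops: evaluate L(jw) = K (jwI - A)^{-1} B on a frequency grid and read the phase margin at the gain-crossover frequency.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy double-integrator plant and LQR weights (assumed values).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

# Loop transfer function broken at the plant input: L(s) = K (sI - A)^{-1} B.
w = np.logspace(-2, 2, 2000)
L = np.array([(K @ np.linalg.solve(1j * wi * np.eye(2) - A, B)).item() for wi in w])

idx = np.argmin(np.abs(np.abs(L) - 1.0))       # gain-crossover frequency index
pm = 180.0 + np.degrees(np.angle(L[idx]))      # phase margin in degrees
print(pm)  # about 72 deg, consistent with the classical >= 60 deg LQR guarantee
```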

14.
IEEE Trans Neural Netw Learn Syst ; 30(8): 2324-2335, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30561352

ABSTRACT

In this paper, we propose a new neural-network-based scheme for predicting packet disordering, combined with sliding mode control (SMC), to stabilize nonlinear networked control systems (NCSs). The packet disordering is assumed to be unknown in the NCSs. Stochastic configuration networks (SCNs), which randomly assign the input weights and biases and evaluate the output weights analytically, are designed to solve the problem of unknown packet disordering. A new SMC scheme is developed by integrating the SCN algorithm to learn and control the system in advance. Specifically, a novel measure of packet disordering is constructed to quantify it. In addition, the newest-signal principle leads to stochastic parameters, resulting in a Markovian jump system. The effectiveness of the proposed approach is verified by simulation results.
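
A minimal random-feature sketch of the SCN idea on synthetic data (the supervisory mechanism SCNs use to accept random nodes is omitted, and the features and target below are invented): input weights and biases are assigned randomly, and the output weights are evaluated analytically by least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 3))        # assumed network-state features
y = np.sin(X @ np.array([1.0, -2.0, 0.5]))       # hypothetical disordering measure

n_hidden = 50
W = rng.uniform(-1.0, 1.0, size=(3, n_hidden))   # random input weights
b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random biases
H = np.tanh(X @ W + b)                           # hidden-layer outputs
beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # analytic output weights

print(np.max(np.abs(H @ beta - y)))              # worst-case training-fit error
```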

15.
ISA Trans ; 83: 1-12, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30144979

ABSTRACT

This paper investigates sliding mode control combined with sampling-rate control for networked control systems subject to packet disordering, using Markov chain prediction. The main objectives of the proposed method are to predict the probability of packet disordering when it is unknown in the network, to control the sampling rate so as to restrain heavy packet disordering, and to stabilize the resulting Markovian jump system with variable parameters using sliding mode techniques. First, an augmented system comprising the sampling rate and the plant state is established. Then, the networked control system, based on Markov-chain probability prediction and statistical analysis of this probability, is modeled as a Markovian jump system with two Markov chains. Next, a sliding mode controller is designed to stabilize the dynamic Markovian jump system. Finally, experiments are conducted to illustrate the effectiveness and benefits of the proposed method.
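
A toy sketch of the Markov-chain prediction idea (the disordering levels and the observed sequence are invented): estimate the transition matrix from an observed sequence of packet-disordering levels, then predict the distribution of the next level.

```python
import numpy as np

states = [0, 1, 2]                       # e.g., none / light / heavy disordering
seq = [0, 0, 1, 0, 2, 1, 1, 0, 0, 1, 2, 2, 1, 0]

# Count observed transitions and row-normalize into a transition matrix.
P = np.zeros((3, 3))
for s, s_next in zip(seq[:-1], seq[1:]):
    P[s, s_next] += 1
P /= P.sum(axis=1, keepdims=True)

current = seq[-1]
print(P[current])                        # predicted probabilities of the next level
```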
