Results 1 - 17 of 17
1.
IEEE Trans Cybern ; 54(2): 797-810, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37256797

ABSTRACT

In this article, we propose a way to enhance the learning framework for zero-sum games with dynamics evolving in continuous time. In contrast to conventional centralized actor-critic learning, a novel cooperative finitely excited learning approach is developed that combines online recorded data with instantaneous data for efficiency. By using an experience replay technique for each agent and distributed interaction among agents, we are able to replace the classical persistent excitation condition with an easy-to-check cooperative excitation condition. This approach also guarantees consensus of the distributed actor-critic learners on the solution to the Hamilton-Jacobi-Isaacs (HJI) equation. It is shown that both closed-loop stability of the equilibrium point and convergence to the Nash equilibrium can be guaranteed. Simulation results demonstrate the efficacy of this approach compared to previous methods.
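
To make the data-efficiency idea concrete, here is a minimal, hedged sketch of an experience-replay (concurrent-learning) critic update, in which recorded regressors drive the weight estimate alongside the instantaneous one so that a rank condition on the stored data can stand in for persistent excitation. All names, gains, and the rank check are illustrative assumptions, not the article's implementation; the cooperative variant additionally shares such data across distributed agents.

```python
import numpy as np

def critic_update(W, phi, target, memory, gain=1.0, dt=0.01):
    """One normalized-gradient step on the Bellman residual.

    W      : critic weight estimate (n,)
    phi    : instantaneous Bellman regressor (n,)
    target : instantaneous Bellman target (scalar)
    memory : list of (phi_j, target_j) pairs recorded online
    """
    norm = lambda p: (1.0 + p @ p) ** 2
    dW = -gain * (W @ phi - target) * phi / norm(phi)   # current data
    for phi_j, target_j in memory:                      # replayed data
        dW += -gain * (W @ phi_j - target_j) * phi_j / norm(phi_j)
    return W + dt * dW                                  # Euler step

def memory_is_exciting(memory, n):
    # Easy-to-check excitation condition: stored regressors span R^n.
    Phi = np.array([phi_j for phi_j, _ in memory])
    return np.linalg.matrix_rank(Phi) == n
```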

2.
Article in English | MEDLINE | ID: mdl-37639410

ABSTRACT

In this article, we propose RRT-QX∞, an online, intermittent kinodynamic motion planning framework for dynamic environments with unknown robot dynamics and unknown disturbances. We leverage RRTX for global path planning and rapid replanning to produce waypoints as a sequence of boundary-value problems (BVPs). For each BVP, we formulate a finite-horizon, continuous-time zero-sum game, where the control input is the minimizer and the worst-case disturbance is the maximizer. We propose a robust intermittent Q-learning controller for waypoint navigation with completely unknown system dynamics, external disturbances, and intermittent control updates. We employ a relaxed persistence of excitation technique to guarantee that the Q-learning controller converges to the optimal controller. We provide rigorous Lyapunov-based proofs to guarantee closed-loop stability of the equilibrium point. The effectiveness of the proposed RRT-QX∞ is illustrated with Monte Carlo numerical experiments in numerous dynamic, changing environments.
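
For intuition about the saddle point learned at each waypoint, the hedged sketch below solves a finite-horizon, zero-sum linear-quadratic game by backward Riccati recursion with a known model; the article's contribution is learning the same saddle point via Q-learning without the model. All matrices, the attenuation level gamma, and the discrete-time setting are illustrative assumptions.

```python
import numpy as np

def zero_sum_lq_game(A, B, E, Q, R, gamma, N):
    """u_k = -Ku x_k minimizes, w_k = -Kw x_k maximizes (L2-gain gamma)."""
    m, d = B.shape[1], E.shape[1]
    BE = np.hstack([B, E])
    P, gains = Q.copy(), []
    for _ in range(N):                                    # backward in time
        Rg = np.block([[R + B.T @ P @ B,                   B.T @ P @ E],
                       [E.T @ P @ B, -gamma**2 * np.eye(d) + E.T @ P @ E]])
        K = np.linalg.solve(Rg, BE.T @ P @ A)             # stacked [Ku; Kw]
        P = Q + A.T @ P @ A - A.T @ P @ BE @ K            # game Riccati step
        gains.append((K[:m], K[m:]))
    return P, gains[::-1]
```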

3.
IEEE Trans Neural Netw Learn Syst ; 34(6): 3124-3134, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34606463

ABSTRACT

This article presents a novel scheme, namely, an intermittent learning scheme based on Skinner's operant conditioning techniques, that approximates the optimal policy while decreasing the usage of the communication buses transferring information. While traditional reinforcement learning schemes continuously evaluate, and subsequently improve, every action taken by a learning agent based on received reinforcement signals, this continuous transmission of reinforcement and policy-improvement signals can overutilize the system's inherently limited resources. Moreover, the highly complex operating environment of cyber-physical systems (CPSs) leaves openings for malicious individuals to corrupt the signal transmissions between components. The proposed schemes increase uncertainty in the learning rate and in the extinction rate of the learning agents' acquired behavior. In this article, we investigate the use of fixed/variable interval and fixed/variable ratio schedules in CPSs, along with their rate of success and the loss in optimal behavior incurred during intermittent learning. Simulation results show the efficacy of the proposed approach.
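
As a concrete illustration of the four schedules, the hedged sketch below implements them as triggers deciding when a reinforcement or policy-improvement signal is actually transmitted over the bus; the class, parameters, and draw distribution are illustrative assumptions, not the article's formulation.

```python
import random

class Schedule:
    """Operant-conditioning transmission schedules.

    kind : "FI"/"VI" (fixed/variable interval; call transmit() per time step)
           "FR"/"VR" (fixed/variable ratio; call transmit() per response)
    k    : interval length or response count (mean value for variable kinds)
    """
    def __init__(self, kind, k):
        self.kind, self.k = kind, k
        self.count, self.next_at = 0, self._draw()

    def _draw(self):
        if self.kind in ("FI", "FR"):
            return self.k                         # deterministic schedule
        return random.randint(1, 2 * self.k)      # variable, mean ~ k

    def transmit(self):
        self.count += 1
        if self.count >= self.next_at:
            self.count, self.next_at = 0, self._draw()
            return True                           # send the signal now
        return False                              # stay silent, save the bus
```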

4.
IEEE Trans Neural Netw Learn Syst ; 34(11): 8467-8481, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35226608

ABSTRACT

In this article, we propose a computationally and communicationally efficient approach for decision-making in nonequilibrium stochastic games. In particular, due to the inherent complexity of computing Nash equilibria, as well as the innate tendency of agents to choose nonequilibrium strategies, we construct two models of bounded rationality based on recursive reasoning. In the first model, named level-k thinking, each agent assumes that everyone else has a cognitive level immediately lower than theirs and, given this assumption, chooses its policy as a best response to them. In the second model, named cognitive hierarchy, each agent conjectures that the rest of the agents have cognitive levels lower than theirs, but distributed according to a probability distribution rather than fixed deterministically. To explicitly compute the boundedly rational policies, a level-recursive algorithm and a level-paralleled algorithm are constructed, the latter with reduced overall computational complexity. To further reduce complexity in the communication layer, modifications of the proposed nonequilibrium strategies are presented that do not require the action of a boundedly rational agent to be updated at each step of the stochastic game. Simulations are performed that demonstrate our results.
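
A minimal sketch of the two recursions, under the simplifying assumption of a symmetric two-player matrix game with one shared payoff matrix (level-0 agents play uniformly at random, and the cognitive-hierarchy level distribution is a truncated Poisson); these modeling choices are illustrative, not the article's stochastic-game formulation.

```python
import math
import numpy as np

def best_response(payoff, opponent_policy):
    # payoff[a, b]: my reward for action a against opponent action b
    values = payoff @ opponent_policy
    policy = np.zeros(payoff.shape[0])
    policy[np.argmax(values)] = 1.0
    return policy

def level_k_policy(payoff, k):
    policy = np.ones(payoff.shape[1]) / payoff.shape[1]  # level-0 anchor
    for _ in range(k):                                    # level j best
        policy = best_response(payoff, policy)            # responds to j-1
    return policy

def cognitive_hierarchy_policy(payoff, k, tau=1.5):
    # Level-k agent assumes opponents' levels ~ Poisson(tau) on {0..k-1}.
    w = np.array([math.exp(-tau) * tau**j / math.factorial(j)
                  for j in range(k)])
    w /= w.sum()
    mix = sum(wj * level_k_policy(payoff, j) for j, wj in enumerate(w))
    return best_response(payoff, mix)
```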

5.
Article in English | MEDLINE | ID: mdl-36215376

ABSTRACT

This article develops a safe pursuit-evasion game enabling finite-time capture, optimal performance, and adaptation to an unknown, cluttered environment. The pursuit-evasion game is formulated as a zero-sum differential game wherein the pursuer seeks to minimize its relative distance to the target while the evader attempts to maximize it. A critic-only reinforcement learning (RL) algorithm is then proposed to learn the pursuit-evasion policies online and in finite time, thus enabling finite-time capture of the evader. Safety is ensured by means of barrier functions associated with the obstacles, which are integrated into the running cost. Using Gaussian processes (GPs), a learning-based mechanism is devised for safely learning the unknown environment. Simulation results illustrate the efficacy of the proposed approach.
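
The barrier idea admits a compact sketch: obstacle terms in the running cost blow up near each obstacle boundary, so any finite-cost pursuit trajectory remains safe. The cost structure, obstacle format, and weights below are illustrative assumptions; the article additionally learns unknown obstacles online with GPs.

```python
import numpy as np

def running_cost(x_p, x_e, u, v, obstacles, w_b=1.0):
    """Zero-sum running cost: pursuer (u) minimizes, evader (v) maximizes."""
    dist = np.linalg.norm(x_p - x_e) ** 2         # relative-distance term
    effort = u @ u - v @ v                        # opposing control penalties
    barrier = 0.0
    for center, radius in obstacles:              # spherical obstacles
        h = np.linalg.norm(x_p - center) ** 2 - radius ** 2  # h > 0 iff safe
        barrier += w_b / max(h, 1e-9)             # -> infinity at boundary
    return dist + effort + barrier
```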

6.
Article in English | MEDLINE | ID: mdl-35767489

ABSTRACT

This article proposes a real-time neural network (NN) stochastic filter-based controller on the special orthogonal group SO(3), as a novel approach to the attitude tracking problem. The introduced solution consists of two parts: a filter and a controller. First, an adaptive NN-based stochastic filter is proposed, which estimates attitude components and dynamics directly from measurements supplied by onboard sensors. The filter design accounts for measurement uncertainties inherent to the attitude dynamics, namely, unknown bias and noise corrupting the angular velocity measurements. The closed-loop signals of the proposed NN-based stochastic filter are shown to be semiglobally uniformly ultimately bounded (SGUUB). Second, a novel control law on SO(3), coupled with the proposed estimator, is presented. The control law addresses unknown disturbances. In addition, the closed-loop signals of the proposed filter-based controller are shown to be SGUUB. The proposed approach offers robust tracking performance by supplying the required control signal given data extracted from low-cost inertial measurement units. While the filter-based controller is presented in continuous form, the discrete implementation is also presented. In addition, the unit-quaternion form of the proposed approach is given. The effectiveness and robustness of the proposed filter-based controller are demonstrated in discrete form, considering a low sampling rate, high initialization error, a high level of measurement uncertainty, and unknown disturbances.
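
The bracketed placeholders in the source record stood for the rotation group. For reference, in standard notation (our assumption, not quoted from the article), the attitude kinematics evolve on

$$\mathrm{SO}(3)=\{R\in\mathbb{R}^{3\times 3}\;:\;R^{\top}R=I_{3},\ \det R=1\},\qquad \dot{R}=R\,[\omega]_{\times},$$

where $[\omega]_{\times}$ is the skew-symmetric matrix of the angular velocity $\omega$, and the gyroscope supplies $\omega_{m}=\omega+b+n$ with the unknown bias $b$ and noise $n$ that the filter must handle.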

7.
IEEE Trans Cybern ; 52(12): 13762-13773, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34495864

ABSTRACT

In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
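
A plausible reading of the min-Hamiltonian in standard notation (assumed here, not quoted from the article): for dynamics $\dot{x}=f(x)+g(x)u$ and running cost $r(x,u)=Q(x)+u^{\top}Ru$,

$$H(x,u,\nabla V)=\nabla V^{\top}\big(f(x)+g(x)u\big)+Q(x)+u^{\top}Ru,$$

$$H_{m}(x,\nabla V)=\min_{u}H(x,u,\nabla V)=\nabla V^{\top}f(x)+Q(x)-\tfrac{1}{4}\,\nabla V^{\top}g(x)R^{-1}g(x)^{\top}\nabla V,$$

so the HJB equation becomes $H_{m}(x,\nabla V^{*})=0$, and both the policy-evaluation and policy-improvement steps of PI can be written in terms of $H_{m}$.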

8.
IEEE Trans Neural Netw Learn Syst ; 32(1): 405-419, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32203039

ABSTRACT

We develop a method for obtaining safe initial policies for reinforcement learning via approximate dynamic programming (ADP) techniques for uncertain systems evolving with discrete-time dynamics. We employ the kernelized Lipschitz estimation to learn multiplier matrices that are used in semidefinite programming frameworks for computing admissible initial control policies with provably high probability. Such admissible controllers enable safe initialization and constraint enforcement while providing exponential stability of the equilibrium of the closed-loop system.
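
A hedged sketch of the data-driven half of this pipeline: estimate a Lipschitz bound for the unknown nonlinearity from samples, then pass the (inflated) bound to the semidefinite program that certifies an initial policy. The max-slope estimator below is a simple illustrative stand-in for the kernelized estimator, and the inflation factor is an assumption.

```python
import numpy as np

def lipschitz_estimate(X, FX, inflation=1.2):
    """X: (N, n) sample points; FX: (N, m) sampled function values."""
    best = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dx = np.linalg.norm(X[i] - X[j])
            if dx > 1e-12:                        # max observed slope
                best = max(best, np.linalg.norm(FX[i] - FX[j]) / dx)
    return inflation * best   # inflate toward a high-probability bound
```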

9.
IEEE Trans Cybern ; 51(9): 4648-4660, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32735543

ABSTRACT

In this article, we develop a learning-based secure control framework for cyber-physical systems in the presence of sensor and actuator attacks. Specifically, we use a bank of observer-based estimators to detect the attacks while introducing a threat-detection level function. Under nominal conditions, the system operates with a nominal feedback controller, with the developed attack-monitoring process checking the reliability of the measurements. If an attacker injects attack signals into a subset of the sensors and/or actuators, the attack mitigation process is triggered and a two-player, zero-sum differential game is formulated, with the defender as the minimizer and the attacker as the maximizer. Next, we solve the underlying joint state estimation and attack mitigation problem and learn the secure control policy using a reinforcement-learning-based algorithm. Finally, two illustrative numerical examples show the efficacy of the proposed framework.
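
A minimal sketch of the detection layer, with illustrative gains and thresholds (the observer bank, residual statistics, and trigger rule below are assumptions, not the article's exact design):

```python
import numpy as np

def observer_step(xhat, u, y, A, B, C, L):
    """Luenberger-style estimator; one per monitored sensor subset."""
    return A @ xhat + B @ u + L @ (y - C @ xhat)

def threat_level(residuals, threshold, window=20):
    """residuals: dict sensor_id -> list of innovation norms."""
    level = {s: np.mean(r[-window:]) / threshold
             for s, r in residuals.items()}
    attacked = [s for s, v in level.items() if v > 1.0]
    return level, attacked    # nonempty list triggers game-based mitigation
```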

10.
IEEE Trans Neural Netw Learn Syst ; 31(12): 5441-5455, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32054590

ABSTRACT

In this article, we present an intermittent framework for safe reinforcement learning (RL) algorithms. First, we develop a barrier function-based system transformation that imposes state constraints while converting the original problem to an unconstrained optimization problem. Second, based on the derived optimal policies, two types of intermittent-feedback RL algorithms are presented: a static one and a dynamic one. We finally leverage an actor/critic structure to solve the problem online while guaranteeing optimality, stability, and safety. Simulation results show the efficacy of the proposed approach.
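
A minimal sketch of one standard barrier-function transformation (a logistic-type map; the article's exact transformation may differ): a coordinate constrained to (a, b) is mapped bijectively to the real line, so keeping the transformed state bounded keeps the original state strictly inside its constraint set.

```python
import numpy as np

def to_unconstrained(x, a, b):
    """(a, b) -> R; blows up as x approaches either constraint."""
    return np.log((x - a) / (b - x))

def to_constrained(s, a, b):
    """R -> (a, b); exact inverse of to_unconstrained."""
    return (b * np.exp(s) + a) / (np.exp(s) + 1.0)
```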

11.
IEEE Trans Cybern ; 50(8): 3752-3765, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31478887

ABSTRACT

This article develops a novel distributed intermittent control framework with the ultimate goal of reducing the communication burden in containment control of multiagent systems communicating via a directed graph. Agents are assumed to be subject to disturbances. Both static and dynamic intermittent protocols are proposed. Intermittent H∞ containment control design is considered to attenuate the effect of the disturbances, and the game algebraic Riccati equation (GARE) is employed to design the coupling and feedback gains for both static and dynamic intermittent feedback. A novel scheme is then used to unify continuous, static, and dynamic intermittent containment protocols. Finally, simulation results verify the efficacy of the proposed approach.
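
For reference, a standard form of the GARE used in H∞-type designs, with assumed notation (dynamics $\dot{x}=Ax+Bu+Dw$, attenuation level $\gamma$):

$$A^{\top}P+PA+Q-PBR^{-1}B^{\top}P+\gamma^{-2}PDD^{\top}P=0,$$

whose stabilizing solution $P$ yields the feedback gain $K=R^{-1}B^{\top}P$; in the containment protocols, this gain is combined with graph-dependent coupling weights.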

12.
IEEE Int Conf Rehabil Robot ; 2019: 682-688, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31374710

ABSTRACT

This paper presents a compliant, underactuated finger for the development of anthropomorphic robotic and prosthetic hands. The finger achieves both flexion/extension and adduction/abduction on the metacarpophalangeal joint, by using two actuators. The design employs moment arm pulleys to drive the tendon laterally and amplify the abduction motion, while also maintaining the flexion motion. Particular emphasis has been given to the analysis of the mechanism. The proposed finger has been fabricated with the hybrid deposition manufacturing technique and the actuation mechanism's efficiency has been validated with experiments that include the computation of the reachable workspace, the assessment of the exerted forces at the fingertip, the demonstration of the feasible motions, and the presentation of the grasping and manipulation capabilities. The proposed mechanism facilitates the collaboration of the two actuators to increase the exerted finger forces. Moreover, the extended workspace allows the execution of dexterous manipulation tasks.


Subjects
Fingers/physiology, Biomechanical Phenomena, Compliance (Distensibility Measure), Humans, Joints/physiology, Rotation, Tendons/physiology
13.
IEEE Trans Neural Netw Learn Syst ; 30(12): 3803-3817, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30946679

ABSTRACT

This paper presents an online kinodynamic motion planning algorithmic framework using the asymptotically optimal rapidly-exploring random tree (RRT*) and continuous-time Q-learning, which we term RRT-Q⋆. We formulate a model-free Q-based advantage function and utilize integral reinforcement learning to develop tuning laws for the online approximation of the optimal cost and the optimal policy of continuous-time linear systems. Moreover, we provide rigorous Lyapunov-based proofs for the stability of the equilibrium point, which yield asymptotic convergence properties. A terminal-state evaluation procedure is introduced to facilitate the online implementation. We propose a static obstacle augmentation and a local replanning framework, based on topological connectedness, to locally recompute the robot's path and ensure collision-free navigation. We perform simulations and a qualitative comparison to evaluate the efficacy of the proposed methodology.
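
A hedged sketch of the outer loop, assuming a planar state ordered as [position; velocity] and illustrative tolerances; the controller handle stands for the Q-learning-based feedback that the framework tunes online, not a model-based law.

```python
import numpy as np

def reached(x, waypoint, pos_tol=0.05, vel_tol=0.05):
    """Terminal-state evaluation: close to the waypoint and nearly at rest."""
    return (np.linalg.norm(x[:2] - waypoint) < pos_tol
            and np.linalg.norm(x[2:]) < vel_tol)

def navigate(x, waypoints, controller, step):
    for w in waypoints:              # waypoints from RRT*
        while not reached(x, w):
            u = controller(x, w)     # learned feedback, updated online
            x = step(x, u)           # true dynamics, unknown to the learner
    return x
```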

14.
Front Robot AI ; 6: 47, 2019.
Article in English | MEDLINE | ID: mdl-33501063

ABSTRACT

This paper presents an adaptive actuation mechanism that can be employed for the development of anthropomorphic, dexterous robot hands. The tendon-driven actuation mechanism achieves both flexion/extension and adduction/abduction on the finger's metacarpophalangeal joint using two actuators. Moment arm pulleys are employed to drive the tendon laterally and achieve a simultaneous execution of abduction and flexion motion. Particular emphasis has been given to the modeling and analysis of the actuation mechanism. More specifically, the analysis determines specific values for the design parameters for desired abduction angles. Also, a model for spatial motion is provided that relates the actuation modes with the finger motions. A static balance analysis is performed for the computation of the tendon force at each joint. A model is employed for the computation of the stiffness of the rotational flexure joints. The proposed mechanism has been designed and fabricated with the hybrid deposition manufacturing technique. The efficiency of the mechanism has been validated with experiments that include the assessment of the role of friction, the computation of the reachable workspace, the assessment of the force exertion capabilities, the demonstration of the feasible motions, and the evaluation of the grasping and manipulation capabilities. An anthropomorphic robot hand equipped with the proposed actuation mechanism was also fabricated to evaluate its performance. The proposed mechanism facilitates the collaboration of actuators to increase the exerted forces, improving hand dexterity and allowing the execution of dexterous manipulation tasks.
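
As a toy version of the static balance analysis (illustrative only; the paper's model covers the full tendon routing, friction, and multiple joints), the tendon tension needed to hold a single flexure joint at a given angle balances the joint's elastic restoring torque through the moment arm:

```python
def tendon_tension(theta, moment_arm, k_joint):
    """Tension (N) holding a flexure joint at angle theta (rad).

    moment_arm : tendon moment arm about the joint (m)
    k_joint    : rotational stiffness of the flexure joint (N*m/rad)
    """
    restoring_torque = k_joint * theta       # elastic flexure joint
    return restoring_torque / moment_arm     # equilibrium: r * T = k * theta

# Example: ~45 deg flexion, 5 mm moment arm, 0.2 N*m/rad stiffness:
# tendon_tension(0.785, 0.005, 0.2) -> about 31.4 N
```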

15.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2042-2062, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29771662

ABSTRACT

This paper reviews the current state of the art in reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, are reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
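
To ground the DT core algorithm the survey discusses, here is a hedged sketch of Q-learning-style policy iteration for a linear system with quadratic cost, fitted by least squares from trajectory data alone. The unit state/input weights, the data-collection interface, and the requirement that K0 be stabilizing and the data sufficiently exploratory are all assumptions of this sketch.

```python
import numpy as np

def qlearn_lqr(collect, n, m, K0, iters=10):
    """collect(K) returns (x, u, x_next) tuples gathered under
    u = -K x + exploration noise; cost x'x + u'u is assumed."""
    K, p = K0, n + m
    for _ in range(iters):
        rows, targets = [], []
        for x, u, x2 in collect(K):
            z = np.hstack([x, u])
            z2 = np.hstack([x2, -K @ x2])
            # Bellman equation z'Hz = cost + z2'Hz2 is linear in vec(H)
            rows.append(np.outer(z, z).ravel() - np.outer(z2, z2).ravel())
            targets.append(x @ x + u @ u)
        H = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)[0]
        H = H.reshape(p, p)
        H = (H + H.T) / 2                           # enforce symmetry
        K = np.linalg.solve(H[n:, n:], H[n:, :n])   # policy improvement
    return K
```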

16.
IEEE Trans Neural Netw Learn Syst ; 27(11): 2386-2398, 2016 Nov.
Article in English | MEDLINE | ID: mdl-26513810

ABSTRACT

This paper proposes a control algorithm based on adaptive dynamic programming to solve the infinite-horizon optimal control problem for known deterministic nonlinear systems with saturating actuators and nonquadratic cost functionals. The algorithm is based on an actor/critic framework, where a critic neural network (NN) is used to learn the optimal cost, and an actor NN is used to learn the optimal control policy. The adaptive nature of the algorithm requires a persistence of excitation condition to be validated a priori, but this can be relaxed by using previously stored data concurrently with current data in the update of the critic NN. A robustifying control term is added to the controller to eliminate the effect of residual errors, leading to asymptotic stability of the closed-loop system. Simulation results show the effectiveness of the proposed approach for a controlled Van der Pol oscillator and also for a power system plant.
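
For reference, the standard nonquadratic functional for an actuator bound $|u|\le\lambda$ (a common choice in this line of work; notation assumed, not quoted from the paper):

$$U(u)=2\int_{0}^{u}\lambda\tanh^{-1}(v/\lambda)\,R\,\mathrm{d}v,\qquad u^{*}(x)=-\lambda\tanh\!\Big(\tfrac{1}{2\lambda}R^{-1}g(x)^{\top}\nabla V^{*}(x)\Big),$$

so the minimizing policy respects the saturation bound by construction.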

17.
IEEE Trans Syst Man Cybern B Cybern ; 41(1): 14-25, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20350860

ABSTRACT

Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have shown their importance in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the system's internal states, which is usually not available in practical situations. In this paper, we show how to implement ADP methods using only measured input/output data from the system. Linear dynamical systems with deterministic behavior are considered herein, which are systems of great interest to the control systems community. In control theory, these types of methods are referred to as output feedback (OPFB). The stochastic equivalent of the systems dealt with in this paper is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller requiring only OPFB. It is shown that, similar to Q-learning, the new methods have the important advantage that knowledge of the system dynamics is not needed for the implementation of these learning algorithms or for the OPFB control. Only the order of the system, as well as an upper bound on its "observability index," must be known. The learned OPFB controller takes the form of a polynomial autoregressive moving-average controller whose performance is equivalent to that of the optimal state-variable feedback gain.
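
A minimal sketch of the central idea, with illustrative sizes: for an observable linear system, the state is a linear function of the last N inputs and outputs (N at least the observability index), so ADP can operate on the measurable vector below in place of the state, and the learned controller becomes an ARMA law in past inputs/outputs.

```python
import numpy as np

def io_vector(u_hist, y_hist, N):
    """z_k = [u_{k-N}, ..., u_{k-1}, y_{k-N}, ..., y_{k-1}].

    u_hist, y_hist : lists of past input/output vectors (most recent last)
    N              : horizon >= the system's observability index
    """
    return np.hstack([np.hstack(u_hist[-N:]), np.hstack(y_hist[-N:])])
```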


Subjects
Algorithms, Artificial Intelligence, Feedback, Learning, Markov Chains, Reinforcement (Psychology)