Results 1 - 4 of 4
1.
Article in English | MEDLINE | ID: mdl-35857727

ABSTRACT

One of the main goals of reinforcement learning (RL) is to provide a way for physical machines to learn optimal behavior instead of being programmed. However, effective control of machines usually requires fine time discretization. The most common RL methods apply independent random elements to each action, which is not suitable in that setting: it causes the controlled system to jerk, and it does not ensure sufficient exploration, since a single action is not long enough to create a significant experience that could be translated into policy improvement. In our view, these are the main obstacles that prevent the application of RL in contemporary control systems. To address these pitfalls, in this article we introduce an RL framework and adequate analytical tools for actions that may be stochastically dependent at subsequent time instants. We also introduce an RL algorithm that approximately optimizes a policy producing such actions. It applies experience replay (ER) to adjust the likelihood of sequences of previous actions so as to optimize the expected n-step returns that the policy yields. The efficiency of this algorithm is verified against four other RL methods: continuous deep advantage updating (CDAU), proximal policy optimization (PPO), soft actor-critic (SAC), and actor-critic with ER (ACER), on four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) under diverse time discretizations. The algorithm introduced here outperforms the competitors in most of the cases considered.
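
As a complement to this abstract, the following is a minimal Python sketch of exploration with stochastically dependent actions, assuming an AR(1) Gaussian noise process added to a deterministic policy output. The dependence structure and the algorithm actually used in the article may differ; the class and parameter names (`AutocorrelatedNoise`, `alpha`, `sigma`) are illustrative, not taken from the paper.

```python
import numpy as np

class AutocorrelatedNoise:
    """AR(1) Gaussian noise: consecutive samples are stochastically dependent,
    so perturbed actions change smoothly instead of jerking at every step.
    This is an illustrative stand-in, not the article's actual noise model."""

    def __init__(self, dim, alpha=0.9, sigma=0.2, rng=None):
        self.alpha = alpha          # correlation between consecutive noise samples
        self.sigma = sigma          # stationary standard deviation
        self.state = np.zeros(dim)
        self.rng = rng or np.random.default_rng()

    def sample(self):
        # x_t = alpha * x_{t-1} + sqrt(1 - alpha^2) * sigma * eps_t keeps the
        # stationary variance at sigma^2 regardless of alpha.
        eps = self.rng.standard_normal(self.state.shape)
        self.state = self.alpha * self.state + np.sqrt(1.0 - self.alpha**2) * self.sigma * eps
        return self.state


# Usage sketch: perturb a deterministic policy's output with temporally dependent noise.
noise = AutocorrelatedNoise(dim=6, alpha=0.95)
# action = np.clip(policy(observation) + noise.sample(), -1.0, 1.0)
```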

2.
Neural Netw ; 96: 1-10, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28950104

ABSTRACT

In this paper the classic momentum algorithm for stochastic optimization is considered. A method is introduced that adjusts the coefficients of this algorithm during its operation. The method does not depend on any preliminary knowledge of the optimization problem. In the experimental study, the method is applied to on-line learning in feed-forward neural networks, including deep auto-encoders, and outperforms any fixed coefficients. The method eliminates coefficients that are difficult to determine yet have a profound influence on performance. While the method itself has some coefficients, they are easy to determine, and the sensitivity of performance to them is low. Consequently, the method makes on-line learning a practically parameter-free process and broadens the area of potential application of this technology.
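
The abstract does not state the adaptation rule, so the sketch below only illustrates the setting: a classic momentum update whose step-size and momentum coefficients are meant to be adjusted during operation. The `adapt_coefficients` heuristic (sign agreement of successive gradients) is a hypothetical stand-in, not the authors' method.

```python
import numpy as np

def momentum_step(theta, grad, velocity, step_size, momentum):
    """One classic momentum update: v <- momentum * v - step_size * grad,
    theta <- theta + v.  The paper's contribution is adjusting step_size and
    momentum on-line; this function just performs the plain update."""
    velocity = momentum * velocity - step_size * grad
    return theta + velocity, velocity

def adapt_coefficients(step_size, prev_grad, grad, up=1.05, down=0.7):
    """Hypothetical sign-agreement heuristic (delta-bar-delta style), used here
    only for illustration: grow the step-size while successive gradients point
    the same way, shrink it when they disagree."""
    agreement = np.sign(np.dot(prev_grad, grad))
    return step_size * (up if agreement > 0 else down)
```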


Subject(s)
Algorithms , Machine Learning , Neural Networks, Computer , Learning , Machine Learning/trends , Stochastic Processes
3.
Neural Netw ; 41: 156-67, 2013 May.
Article in English | MEDLINE | ID: mdl-23237972

ABSTRACT

This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy using previously collected samples and autonomously estimates appropriate step-sizes for the learning updates. The algorithm is based on actor-critic with experience replay, with step-sizes determined on-line by an enhanced fixed-point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates that the proposed algorithm can solve difficult learning control problems autonomously within a reasonably short time.
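
A minimal sketch of the experience-replay loop referred to above, assuming a plain uniform replay buffer; the critic and actor updates, as well as the paper's on-line step-size estimation, are left as user-supplied placeholders (`critic_update`, `actor_update`) and are not reproduced here.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions; the agent learns from them
    repeatedly instead of using each interaction with the plant only once."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def train_step(buffer, critic_update, actor_update, batch_size=64):
    """One replayed learning update: fit the critic on stored transitions and
    adjust the actor along the estimated policy-improvement direction.
    Both callables are placeholders for the user's own networks and updates."""
    batch = buffer.sample(batch_size)
    if batch:
        critic_update(batch)
        actor_update(batch)
```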


Subject(s)
Algorithms , Artificial Intelligence , Models, Theoretical , Neural Networks, Computer , Reinforcement, Psychology , Animals , Computer Simulation , Markov Chains , Movement , Octopodiformes , Problem Solving , Stochastic Processes
4.
Neural Netw ; 22(10): 1484-97, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19523786

ABSTRACT

Actor-Critics constitute an important class of reinforcement learning algorithms that can deal with continuous actions and states in an easy and natural way. This paper shows how these algorithms can be augmented by the technique of experience replay without degrading their convergence properties, by appropriately estimating the policy change direction. This is achieved by truncated importance sampling applied to the recorded past experiences. It is formally shown that the resulting estimation bias is bounded and asymptotically vanishes, which allows the experience-replay-augmented algorithm to preserve the convergence properties of the original algorithm. Experience replay makes it possible to use the available computational power to considerably reduce the required number of interactions with the environment, which is essential for real-world applications. Experimental results demonstrate that the combination of experience replay and Actor-Critics yields very fast learning algorithms that achieve successful policies for non-trivial control tasks in a remarkably short time. Namely, the policy for the cart-pole swing-up [Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1), 219-245] is obtained after as little as 20 min of simulated cart-pole time, and the policy for Half-Cheetah (a walking robot with 6 degrees of freedom) is obtained after four hours of simulated Half-Cheetah time.
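
The truncation of importance sampling weights mentioned in this abstract can be sketched as follows; the cap value and the function name are assumptions made for illustration, not values or identifiers taken from the article.

```python
import numpy as np

def truncated_is_weights(logp_current, logp_behavior, cap=5.0):
    """Per-sample importance weights pi_current(a|s) / pi_behavior(a|s),
    truncated at `cap`.  Truncation bounds the weights and hence the variance
    of replayed policy-gradient estimates; `cap` is an assumed hyperparameter,
    not a value from the article."""
    ratios = np.exp(logp_current - logp_behavior)
    return np.minimum(ratios, cap)


# Usage sketch: weight replayed per-sample gradient estimates before averaging.
# weights = truncated_is_weights(logp_new, logp_old)
# policy_grad_estimate = np.mean(weights[:, None] * per_sample_gradients, axis=0)
```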


Subject(s)
Artificial Intelligence , Reinforcement, Psychology , Algorithms , Animals , Biomechanical Phenomena , Cats , Data Interpretation, Statistical , Joints/physiology , Robotics