Search | VHL Regional Portal

Reactive Reinforcement Learning in Asynchronous Environments.

Travnik, Jaden B; Mathewson, Kory W; Sutton, Richard S; Pilarski, Patrick M.

Front Robot AI ; 5: 79, 2018.

Article in English | MEDLINE | ID: mdl-33500958

ABSTRACT

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation performed by the agent. In an asynchronous environment, minimizing reaction time-the time it takes for an agent to react to an observation-also minimizes the time in which the state of the environment may change following observation. In many environments, the reaction time of an agent directly impacts task performance by permitting the environment to transition into either an undesirable terminal state or a state where performing the chosen action is inappropriate. We propose a class of reactive reinforcement learning algorithms that address this problem of asynchronous environments by immediately acting after observing new state information. We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL algorithm reduces the reaction time of the agent by approximately the duration of the algorithm's learning update. This new class of reactive algorithms may facilitate safer control and faster decision making without any change to standard learning guarantees.

Representing high-dimensional data to intelligent prostheses and other wearable assistive robots: A first comparison of tile coding and selective Kanerva coding.

Travnik, Jaden B; Pilarski, Patrick M.

IEEE Int Conf Rehabil Robot ; 2017: 1443-1450, 2017 07.

Article in English | MEDLINE | ID: mdl-28814023

ABSTRACT

Prosthetic devices have advanced in their capabilities and in the number and type of sensors included in their design. As the space of sensorimotor data available to a conventional or machine learning prosthetic control system increases in dimensionality and complexity, it becomes increasingly important that this data be represented in a useful and computationally efficient way. Well structured sensory data allows prosthetic control systems to make informed, appropriate control decisions. In this study, we explore the impact that increased sensorimotor information has on current machine learning prosthetic control approaches. Specifically, we examine the effect that high-dimensional sensory data has on the computation time and prediction performance of a true-online temporal-difference learning prediction method as embedded within a resource-limited upper-limb prosthesis control system. We present results comparing tile coding, the dominant linear representation for real-time prosthetic machine learning, with a newly proposed modification to Kanerva coding that we call selective Kanerva coding. In addition to showing promising results for selective Kanerva coding, our results confirm potential limitations to tile coding as the number of sensory input dimensions increases. To our knowledge, this study is the first to explicitly examine representations for realtime machine learning prosthetic devices in general terms. This work therefore provides an important step towards forming an efficient prosthesis-eye view of the world, wherein prompt and accurate representations of high-dimensional data may be provided to machine learning control systems within artificial limbs and other assistive rehabilitation technologies.

Subject(s)

Artificial Intelligence , Artificial Limbs , Exoskeleton Device , Robotics/instrumentation , Signal Processing, Computer-Assisted/instrumentation , Algorithms , Humans , Machine Learning , Prosthesis Design , Software , Upper Extremity/physiology , Wearable Electronic Devices

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL