Results 1 - 15 of 15
1.
Article in English | MEDLINE | ID: mdl-38857133

ABSTRACT

Off-policy prediction, learning the value function for one policy from data generated while following another policy, is one of the most challenging problems in reinforcement learning. This article makes two main contributions: 1) it empirically studies 11 off-policy prediction learning algorithms with emphasis on their sensitivity to parameters, learning speed, and asymptotic error, and 2) based on the empirical results, it proposes two step-size adaptation methods, collectively called the Ratchet algorithms, that help the algorithm with the lowest error from the experimental study learn faster. Many off-policy prediction learning algorithms have been proposed in the past decade, but it remains unclear which algorithms learn faster than others. In this article, we empirically compare 11 off-policy prediction learning algorithms with linear function approximation on three small tasks: the Collision task, the Rooms task, and a third, higher-variance task. The Collision task is a small off-policy problem analogous to that of an autonomous car trying to predict whether it will collide with an obstacle. The latter two tasks are designed such that learning quickly in them is challenging. In the Rooms task, the product of importance sampling ratios can be as large as 2^14. To control the high variance caused by the product of the importance sampling ratios, the step size must be set small, which, in turn, slows down learning. The third task is more extreme in that the product of the ratios can become as large as 2^14 × 25. The algorithms considered are Off-policy TD, five Gradient-TD algorithms, two Emphatic-TD algorithms, Vtrace, and variants of Tree Backup and ABQ that are applicable to the prediction setting. We found that the algorithms' performance is highly affected by the variance induced by the importance sampling ratios. Tree Backup, Vtrace, and ABTD are not affected by the high variance as much as the other algorithms, but they restrict the effective bootstrapping parameter in a way that is too limiting for tasks where high variance is not present. We observed that Emphatic TD tends to have lower asymptotic error than the other algorithms but might learn more slowly in some cases. Based on the empirical results, we propose two step-size adaptation algorithms, which we collectively refer to as the Ratchet algorithms, with the same underlying idea: keep the step-size parameter as large as possible and ratchet it down only when necessary to avoid overshoot. We show that the Ratchet algorithms are effective by comparing them with other popular step-size adaptation algorithms, such as the Adam optimizer.
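As a rough illustration of the setting studied here, the following sketch shows linear off-policy TD(0) with per-step importance sampling ratios, together with a Ratchet-style step-size rule in the spirit described above: keep the step size as large as possible and ratchet it down only when a single update would overshoot its own target. The data interface, the overshoot test, and the parameter values are illustrative assumptions, not the paper's exact algorithms.

```python
import numpy as np

def off_policy_td0_ratchet(transitions, n_features, alpha=0.5, gamma=0.99):
    """Linear off-policy TD(0) with a Ratchet-style step-size rule (sketch).

    `transitions` yields (x, reward, x_next, rho) tuples, where x and x_next
    are feature vectors and rho = pi(a|s) / b(a|s) is the importance sampling
    ratio for the action taken.  The ratchet test below is an illustrative
    assumption: shrink alpha only when one update would move the value of x
    past its own TD target.
    """
    w = np.zeros(n_features)
    for x, reward, x_next, rho in transitions:
        delta = reward + gamma * np.dot(w, x_next) - np.dot(w, x)  # TD error
        # The update alpha * rho * delta * x changes v(x) by
        # alpha * rho * delta * (x . x); it overshoots if that factor exceeds 1.
        if alpha * rho * np.dot(x, x) > 1.0:
            alpha = 1.0 / (rho * np.dot(x, x))   # ratchet the step size down
        w += alpha * rho * delta * x             # off-policy TD(0) update
    return w
```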

2.
Neural Comput Appl ; 35(23): 16805-16819, 2023.
Article in English | MEDLINE | ID: mdl-37455836

ABSTRACT

In this work, we present a perspective on the role machine intelligence can play in supporting human abilities. In particular, we consider research in rehabilitation technologies such as prosthetic devices, as this domain requires tight coupling between human and machine. Taking an agent-based view of such devices, we propose that human-machine collaborations have a capacity to perform tasks that results from the combined agency of the human and the machine. We introduce communicative capital as a resource developed by a human and a machine working together in ongoing interactions. Developing this resource enables the partnership to eventually perform tasks at a capacity greater than either individual could achieve alone. We then examine the benefits and challenges of increasing the agency of prostheses by surveying literature demonstrating that building communicative resources enables more complex, task-directed interactions. The viewpoint developed in this article extends current thinking on how best to support the functional use of increasingly complex prostheses and offers insight into creating more fruitful interactions between humans and supportive, assistive, and augmentative technologies.

3.
Adapt Behav ; 31(1): 3-19, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36618906

ABSTRACT

We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning. Experiments in classical conditioning show that animals such as rabbits, pigeons, and dogs can make long temporal associations that enable multi-step prediction. To replicate this remarkable ability, an agent must construct an internal state representation that summarizes its interaction history. Recurrent neural networks can automatically construct state and learn temporal associations. However, the current training methods are prohibitively expensive for online prediction (continual learning on every time step), which is the focus of this paper. Our proposed problems test the learning capabilities that animals readily exhibit and highlight the limitations of the current recurrent learning methods. While the proposed problems are nontrivial, they are still amenable to extensive testing and analysis in the small-compute regime, thereby enabling researchers to study issues in isolation, ultimately accelerating progress towards scalable online representation learning methods.
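To make the online-prediction setting concrete, here is a minimal sketch in which an agent predicts an upcoming unconditioned stimulus (US) from a simple decaying trace of the conditioned stimulus (CS), updating on every time step with TD(0). The trace-based state, the TD(0) learner, and all parameter values are illustrative assumptions standing in for the learned recurrent state the paper discusses.

```python
import numpy as np

def online_us_prediction(cs, us, gamma=0.9, alpha=0.1, trace_decay=0.8):
    """Online prediction of the discounted upcoming US signal (sketch).

    cs and us are equal-length binary sequences over time steps.  The state
    is a decaying trace of the CS plus a bias unit (an illustrative stand-in
    for a learned recurrent state); weights are updated on every step.
    """
    w = np.zeros(2)
    trace = float(cs[0])
    x = np.array([trace, 1.0])
    predictions = [w @ x]
    for t in range(1, len(cs)):
        trace = trace_decay * trace + cs[t]            # summarize CS history
        x_next = np.array([trace, 1.0])
        delta = us[t] + gamma * (w @ x_next) - w @ x   # TD error at step t
        w += alpha * delta * x                         # continual per-step update
        x = x_next
        predictions.append(w @ x)
    return np.array(predictions)
```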

4.
J Neural Eng ; 17(3): 036002, 2020 06 02.
Article in English | MEDLINE | ID: mdl-32348970

ABSTRACT

OBJECTIVE: Neuromodulation technologies are increasingly used to improve function after neural injury. To achieve a symbiotic relationship between device and user, the device must augment remaining function and independently adapt to day-to-day changes in function. The goal of this study was to develop predictive control strategies to produce over-ground walking in a model of hemisection spinal cord injury (SCI) using intraspinal microstimulation (ISMS). APPROACH: Eight cats were anaesthetized and placed in a sling over a walkway. The residual function of a hemisection SCI was mimicked by manually moving one hind-limb through the walking cycle. ISMS targeted motor networks in the lumbosacral enlargement to activate muscles in the other, presumably 'paralyzed' limb, using low levels of current (<130 µA). Four people took turns moving the 'intact' limb, generating four different walking styles. Two control strategies were compared; both used ground reaction force and angular velocity information from the manually moved 'intact' limb to control the timing of the 'paralyzed' limb's transitions through the step cycle. The first strategy used thresholds on the raw sensor values to initiate transitions. The second strategy used reinforcement learning and Pavlovian control to learn predictions of the sensor values; thresholds on the predictions were then used to initiate transitions. MAIN RESULTS: Both control strategies produced alternating, over-ground walking. Transitions based on raw sensor values required manual tuning of thresholds for each person to produce walking, whereas Pavlovian control did not. Learning occurred quickly during walking: predictions of the sensor signals were learned rapidly, initiating correct transitions after ≤4 steps. Pavlovian control was resilient to different walking styles and different cats, and recovered from induced mistakes during walking. SIGNIFICANCE: This work demonstrates, for the first time, that Pavlovian control can augment remaining function and facilitate personalized walking with minimal tuning requirements.
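A minimal sketch of the second strategy's core idea follows: learn a general-value-function-style prediction of an upcoming sensor signal with linear TD(0) and trigger a step-cycle transition when the prediction, rather than the raw value, crosses a threshold. The feature constructor, the threshold, and the parameter values are hypothetical placeholders, not the study's implementation.

```python
import numpy as np

def pavlovian_transitions(sensor_stream, features, threshold,
                          gamma=0.97, alpha=0.05):
    """Pavlovian-control sketch: threshold a learned prediction of a sensor
    signal (e.g. ground reaction force) to time step-cycle transitions.

    `sensor_stream` is a sequence of scalar sensor readings and `features(t)`
    returns the feature vector for time step t; both are assumed interfaces.
    Yields the time steps at which a transition would be triggered.
    """
    x = features(0)
    w = np.zeros(x.shape[0])
    for t in range(1, len(sensor_stream)):
        x_next = features(t)
        # TD(0) update toward the discounted sum of future sensor values
        delta = sensor_stream[t] + gamma * (w @ x_next) - w @ x
        w += alpha * delta * x
        x = x_next
        if w @ x > threshold:   # the prediction, not the raw signal, triggers
            yield t
```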


Subject(s)
Spinal Cord Injuries , Walking , Animals , Cats , Extremities , Hindlimb
5.
Front Robot AI ; 5: 79, 2018.
Article in English | MEDLINE | ID: mdl-33500958

ABSTRACT

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation performed by the agent. In an asynchronous environment, minimizing reaction time (the time it takes for an agent to react to an observation) also minimizes the time in which the state of the environment may change following an observation. In many environments, the reaction time of an agent directly impacts task performance by permitting the environment to transition into either an undesirable terminal state or a state where performing the chosen action is inappropriate. We propose a class of reactive reinforcement learning algorithms that address this problem of asynchronous environments by immediately acting after observing new state information. We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL algorithm reduces the reaction time of the agent by approximately the duration of the algorithm's learning update. This new class of reactive algorithms may facilitate safer control and faster decision making without any change to standard learning guarantees.
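The ordering difference can be made concrete with a small sketch: both versions use the same tabular SARSA update, but the reactive one dispatches its action before computing the update rather than after. The agent and dispatch interfaces and the parameter values are assumptions for illustration.

```python
import numpy as np

class SarsaAgent:
    """Tabular SARSA agent (sketch) used to contrast conventional and
    reactive step orderings."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, eps=0.1):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng(0)

    def select_action(self, s):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[s]))

    def update(self, s, a, r, s_next, a_next):
        target = r + self.gamma * self.q[s_next, a_next]
        self.q[s, a] += self.alpha * (target - self.q[s, a])

def conventional_step(agent, dispatch, s, a, r, s_next):
    a_next = agent.select_action(s_next)
    agent.update(s, a, r, s_next, a_next)   # learning delays the action
    dispatch(a_next)
    return a_next

def reactive_step(agent, dispatch, s, a, r, s_next):
    a_next = agent.select_action(s_next)
    dispatch(a_next)                        # act first: shorter reaction time
    agent.update(s, a, r, s_next, a_next)   # same update, done while acting
    return a_next
```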

6.
Prosthet Orthot Int ; 40(5): 573-81, 2016 Oct.
Article in English | MEDLINE | ID: mdl-26423106

ABSTRACT

BACKGROUND: Myoelectric prostheses currently used by amputees can be difficult to control. Machine learning, and in particular learned predictions about user intent, could help to reduce the time and cognitive load required by amputees while operating their prosthetic device. OBJECTIVES: The goal of this study was to compare two switching-based methods of controlling a myoelectric arm: non-adaptive (or conventional) control and adaptive control (involving real-time prediction learning). STUDY DESIGN: Case series study. METHODS: We compared non-adaptive and adaptive control in two different experiments. In the first, one amputee and one non-amputee subject controlled a robotic arm to perform a simple task; in the second, three able-bodied subjects controlled a robotic arm to perform a more complex task. For both tasks, we calculated the mean time and total number of switches between robotic arm functions over three trials. RESULTS: Adaptive control significantly decreased the number of switches and total switching time for both tasks compared with the conventional control method. CONCLUSION: Real-time prediction learning was successfully used to improve the control interface of a myoelectric robotic arm during uninterrupted use by an amputee subject and able-bodied subjects. CLINICAL RELEVANCE: Adaptive control using real-time prediction learning has the potential to help decrease both the time and the cognitive load required by amputees in real-world functional situations when using myoelectric prostheses.


Subject(s)
Amputation, Surgical/rehabilitation , Artificial Limbs , Electromyography , Machine Learning , Prosthesis Design , Robotics , Arm , Humans , Task Performance and Analysis
7.
Learn Mem ; 21(11): 585-90, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25320350

ABSTRACT

The present experiment tested whether the time course of a conditioned eyeblink response, particularly its duration, would expand and contract as the magnitude of the conditioned response (CR) changed massively during acquisition, extinction, and reacquisition. The CR duration remained largely constant throughout the experiment, while CR onset and peak time occurred slightly later during extinction. Computational models can account for these results by using two layers of plasticity conforming to the sequence of synapses in the cerebellar pathways that mediate eyeblink conditioning.


Subject(s)
Blinking , Conditioning, Eyelid/physiology , Extinction, Psychological/physiology , Learning/physiology , Nictitating Membrane/physiology , Animals , Motor Activity , Rabbits , Time Factors
8.
IEEE Int Conf Rehabil Robot ; 2013: 6650435, 2013 Jun.
Article in English | MEDLINE | ID: mdl-24187253

ABSTRACT

Integrating learned predictions into a prosthetic control system promises to enhance multi-joint prosthesis use by amputees. In this article, we present a preliminary study of different cases where it may be beneficial to use a set of temporally extended predictions, learned and maintained in real time, within an engineered or learned prosthesis controller. Our study demonstrates the first successful combination of actor-critic reinforcement learning with real-time prediction learning. We evaluate this new approach to control learning during the myoelectric operation of a robot limb. Our results suggest that the integration of real-time prediction and control learning may speed control policy acquisition, allow unsupervised adaptation in myoelectric controllers, and facilitate synergies in highly actuated limbs. These experiments also show that temporally extended prediction learning enables anticipatory actuation, opening the way for coordinated motion in assistive robotic devices. Our work therefore provides initial evidence that real-time prediction learning is a practical way to support intuitive joint control in increasingly complex prosthetic systems.
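One way to picture the combination described here is an actor-critic controller whose input is augmented with a prediction that is itself learned in real time. The sketch below is a hedged illustration under assumed interfaces (feature vectors, a scalar sensor signal to predict, and a scalar reward); it is not the study's controller.

```python
import numpy as np

def actor_critic_with_predictions(stream, n_features, alpha_v=0.1,
                                  alpha_pi=0.01, alpha_p=0.1, gamma=0.97):
    """Sketch: continuous-action actor-critic whose input is augmented with a
    real-time learned prediction of a sensor signal.

    `stream` yields (x, sensor, reward) tuples, where x is a feature vector.
    The prediction is a TD(0) forecast of the discounted future sensor
    signal.  All interfaces and parameter values are illustrative assumptions.
    """
    w_p = np.zeros(n_features)        # prediction weights
    w_v = np.zeros(n_features + 1)    # critic weights (augmented state)
    w_mu = np.zeros(n_features + 1)   # actor mean weights
    sigma = 0.5                       # fixed exploration noise
    rng = np.random.default_rng(0)

    prev = None                       # (x, aug, action, reward) from last step
    for x, sensor, reward in stream:
        aug = np.append(x, w_p @ x)   # features plus current prediction
        if prev is not None:
            x_prev, aug_prev, a_prev, r_prev = prev
            # Real-time prediction learning: TD(0) on the sensor signal
            delta_p = sensor + gamma * (w_p @ x) - w_p @ x_prev
            w_p += alpha_p * delta_p * x_prev
            # Critic: TD(0) on the reward over the augmented state
            delta = r_prev + gamma * (w_v @ aug) - w_v @ aug_prev
            w_v += alpha_v * delta * aug_prev
            # Actor: Gaussian policy gradient (1/sigma^2 folded into step size)
            w_mu += alpha_pi * delta * (a_prev - w_mu @ aug_prev) * aug_prev
        action = rng.normal(w_mu @ aug, sigma)   # continuous control output
        prev = (x, aug, action, reward)
        yield action
```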


Subject(s)
Joint Prosthesis , Joints/physiology , Humans
9.
Learn Mem ; 20(2): 97-102, 2013 Jan 16.
Article in English | MEDLINE | ID: mdl-23325726

ABSTRACT

Rabbits were classically conditioned using compounds of tone and light conditioned stimuli (CSs) presented with either simultaneous onsets (Experiment 1) or serial onsets (Experiment 2) in a delay conditioning paradigm. Training with the simultaneous compound reduced the likelihood of a conditioned response (CR) to the individual CSs ("mutual overshadowing") but left CR timing unaltered. CR peaks were consistently clustered around the time of unconditioned stimulus (US) delivery. Training with the serial compound (CSA→CSB→US) reduced responding to CSB ("temporal primacy/information effect") but this effect was prevented by prior CSB→US pairings. In both cases, serial compound training altered CR timing. On CSA→CSB test trials, the CRs were accelerated; the CR peaks occurred after CSB onset but well before the time of US delivery. Conversely, CRs on CSB- trials were decelerated; the distribution of CR peaks was variable but centered well after the US. Timing on CSB- trials was at most only slightly accelerated. The results are discussed with respect to processes of generalization and spectral timing applicable to the cerebellar and forebrain pathways in eyeblink preparations.


Subject(s)
Association Learning/physiology , Conditioning, Classical/physiology , Cues , Nictitating Membrane/physiology , Acoustic Stimulation , Animals , Rabbits , Reaction Time/physiology , Time Factors
10.
Learn Behav ; 40(3): 305-19, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22927003

ABSTRACT

The temporal-difference (TD) algorithm from reinforcement learning provides a simple method for incrementally learning predictions of upcoming events. Applied to classical conditioning, TD models suppose that animals learn a real-time prediction of the unconditioned stimulus (US) on the basis of all available conditioned stimuli (CSs). In the TD model, as in other error-correction models, learning is driven by prediction errors: the difference between the change in US prediction and the actual US. With the TD model, however, learning occurs continuously from moment to moment and is not artificially constrained to occur in trials. Accordingly, a key feature of any TD model is its assumption about how a CS is represented on a moment-to-moment basis. Here, we evaluate the performance of the TD model on a heretofore unexplored range of classical conditioning tasks. To do so, we consider three stimulus representations that vary in their degree of temporal generalization and evaluate how the representation influences the performance of the TD model on these conditioning tasks.
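For concreteness, the following sketch shows the real-time TD model of conditioning with one possible stimulus representation, the complete serial compound (one feature per time step since CS onset), trained with eligibility traces. The trial structure and parameter values are illustrative assumptions.

```python
import numpy as np

def td_conditioning_csc(cs_onset, us_time, n_steps, n_trials,
                        alpha=0.05, gamma=0.97, lam=0.9):
    """Real-time TD model of conditioning with a complete serial compound
    (CSC) representation: one feature per time step since CS onset (sketch).

    cs_onset and us_time are step indices within a trial; the trial layout
    and parameter values are illustrative assumptions.
    """
    w = np.zeros(n_steps)
    for _ in range(n_trials):
        z = np.zeros(n_steps)                 # eligibility traces
        v_prev, x_prev = 0.0, np.zeros(n_steps)
        for t in range(n_steps):
            x = np.zeros(n_steps)
            if t >= cs_onset:
                x[t - cs_onset] = 1.0         # CSC: which step since CS onset
            us = 1.0 if t == us_time else 0.0
            v = float(w @ x)
            delta = us + gamma * v - v_prev   # moment-to-moment TD error
            z = gamma * lam * z + x_prev      # decay and accumulate traces
            w += alpha * delta * z
            v_prev, x_prev = v, x
    return w   # w[k]: learned US prediction k steps after CS onset
```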


Subject(s)
Conditioning, Classical , Models, Psychological , Algorithms , Animals , Association Learning , Computer Simulation/statistics & numerical data , Reinforcement, Psychology , Time Factors
11.
IEEE Int Conf Rehabil Robot ; 2011: 5975338, 2011.
Article in English | MEDLINE | ID: mdl-22275543

ABSTRACT

As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis.


Subject(s)
Artificial Limbs , Artificial Intelligence , Humans , Models, Theoretical
12.
Learn Mem ; 17(12): 600-4, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21075900

ABSTRACT

Using interstimulus intervals (ISIs) of 125, 250, and 500 msec in trace conditioning of the rabbit nictitating membrane response, the offset times and durations of conditioned responses (CRs) were collected along with onset and peak latencies. All measures were proportional to the ISI, but only onset and peak latencies conformed to the criterion for scalar timing. Regarding the CR's possible protective overlap of the unconditioned stimulus (US), CR duration increased with ISI, while the peak's alignment with the US declined. Implications for models of timing and CR adaptiveness are discussed.


Subject(s)
Conditioning, Classical/physiology , Conditioning, Eyelid/physiology , Nictitating Membrane/physiology , Acoustic Stimulation , Animals , Rabbits , Time Factors
13.
Behav Neurosci ; 123(5): 1095-101, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19824776

ABSTRACT

The present experiment characterized conditioned nictitating membrane (NM) movements as a function of CS duration, using the full range of discernible movements (>.06 mm) rather than movements exceeding a conventional criterion (>.50 mm). The CS-US interval was fixed at 500 ms, while across groups, the duration of the CS was 50 ms (trace), 550 ms (delay), or 1050 ms (extended delay). The delay group showed the highest level of acquisition. When tested with the different CS durations, the delay and extended delay groups showed large reductions in their responses when their CS was shortened to 50 ms, but the trace group maintained its response at all durations. Timing of the conditioned movements appeared similar across all manipulations. The results suggest that the CS has both a fine timing function tied to CS onset and a general predictive function tied to CS duration, both of which may be mediated by cerebellar pathways.


Subject(s)
Conditioning, Eyelid/physiology , Nictitating Membrane/physiology , Reaction Time/physiology , Acoustic Stimulation , Animals , Auditory Perception/physiology , Electric Stimulation , Female , Rabbits
14.
Behav Neurosci ; 123(1): 212-7, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19170446

ABSTRACT

The present experiment was aimed at characterizing the timing of conditioned nictitating membrane (NM) movements as a function of the interstimulus interval (ISI) in delay conditioning for rabbits (Oryctolagus cuniculus). Onset latency and peak latency were approximately, but not strictly, scalar for all but the smallest movements (<.10 mm). That is, both the mean and standard deviation of the timing measures increased in proportion to the ISI, but their coefficients of variation (standard deviation/mean) tended to be larger for shorter ISIs. For all ISIs, the absolute timing of the NM movements covaried with magnitude. The smaller movements (approximately .11-.50 mm) were highly variable, and their peaks tended to occur well after the time of US delivery. The larger movements (>.50 mm) were less variable, and their peaks were better aligned with the time of US delivery. These results are discussed with respect to their implications for current models of timing in eyeblink conditioning.


Subject(s)
Conditioning, Eyelid/physiology , Nictitating Membrane/physiology , Reaction Time/physiology , Acoustic Stimulation/methods , Animals , Female , Psychoacoustics , Rabbits , Time Factors
15.
Neural Comput ; 20(12): 3034-54, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18624657

ABSTRACT

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances the correspondence between model and data in several experiments, including those in which rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
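A rough sketch of how such a representation can be generated follows: each stimulus onset starts a decaying memory trace, and the trace height is encoded by a small set of basis functions, so the resulting features grow weaker and more temporally diffuse as time since onset increases. The exponential trace, the Gaussian bases, and the parameter values here are illustrative assumptions.

```python
import numpy as np

def microstimuli(onsets, n_steps, n_micro=10, decay=0.985, sigma=0.08):
    """Generate a microstimulus feature matrix of shape (n_steps, n_micro)
    for a single stimulus (sketch).

    Each onset resets a decaying memory trace; the trace height is encoded
    by basis functions, yielding features that become weaker and broader in
    time as the trace decays.
    """
    centers = np.linspace(1.0, 0.0, n_micro)   # basis centers over trace height
    features = np.zeros((n_steps, n_micro))
    trace = 0.0
    for t in range(n_steps):
        if t in onsets:
            trace = 1.0                         # stimulus onset resets the trace
        features[t] = trace * np.exp(-(trace - centers) ** 2 / (2 * sigma ** 2))
        trace *= decay                          # trace decays over time
    return features
```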


Subject(s)
Dopamine/metabolism , Models, Neurological , Neurons/physiology , Reaction Time/physiology , Reward , Action Potentials/physiology , Algorithms , Animals , Cues , Humans , Time Factors