Results 1 - 7 of 7
1.
IEEE Trans Cybern ; 51(8): 4251-4264, 2021 Aug.
Article in English | MEDLINE | ID: mdl-30908269

ABSTRACT

Since the late 1980s, temporal difference (TD) learning has dominated research on policy evaluation algorithms. However, the need to avoid TD defects, such as low data efficiency and divergence in off-policy learning, has inspired a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms make up the major part of these new approaches. This paper aims to combine the advantages of these two categories to derive an efficient policy evaluation algorithm with O(n^2) per-time-step runtime complexity. The least-squares-based framework is adopted, and the gradient correction is used to improve convergence performance. This paper begins by revising a previous O(n^3) batch algorithm, least-squares TD with gradient correction (LS-TDC), to regularize the parameter vector. Based on the recursive least-squares technique, an O(n^2) counterpart of LS-TDC called RC is proposed. To increase data efficiency, we generalize RC with eligibility traces. An off-policy extension is also proposed based on importance sampling. In addition, a convergence analysis for RC as well as LS-TDC is given. The empirical results on both on-policy and off-policy benchmarks show that RC has a higher estimation accuracy than RLSTD and a significantly lower runtime complexity than LS-TDC.
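The gap between O(n^3) batch solving and O(n^2) recursive updating mentioned above can be illustrated with the Sherman-Morrison identity, which is the usual basis of recursive least-squares methods. The sketch below is a generic rank-one inverse update, not the RC algorithm from the paper; the function name `sherman_morrison_update` and the sample values are assumptions made for illustration.

```python
def sherman_morrison_update(P, u, v):
    """Given P = inverse(A), return inverse(A + u v^T) in O(n^2) operations.

    Uses the identity
        (A + u v^T)^-1 = P - (P u)(v^T P) / (1 + v^T P u).
    Re-inverting the updated matrix from scratch would cost O(n^3).
    """
    n = len(P)
    Pu = [sum(P[i][k] * u[k] for k in range(n)) for i in range(n)]   # column vector P u
    vP = [sum(v[k] * P[k][j] for k in range(n)) for j in range(n)]   # row vector v^T P
    denom = 1.0 + sum(v[k] * Pu[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ZeroDivisionError("rank-one update would make the matrix singular")
    return [[P[i][j] - Pu[i] * vP[j] / denom for j in range(n)] for i in range(n)]


# Hypothetical TD-style step: u is the current feature vector and
# v = phi - gamma * phi_next, as in recursive least-squares TD methods.
phi, phi_next, gamma = [1.0, 0.0], [0.0, 1.0], 0.9
v = [a - gamma * b for a, b in zip(phi, phi_next)]
P = [[1.0, 0.0], [0.0, 1.0]]          # inverse of the initial matrix (identity)
P = sherman_morrison_update(P, phi, v)
```

Each call touches every entry of the n-by-n matrix a constant number of times, which is where the quadratic per-step cost comes from.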

2.
ISA Trans ; 82: 210-222, 2018 Nov.
Article in English | MEDLINE | ID: mdl-28893383

ABSTRACT

Alleviating the staircase artifacts of the variational method and adaptively adjusting the regularization parameters to the characteristics of different regions are two main issues in regularized image restoration. An adaptive fractional-order total variation l1 regularization (AFOTV-l1) model is proposed, which is solved using the split Bregman iteration (SBI) algorithm for image estimation. An improved fractional-order differential kernel mask (IFODKM) with an extended degree of freedom (DOF) is proposed, which can preserve more image details and effectively avoid the staircase artifact. With the SBI algorithm adopted in this paper, fast convergence and small errors are achieved. Moreover, a novel adaptive strategy for the regularization parameters is given. Experimental results on the standard image library (SIL) and the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) data set demonstrate that the proposed methods have better approximation, robustness, and convergence speed for image restoration.
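The abstract does not specify the IFODKM mask itself. As background only, the sketch below computes the standard Grünwald-Letnikov coefficients that fractional-order difference masks are commonly built from, and applies them to a 1-D signal. The function names and the truncation length `num_terms` are assumptions for illustration, not the paper's definitions.

```python
def gl_coefficients(alpha, num_terms):
    """Grünwald-Letnikov weights w_k = (-1)^k * binomial(alpha, k).

    Computed with the recurrence w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k).
    """
    weights = [1.0]
    for k in range(1, num_terms):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def fractional_difference(signal, alpha, num_terms):
    """Truncated fractional-order backward difference of a 1-D signal."""
    w = gl_coefficients(alpha, num_terms)
    out = []
    for i in range(len(signal)):
        # Only use samples that exist to the left of position i.
        terms = min(i + 1, num_terms)
        out.append(sum(w[k] * signal[i - k] for k in range(terms)))
    return out


half_order = gl_coefficients(0.5, 4)                        # weights for order 0.5
first_order = fractional_difference([1.0, 3.0, 6.0], 1.0, 3)  # ordinary backward difference
```

With `alpha = 1` the weights reduce to the ordinary first difference, while non-integer orders give slowly decaying weights that draw on more neighbouring samples.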

3.
IEEE Trans Neural Netw Learn Syst ; 27(4): 771-82, 2016 Apr.
Article in English | MEDLINE | ID: mdl-25955853

ABSTRACT

A least squares temporal difference with gradient correction (LS-TDC) algorithm and its kernel-based version (KLS-TDC) are proposed as policy evaluation algorithms for reinforcement learning (RL). LS-TDC is derived from the TDC algorithm. Because TDC is derived by minimizing the mean-square projected Bellman error, LS-TDC has better convergence performance. The least squares technique is used to remove the step-size tuning of the original TDC and to enhance robustness. For KLS-TDC, since the kernel method is used, feature vectors can be selected automatically. Approximate linear dependence analysis is performed to realize kernel sparsification. In addition, a policy iteration strategy based on KLS-TDC is constructed to solve control learning problems. The convergence and parameter sensitivities of both LS-TDC and KLS-TDC are tested on on-policy learning, off-policy learning, and control learning problems. Experimental results, compared with a series of corresponding RL algorithms, demonstrate that both LS-TDC and KLS-TDC have better approximation and convergence performance, higher sample efficiency, a smaller parameter-tuning burden, and less sensitivity to parameters.

4.
Evol Comput ; 15(3): 369-98, 2007.
Article in English | MEDLINE | ID: mdl-17705783

ABSTRACT

This paper proposes a graph-based evolutionary algorithm called Genetic Network Programming (GNP). Our goal is to develop GNP, which can deal with dynamic environments efficiently and effectively, based on the distinguished expression ability of the graph (network) structure. The characteristics of GNP are as follows. 1) GNP programs are composed of a number of nodes which execute simple judgment/processing, and these nodes are connected to each other by directed links. 2) The graph structure enables GNP to re-use nodes, so the structure can be very compact. 3) The node transition of GNP is executed according to its node connections without any terminal nodes, so the past history of the node transition affects the current node to be used, and this characteristic works as an implicit memory function. These structural characteristics are useful for dealing with dynamic environments. Furthermore, we propose an extended algorithm, "GNP with Reinforcement Learning (GNPRL)", which combines evolution and reinforcement learning in order to create effective graph structures and obtain better results in dynamic environments. In this paper, we applied GNP to the problem of determining agents' behavior to evaluate its effectiveness. Tileworld was used as the simulation environment. The results show some advantages for GNP over conventional methods.
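The three structural characteristics listed above can be sketched as a small directed graph that is walked for a fixed number of steps. This is only an illustrative toy, not the authors' GNP implementation: the node layout, the `run_gnp` function, and the `tile_ahead` observation key are all hypothetical.

```python
def run_gnp(nodes, start, observation, max_steps):
    """Walk a directed graph of judgment and processing nodes.

    There are no terminal nodes: execution continues from wherever the
    previous transition ended, so the current node implicitly encodes
    past history.
    """
    actions = []
    current = start
    for _ in range(max_steps):
        node = nodes[current]
        if node["kind"] == "judgment":
            # A judgment node picks one outgoing link based on the observation.
            branch = node["test"](observation)
            current = node["next"][branch]
        else:
            # A processing node emits an action and follows its single link.
            actions.append(node["action"])
            current = node["next"][0]
    return actions, current


# Hypothetical three-node program: one judgment node reused by both branches.
nodes = {
    0: {"kind": "judgment",
        "test": lambda obs: 0 if obs["tile_ahead"] else 1,
        "next": [1, 2]},
    1: {"kind": "processing", "action": "push", "next": [0]},
    2: {"kind": "processing", "action": "turn", "next": [0]},
}
actions, last_node = run_gnp(nodes, 0, {"tile_ahead": True}, 4)
```

Both processing nodes link back to the same judgment node, which shows how node re-use keeps the graph compact.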


Subject(s)
Biological Evolution , Learning , Models, Genetic , Algorithms , Artificial Intelligence , Computer Simulation , Crossing Over, Genetic , Memory , Models, Statistical , Models, Theoretical , Mutation , Neural Networks, Computer , Pattern Recognition, Automated , Reinforcement, Psychology , Time Factors
5.
IEEE Trans Syst Man Cybern B Cybern ; 36(1): 179-93, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16468576

ABSTRACT

Multiagent Systems with Symbiotic Learning and Evolution (Masbiole), a new methodology for Multiagent Systems (MAS) based on symbiosis in the ecosystem, has been proposed and studied. Masbiole employs a method of symbiotic learning and evolution in which agents learn or evolve according to their symbiotic relations with others, i.e., each agent considers the benefits and losses of both itself and its opponent. As a result, Masbiole can escape from Nash equilibria and obtain better performance than conventional MAS, where agents consider only their own benefits. This paper focuses on the evolutionary model of Masbiole and examines its characteristics, with an emphasis on the behaviors of agents obtained by symbiotic evolution. In the simulations, two ideas suitable for the effective analysis of such behaviors are introduced: "Match Type Tile-world (MTT)" and "Genetic Network Programming (GNP)". MTT is a virtual model in which tile-world is modified so that agents can behave in accordance with their symbiotic relations. GNP is a newly developed evolutionary computation method that has a directed-graph gene structure and makes it easy to analyze the decision-making mechanism of agents. Simulation results show that, through evolution, Masbiole can obtain various kinds of behaviors and better performance than conventional MAS in MTT.


Subject(s)
Algorithms , Artificial Intelligence , Biomimetics/methods , Decision Support Techniques , Models, Theoretical , Pattern Recognition, Automated/methods , Symbiosis , Biological Evolution , Computer Simulation
6.
Neural Netw ; 19(4): 487-99, 2006 May.
Article in English | MEDLINE | ID: mdl-16423502

ABSTRACT

A method for propagating and controlling stochastic signals through Universal Learning Networks (ULNs), together with its applications, is proposed. ULNs have already been developed as a superset of neural networks and have been applied as a universal framework for modeling and control of non-linear large-scale complex systems. However, ULNs cannot deal with stochastic variables. Deterministic signals can be propagated through a ULN, but the ULN does not provide any stochastic characteristics of the signals propagating through it. The proposed method, named Probabilistic Universal Learning Networks (PrULNs), can process stochastic variables and can train network parameters so that the signals behave with pre-specified stochastic properties. As example applications of the proposed method, control and identification of non-linear dynamic systems with noise are studied, and it is shown that the method is useful for the control and identification of non-linear stochastic systems contaminated with noise.


Subject(s)
Electronic Data Processing , Learning , Neural Networks, Computer , Stochastic Processes , Artificial Intelligence , Computer Simulation , Humans , Learning/physiology , Models, Statistical , Nonlinear Dynamics
7.
Neural Netw ; 16(10): 1461-81, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14622877

ABSTRACT

In this paper, a functions localized network with branch gates (FLN-bg) is studied, which consists of a basic network and a branch gate network. The branch gate network is used to determine which intermediate nodes of the basic network should be connected to the output node with a gate coefficient ranging from 0 to 1. This determination will adjust the outputs of the intermediate nodes of the basic network depending on the values of the inputs of the network in order to realize a functions localized network. FLN-bg is applied to function approximation problems and a two-spiral problem. The simulation results show that FLN-bg exhibits better performance than conventional neural networks with comparable complexity.
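The gating idea described above, in which each intermediate node's contribution to the output is scaled by an input-dependent coefficient between 0 and 1, can be sketched as follows. This is a minimal illustration under assumed choices (tanh intermediate nodes, sigmoid gates, a single linear layer for each network); the abstract does not state the actual architecture, and all names and weights here are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function; maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_output(x, hidden_weights, gate_weights, out_weights):
    """Output of a toy network whose hidden nodes are scaled by branch gates.

    For hidden node j:
        h_j = tanh(w_h[j] . x)      basic network activation (assumed form)
        g_j = sigmoid(w_g[j] . x)   gate coefficient in (0, 1), depends on x
    and the output is sum_j w_o[j] * g_j * h_j.
    """
    y = 0.0
    for w_h, w_g, w_o in zip(hidden_weights, gate_weights, out_weights):
        h = math.tanh(sum(a * b for a, b in zip(w_h, x)))
        g = sigmoid(sum(a * b for a, b in zip(w_g, x)))
        y += w_o * g * h
    return y


# Hypothetical one-node example: a zero gate weight gives a gate of 0.5.
y = gated_output([1.0], [[1.0]], [[0.0]], [2.0])
```

Because the gate values depend on the input, different regions of the input space can activate different intermediate nodes, which is the sense in which the functions become localized.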


Subject(s)
Artificial Intelligence , Computer Simulation , Feedback , Neural Networks, Computer , Animals , Fuzzy Logic , Humans , Learning , Teaching