1.
IEEE Trans Neural Netw Learn Syst; 27(4): 771-82, 2016 Apr.
Article in English | MEDLINE | ID: mdl-25955853

ABSTRACT

A least-squares temporal difference with gradient correction (LS-TDC) algorithm and its kernel-based version (KLS-TDC) are proposed as policy evaluation algorithms for reinforcement learning (RL). LS-TDC is derived from the TDC algorithm; because TDC is obtained by minimizing the mean-square projected Bellman error, LS-TDC inherits its favorable convergence properties. The least-squares technique removes the step-size tuning required by the original TDC and improves robustness. In KLS-TDC, the kernel method allows feature vectors to be selected automatically, and approximate linear dependence analysis is performed to sparsify the kernel dictionary. In addition, a policy iteration strategy built on KLS-TDC is constructed to solve control learning problems. The convergence and parameter sensitivity of both LS-TDC and KLS-TDC are tested on on-policy learning, off-policy learning, and control learning problems. Experimental results, compared against a series of related RL algorithms, demonstrate that both LS-TDC and KLS-TDC achieve better approximation and convergence performance, higher sample efficiency, a smaller parameter-tuning burden, and lower sensitivity to parameters.
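
The abstract gives no update equations, so the following Python sketch is purely illustrative: it shows the standard stochastic TDC update (Sutton et al., 2009) that LS-TDC builds on, together with a generic regularized least-squares fixed-point solve showing how a batch solve can replace per-step learning rates. The function names, the transition format, and the regularizer reg are assumptions, not the paper's actual method.

import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    # One stochastic TDC step: theta holds the value weights,
    # w the gradient-correction weights. (Hypothetical sketch.)
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + beta * (delta - w @ phi) * phi
    return theta, w

def ls_fixed_point(transitions, gamma, reg=1e-6):
    # Batch least-squares solve of the projected Bellman fixed point,
    # illustrating how a least-squares variant avoids step-size tuning.
    # transitions: list of (phi, reward, phi_next) feature tuples.
    k = transitions[0][0].shape[0]
    A = reg * np.eye(k)           # ridge term keeps A invertible
    b = np.zeros(k)
    for phi, reward, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += reward * phi
    return np.linalg.solve(A, b)  # value weights theta

Note that, unlike the incremental update, the batch solve has only the regularizer to set, which is consistent with the abstract's claim of a smaller parameter-tuning burden.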
