1.
Article in English | MEDLINE | ID: mdl-35731766

ABSTRACT

In current matrix factorization recommendation approaches, the item and user latent factor vectors have the same dimension, so a linear dot product is used as the interaction function between the user and the item to predict ratings. However, the relationship between real users and items is not entirely linear, and existing matrix factorization recommendation models face the challenge of data sparsity. To this end, we propose a kernelized deep neural network recommendation model in this article. First, we encode the explicit user-item rating matrix as column vectors and project them to higher dimensions to facilitate the simulation of nonlinear user-item interactions, thereby strengthening the connection between users and items. Second, an association rule algorithm is used to mine the implicit relations between users and items, rather than performing simple feature extraction on users or items, to improve recommendation performance on sparse datasets. Third, through autoencoder and kernelized network processing, the implicit data are connected with the explicit data by a multilayer perceptron network for iterative training, instead of a simple linear weighted summation. Finally, the predicted rating is output through the hidden layer. Extensive experiments were conducted on four public datasets in comparison with several well-known existing methods. The results indicate that the proposed method achieves improved performance in both data sparsity and prediction accuracy.
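To make the nonlinear interaction idea concrete, here is a minimal PyTorch sketch of replacing the dot product with an MLP over higher-dimensional projections of the user and item vectors. All names, layer sizes, and the architecture are illustrative assumptions, not the authors' implementation (which additionally uses association rules, an autoencoder, and kernelized layers).

```python
# Hypothetical sketch: embed users and items, concatenate, and let an MLP
# model the nonlinear user-item interaction instead of a linear dot product.
import torch
import torch.nn as nn

class NonlinearInteractionModel(nn.Module):
    def __init__(self, num_users, num_items, dim=32, hidden=128):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        # Project the concatenated latent vectors to a higher dimension,
        # then model the nonlinear interaction with an MLP.
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 1),
        )

    def forward(self, user_ids, item_ids):
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)  # predicted rating

model = NonlinearInteractionModel(num_users=1000, num_items=500)
rating = model(torch.tensor([3]), torch.tensor([42]))
```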

2.
IEEE Trans Neural Netw Learn Syst ; 33(4): 1584-1593, 2022 04.
Article in English | MEDLINE | ID: mdl-33351767

ABSTRACT

In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents interact and cooperate in StarCraft combat. A local actor-critic structure is established for each kind of agent with the partially observable information it receives from the environment. A global actor-critic structure is then built to provide the local designs with an overall view of the combat based on limited centralized information, such as health values. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during the learning process. Furthermore, reward functions are designed for both the local and global structures based on the agents' attributes to further improve learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.


Subject(s)
Learning , Neural Networks, Computer , Algorithms , Policy , Reward
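A minimal sketch of the two-level structure described in entry 2: each agent has a local actor over its partial observation, while a global critic sees only limited centralized information such as health values. Dimensions, architectures, and the aggregation scheme are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LocalActor(nn.Module):
    """Per-agent policy acting on partially observable local information."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)  # deterministic action for one agent

class GlobalCritic(nn.Module):
    """Scores the joint behavior from limited centralized info only."""
    def __init__(self, summary_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(summary_dim + joint_act_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, summary, joint_action):
        return self.net(torch.cat([summary, joint_action], dim=-1))

actor = LocalActor(obs_dim=10, act_dim=2)            # one per agent type
critic = GlobalCritic(summary_dim=4, joint_act_dim=6)
```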
3.
IEEE Trans Cybern ; 51(5): 2419-2432, 2021 May.
Article in English | MEDLINE | ID: mdl-31329149

ABSTRACT

In this paper, we study the constrained optimization problem of a class of uncertain nonlinear interconnected systems. First, we prove that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems of constrained auxiliary subsystems. Then, under the framework of approximate dynamic programming, we present a simultaneous policy iteration (SPI) algorithm to solve the Hamilton-Jacobi-Bellman equations corresponding to the constrained auxiliary subsystems. By building an equivalence relationship, we demonstrate the convergence of the SPI algorithm. Meanwhile, we implement the SPI algorithm via an actor-critic structure, where actor networks are used to approximate optimal control policies and critic networks are applied to estimate optimal value functions. By using the least squares method and the Monte Carlo integration technique together, we are able to determine the weight vectors of actor and critic networks. Finally, we validate the developed control method through the simulation of a nonlinear interconnected plant.
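As a hedged illustration of the least-squares and Monte Carlo integration step mentioned in entry 3, the sketch below fits critic weights w for a value approximation V(x) ≈ wᵀφ(x) by least squares over randomly sampled states. The basis functions and value targets are placeholders, not the paper's HJB residuals.

```python
import numpy as np

def phi(x):
    # Illustrative polynomial basis for a 2-D state.
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2])

def lstsq_critic_update(states, targets):
    Phi = np.stack([phi(x) for x in states])   # N x k regressor matrix
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(200, 2))     # Monte Carlo integration samples
targets = np.array([x @ x for x in states])    # stand-in value targets
w = lstsq_critic_update(states, targets)
```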

4.
IEEE Trans Cybern ; 49(11): 3911-3922, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30059327

ABSTRACT

The adaptive dynamic programming controller usually needs a long training period because its data usage efficiency is relatively low: samples are discarded once used. Prioritized experience replay (ER) promotes important experiences and makes learning the control process more efficient. This paper proposes integrating the efficient learning capability of prioritized ER into heuristic dynamic programming (HDP). First, a one-time-step-backward state-action pair is used to design the ER tuple, which avoids the need for a model network. Second, a systematic approach is proposed to integrate ER into both the critic and action networks of the HDP controller design. The proposed approach is tested on two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For a fair comparison, we set the same initial weight parameters and initial starting states for both the traditional HDP and the proposed approach under the same simulation environment. The proposed approach reduces the average number of trials required to succeed by 60.56% for the cart-pole task and 56.89% for the triple-link balancing task, in comparison with the traditional HDP approach. Results of ER-based HDP are also included for comparison. Moreover, a theoretical convergence analysis is presented to guarantee the stability of the proposed control design.
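The sketch below illustrates the replay design of entry 4: tuples pair a one-time-step-backward state-action with the current state and reward, and sampling is weighted by priority. The priority rule (e.g., proportional to the magnitude of the TD error) is a common prioritized-ER convention assumed here, not taken from the paper.

```python
import random

class PrioritizedBuffer:
    def __init__(self, capacity=10000):
        self.data, self.prios, self.capacity = [], [], capacity

    def add(self, prev_state, prev_action, state, reward, priority=1.0):
        if len(self.data) >= self.capacity:    # drop the oldest tuple
            self.data.pop(0)
            self.prios.pop(0)
        # One-time-step-backward state-action pair plus the current
        # observation, so no model network is needed to form the ER tuple.
        self.data.append((prev_state, prev_action, state, reward))
        self.prios.append(priority)

    def sample(self, k):
        # Sampling probability proportional to priority.
        return random.choices(self.data, weights=self.prios, k=k)
```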

5.
IEEE Trans Cybern ; 48(5): 1633-1646, 2018 May.
Article in English | MEDLINE | ID: mdl-28727566

ABSTRACT

In this paper, we present a new model-free globalized dual heuristic dynamic programming (GDHP) approach for discrete-time nonlinear zero-sum game problems. First, an online learning algorithm is proposed based on the GDHP method to solve the Hamilton-Jacobi-Isaacs equation associated with the optimal regulation control problem. By shifting the definition of the performance index backward one step, the proposed method relaxes the requirement for the system dynamics or an identifier. Then, three neural networks are established to approximate the optimal saddle-point feedback control law, the disturbance law, and the performance index, respectively. Explicit updating rules for these three neural networks are provided based on the data generated during online learning along the system trajectories. The stability analysis, in terms of the neural network approximation errors, is discussed based on the Lyapunov approach. Finally, two simulation examples are provided to show the effectiveness of the proposed method.
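To make the three-network structure of entry 5 concrete, here is a minimal sketch with one network each for the control law, the disturbance law, and the performance index, together with a standard zero-sum stage utility. The architectures and the utility form are assumptions; the paper's exact choices may differ.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 32), nn.Tanh(),
                         nn.Linear(32, out_dim))

state_dim = 2
control_net = mlp(state_dim, 1)      # saddle-point feedback control law u(x)
disturbance_net = mlp(state_dim, 1)  # worst-case disturbance law w(x)
value_net = mlp(state_dim, 1)        # performance index V(x)

def utility(x, u, w, gamma=5.0):
    # Zero-sum stage utility: the control minimizes it while the
    # disturbance maximizes it (a standard form, assumed here).
    return x.dot(x) + u.dot(u) - gamma**2 * w.dot(w)
```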

6.
IEEE Trans Neural Netw Learn Syst ; 28(8): 1941-1952, 2017 08.
Article in English | MEDLINE | ID: mdl-28113603

ABSTRACT

In this paper, an event-triggered near-optimal control structure is developed for nonlinear continuous-time systems with control constraints. Because of the saturating actuators, a nonquadratic cost function is introduced and the Hamilton-Jacobi-Bellman (HJB) equation for constrained nonlinear continuous-time systems is formulated. To solve the HJB equation, an actor-critic framework is presented: the critic network approximates the cost function, and the action network estimates the optimal control law. In addition, in the proposed method, the control signal is transmitted in an aperiodic manner to reduce the computational and transmission costs. Both networks are updated only at the trigger instants determined by the event-triggered condition. A detailed Lyapunov analysis is provided to guarantee that the closed-loop event-triggered system is ultimately bounded. Three case studies demonstrate the effectiveness of the proposed method.
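A minimal sketch of the aperiodic update idea in entry 6: the control is recomputed only when the gap between the current state and the last-transmitted state violates a trigger condition. The fixed-threshold rule and toy dynamics below are generic assumptions, not the paper's specific event-triggered condition.

```python
import numpy as np

def run_event_triggered(x0, controller, dynamics, steps=100, thresh=0.05):
    x, x_last, u = x0, x0, controller(x0)
    updates = 0
    for _ in range(steps):
        if np.linalg.norm(x - x_last) > thresh:  # trigger condition violated
            u, x_last = controller(x), x         # update only at trigger instants
            updates += 1
        x = dynamics(x, u)
    return x, updates

x0 = np.array([1.0, 0.0])
controller = lambda x: -0.5 * x                  # placeholder control law
dynamics = lambda x, u: x + 0.05 * (np.array([x[1], -x[0]]) + u)
x_final, n_updates = run_event_triggered(x0, controller, dynamics)
```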

7.
IEEE Trans Cybern ; 47(10): 3318-3330, 2017 Oct.
Article in English | MEDLINE | ID: mdl-27662693

ABSTRACT

A goal representation globalized dual heuristic dynamic programming (Gr-GDHP) method is proposed in this paper. A goal neural network is integrated into the traditional GDHP method, providing an internal reinforcement signal and its derivatives to aid the control and learning process. With the proposed architecture, it is shown that the obtained internal reinforcement signal and its derivatives can adjust themselves online over time, rather than being a fixed or predefined function as in the literature. Furthermore, the obtained derivatives contribute directly to the objective function of the critic network, whose learning process is thus simplified. Numerical simulation studies are conducted to show the performance of the proposed Gr-GDHP method and to compare the results with other existing adaptive dynamic programming designs. We also investigate this method on a ball-and-beam balancing system. Statistical simulation results are presented for both the Gr-GDHP and GDHP methods to demonstrate the improved learning and control performance.
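The sketch below illustrates the goal-network role described in entry 7: a small network produces an internal reinforcement signal from the state and the external reward, and its derivative with respect to the state (obtained here via autograd) can feed the critic's objective. Shapes and wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Goal network: maps (2-D state, external reward) to an internal signal.
goal_net = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))

x = torch.tensor([0.1, -0.2], requires_grad=True)   # 2-D state
external_reward = torch.tensor([1.0])
s = goal_net(torch.cat([x, external_reward]))       # internal reinforcement
ds_dx, = torch.autograd.grad(s.sum(), x)            # derivative for the critic
```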

8.
IEEE Trans Neural Netw Learn Syst ; 28(7): 1594-1605, 2017 07.
Article in English | MEDLINE | ID: mdl-27071197

ABSTRACT

This paper presents the design of a novel adaptive event-triggered control method based on the heuristic dynamic programming (HDP) technique for nonlinear discrete-time systems with unknown system dynamics. In the proposed method, the control law is only updated when the event-triggered condition is violated. Compared with the periodic updates in the traditional adaptive dynamic programming (ADP) control, the proposed method can reduce the computation and transmission cost. An actor-critic framework is used to learn the optimal event-triggered control law and the value function. Furthermore, a model network is designed to estimate the system state vector. The main contribution of this paper is to design a new trigger threshold for discrete-time systems. A detailed Lyapunov stability analysis shows that our proposed event-triggered controller can asymptotically stabilize the discrete-time systems. Finally, we test our method on two different discrete-time systems, and the simulation results are included.
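As a hedged sketch of the model-network role in entry 8: with unknown dynamics, a neural network predicts the next state from the current state and action, so the trigger condition and critic can be evaluated without the true model. The architecture and dimensions are assumptions.

```python
import torch
import torch.nn as nn

state_dim, act_dim = 4, 1
model_net = nn.Sequential(nn.Linear(state_dim + act_dim, 64), nn.Tanh(),
                          nn.Linear(64, state_dim))

def predict_next_state(x, u):
    # Estimate x_{k+1} from (x_k, u_k); would be trained on observed
    # transitions with, e.g., nn.MSELoss().
    return model_net(torch.cat([x, u], dim=-1))

x_hat = predict_next_state(torch.zeros(state_dim), torch.zeros(act_dim))
```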

9.
IEEE Trans Neural Netw Learn Syst ; 27(12): 2513-2525, 2016 12.
Article in English | MEDLINE | ID: mdl-26571538

ABSTRACT

Goal representation heuristic dynamic programming (GrHDP) control design has been developed in recent years. The control performance of this design has been demonstrated in several case studies and shown to be applicable to industrial-scale complex control problems. In this paper, we develop a theoretical analysis of the GrHDP design under certain conditions. It is shown that the internal reinforcement signal is bounded and that the performance index converges to its optimal value monotonically. The existence of the admissible control is also proved. Although the GrHDP control method has been investigated in many areas, to the best of our knowledge, this is the first study to present the theoretical foundation of the internal reinforcement signal and of how such a signal can provide effective information to improve control performance. Numerous simulation studies are used to validate the theoretical analysis and demonstrate the effectiveness of the GrHDP design.
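In assumed notation (s_t for the internal reinforcement signal, J_i for the performance index at iteration i, J* for its optimal value), the two properties claimed in entry 9 can be restated as follows; this is a hedged paraphrase, not the paper's exact formulation.

```latex
% Boundedness of the internal reinforcement signal:
\exists\, \bar{s} > 0 \ \text{such that} \ |s_t| \le \bar{s} \quad \forall t,
% Monotone convergence of the performance index to its optimal value:
\qquad |J_{i+1}(x) - J^{*}(x)| \le |J_{i}(x) - J^{*}(x)| \quad \forall x,
\qquad \lim_{i \to \infty} J_{i}(x) = J^{*}(x).
```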

10.
IEEE Trans Neural Netw Learn Syst ; 26(8): 1834-9, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25955997

ABSTRACT

Model-based dual heuristic dynamic programming (MB-DHP) is a popular approach for approximating optimal solutions in control problems. Yet it usually requires offline training for the model network, thus incurring extra computational cost. In this brief, we propose a model-free DHP (MF-DHP) design based on the finite-difference technique. In particular, we adopt a multilayer perceptron with one hidden layer for both the action and critic network designs, and use delayed objective functions to train both networks online over time. We test both the MF-DHP and MB-DHP approaches with a discrete-time example and a continuous-time example under the same parameter settings. Our simulation results demonstrate that the MF-DHP approach can achieve control performance competitive with that of the traditional MB-DHP approach while requiring fewer computational resources.
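The sketch below shows the finite-difference idea that lets entry 10 drop the model network: the derivative of the next state with respect to the action, which DHP needs for its updates, is approximated numerically from perturbed evaluations. The central-difference scheme, step size, and toy dynamics are illustrative assumptions.

```python
import numpy as np

def fd_state_action_jacobian(dynamics, x, u, eps=1e-4):
    """Approximate d x_next / d u by central differences."""
    u = np.atleast_1d(u).astype(float)
    cols = []
    for j in range(u.size):
        du = np.zeros_like(u)
        du[j] = eps
        cols.append((dynamics(x, u + du) - dynamics(x, u - du)) / (2 * eps))
    return np.stack(cols, axis=-1)

dyn = lambda x, u: 0.9 * x + 0.1 * u        # toy linear dynamics
J = fd_state_action_jacobian(dyn, np.array([1.0]), np.array([0.5]))
# J is approximately [[0.1]], the true sensitivity of x_next to u here.
```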

11.
IEEE Trans Neural Netw Learn Syst ; 25(12): 2141-55, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25420238

ABSTRACT

In this paper, we develop and analyze an optimal control method for a class of discrete-time nonlinear Markov jump systems (MJSs) with unknown system dynamics. Specifically, an identifier is established for the unknown systems to approximate the system states, and an optimal control approach for nonlinear MJSs is developed to solve the Hamilton-Jacobi-Bellman equation based on the adaptive dynamic programming technique. We also develop a detailed stability analysis of the control approach, including the convergence of the performance index function for nonlinear MJSs and the existence of the corresponding admissible control. Neural network techniques are used to approximate the proposed performance index function and the control law. To demonstrate the effectiveness of our approach, three simulation studies (a linear case, a nonlinear case, and a single-link robot arm case) are used to validate the performance of the proposed optimal control method.
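A minimal sketch of the discrete-time Markov jump system setting of entry 11: the active mode switches according to a transition probability matrix, and each mode has its own dynamics. The two modes, transition matrix, and placeholder control law below are toy assumptions, not the paper's plant.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1],                    # mode transition probabilities
              [0.2, 0.8]])
modes = [lambda x, u: 0.8 * x + u,           # dynamics in mode 0
         lambda x, u: 1.1 * x - 0.5 * u]     # dynamics in mode 1

x, m = 1.0, 0
for t in range(10):
    u = -0.3 * x                             # placeholder control law
    x = modes[m](x, u)
    m = rng.choice(2, p=P[m])                # Markov jump to the next mode
```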
