Results 1 - 10 of 10
1.
Sensors (Basel) ; 24(13)2024 Jul 03.
Article in English | MEDLINE | ID: mdl-39001102

ABSTRACT

Visible light communication (VLC) is a promising complement to radio frequency (RF) communication for meeting the high quality-of-service (QoS) requirements of intelligent vehicular networks by reusing LED street lights. In this paper, a hybrid handover scheme for vehicular VLC/RF communication networks is proposed to balance QoS against handover costs by considering vertical and horizontal handovers jointly, based on the vehicle's mobility state. The hybrid handover problem is formulated as a Markov decision process (MDP) with a cost function that balances handover consumption, delay, and reliability, and a value iteration algorithm is applied to solve for the optimal handover policy. Simulation results demonstrate the performance of the proposed hybrid handover scheme in comparison with benchmark schemes.
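
The abstract does not spell out the MDP's states or costs, so the following minimal Python sketch only illustrates the value-iteration machinery such a handover scheme rests on: a two-state VLC/RF toy model whose transition probabilities and costs are entirely hypothetical.

    import numpy as np

    # Toy hybrid-handover MDP; every number is hypothetical.
    # States: 0 = served by VLC, 1 = served by RF.
    # Actions: 0 = stay on the current link, 1 = vertical handover.
    P = np.array([                  # P[a, s, s']: transition probabilities
        [[0.8, 0.2], [0.3, 0.7]],   # stay: the link may still drop to the other tier
        [[0.1, 0.9], [0.9, 0.1]],   # handover: the switch succeeds with prob. 0.9
    ])
    C = np.array([                  # C[a, s]: immediate cost (delay/reliability terms)
        [0.2, 1.0],                 # staying on RF (s = 1) is assumed costlier
        [0.7, 0.6],                 # a handover adds a fixed switching cost
    ])
    gamma = 0.9
    V = np.zeros(2)
    for _ in range(500):            # value iteration to the fixed point
        Q = C + gamma * P @ V       # Q[a, s] = c(s, a) + gamma * E[V(s')]
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < 1e-9:
            break
        V = V_new
    policy = Q.argmin(axis=0)       # optimal action per state
    print("V* =", V, "policy =", policy)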

2.
Neural Netw ; 176: 106364, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38754288

ABSTRACT

In practical industrial processes, the receding-horizon optimization of nonlinear model predictive control (NMPC) remains a difficult problem. Based on adaptive dynamic programming, the accelerated value iteration predictive control (AVI-PC) algorithm is developed in this paper. Integrating iterative learning with the receding-horizon mechanism of NMPC, a novel receding optimization pattern is exploited to obtain the optimal control law in each prediction horizon. The basic architecture and the specific form of the AVI-PC algorithm are presented, including the relationship among the iterative learning, prediction, and control processes. On this basis, convergence and admissibility conditions are established, and the relevant properties are analyzed for acceleration factors satisfying these conditions. Furthermore, the accelerated value iteration function is approximated by a single critic network constructed with the multiple linear regression method. Finally, extensive simulation experiments from various perspectives verify the effectiveness and advantages of the AVI-PC algorithm.
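
As a stripped-down illustration of an accelerated value-iteration update of the form V_{k+1} = (1 - eta) V_k + eta T V_k, the sketch below runs over-relaxed value iteration on a discretized scalar plant. The plant, costs, and the conservative factor eta are invented for the example; the paper derives the conditions its acceleration factor must satisfy, which are not reproduced here.

    import numpy as np

    # Discretized scalar plant x+ = 0.8*x + 0.5*sin(u) with stage cost x^2 + u^2;
    # all numbers are invented for the illustration.
    xs = np.linspace(-1.0, 1.0, 101)        # state grid
    us = np.linspace(-1.0, 1.0, 21)         # control grid
    gamma, eta = 0.95, 1.02                 # discount and acceleration factor
    xn = 0.8 * xs[:, None] + 0.5 * np.sin(us)[None, :]   # next states (nx, nu)
    idx = np.abs(xn[..., None] - xs).argmin(-1)          # project back onto grid
    U = xs[:, None] ** 2 + us[None, :] ** 2              # stage cost
    V = np.zeros_like(xs)
    for k in range(2000):
        TV = (U + gamma * V[idx]).min(axis=1)   # standard Bellman backup
        V_new = (1.0 - eta) * V + eta * TV      # over-relaxed (accelerated) step
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new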


Subject(s)
Algorithms , Neural Networks, Computer , Nonlinear Dynamics , Computer Simulation , Humans , Machine Learning
3.
Neural Netw ; 167: 751-762, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37729789

ABSTRACT

In this paper, a novel parallel learning framework is developed to solve zero-sum games for discrete-time nonlinear systems. Briefly, the purpose of this study is to determine a tentative function from prior knowledge of the value iteration (VI) algorithm; this tentative function then guides the learning process of the parallel controllers. In this way, the optimal cost function can be bracketed within a small neighborhood via two typical exploration policies. Based on the parallel learning framework, a novel dichotomy VI algorithm is established to accelerate learning. It is shown that the parallel controllers converge to the optimal policy from contrary initial policies. Finally, two typical systems are used to demonstrate the learning performance of the constructed dichotomy VI algorithm.
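
A rough way to picture the two-sided idea: run ordinary value iteration from a lower and an upper bound on the optimal cost and watch the bracket shrink. The toy minimization below (not the paper's zero-sum game, and with a randomly generated model) is only meant to visualize that behavior.

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA, gamma = 6, 3, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a, s, :]: transition rows
    C = rng.uniform(0.0, 1.0, size=(nA, nS))        # stage costs bounded in [0, 1]
    def backup(V):
        return (C + gamma * P @ V).min(axis=0)
    V_lo = np.zeros(nS)                     # below V*, since all costs are >= 0
    V_hi = np.full(nS, 1.0 / (1 - gamma))   # above V*, since all costs are <= 1
    for _ in range(60):
        V_lo, V_hi = backup(V_lo), backup(V_hi)
    print("bracket width:", np.max(V_hi - V_lo))    # shrinks geometrically
    V_mid = 0.5 * (V_lo + V_hi)             # an estimate pinned between the two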


Subject(s)
Algorithms , Nonlinear Dynamics , Computer Simulation , Learning
4.
Neural Netw ; 166: 437-445, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37566954

ABSTRACT

The classical theory of reinforcement learning focuses on the tabular setting, where states and actions are finite, or on linear representations of the value function in a finite-dimensional approximation. Establishing theory for general continuous state and action spaces requires a careful treatment of the complexity of appropriately chosen function spaces and of the iterative update of the value function under stochastic gradient descent (SGD). For the classical prediction problem in reinforcement learning, based on i.i.d. streaming data in the framework of reproducing kernel Hilbert spaces, we establish polynomial sample complexity that accounts for the smoothness of the value function. In particular, we prove that gradient descent with appropriately chosen step sizes computes the value function efficiently, with a convergence rate that can approach 1/N, the best possible rate for parametric SGD. The advantages of gradient descent include its computational convenience and its natural handling of streaming data.
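
A schematic of the kind of kernel-space SGD the paper analyzes, here as semi-gradient TD(0) for the prediction problem with the value estimate kept as a Gaussian-kernel expansion. The Markov reward process, kernel bandwidth, and polynomially decaying step sizes are all assumptions made for the illustration.

    import numpy as np

    def k(x, y, bw=0.3):                      # Gaussian kernel; bandwidth assumed
        return np.exp(-(x - y) ** 2 / (2 * bw ** 2))

    gamma = 0.9
    centers, coefs = [], []                   # value estimate as a kernel expansion
    def V(x):
        return sum(c * k(x, z) for c, z in zip(coefs, centers))

    rng = np.random.default_rng(1)
    for t in range(1, 501):                   # i.i.d. stream of transitions
        x = rng.uniform(-1, 1)                # state sampled i.i.d.
        x_next = float(np.clip(0.7 * x + 0.1 * rng.normal(), -1, 1))
        r = x ** 2                            # hypothetical reward
        td = r + gamma * V(x_next) - V(x)     # temporal-difference residual
        centers.append(x)                     # SGD step adds one kernel term:
        coefs.append(td / t ** 0.6)           # f <- f + eta_t * td * k(x, .)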


Subject(s)
Algorithms , Reinforcement, Psychology , Learning
5.
Biomimetics (Basel) ; 8(4)2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37622958

ABSTRACT

Lower extremity exoskeletons are increasingly used across domains such as the military, medical treatment, and rehabilitation. This paper introduces a novel lower extremity exoskeleton designed for heavy-object-carrying tasks. The exoskeleton has 12 degrees of freedom (DOF), four of which are actively controlled through hydraulic cylinders. To control this complex system, the authors propose an adaptive dynamic programming (ADP) algorithm. Several components are established to implement the control scheme: a state equation for the exoskeleton system suited to the ADP algorithm, a performance index function based on the tracking error, and the corresponding game algebraic Riccati equation. Using the value iteration ADP scheme, the exoskeleton achieves effective tracking control. This study contributes to the advancement of lower extremity exoskeleton technology and offers insights into applying ADP algorithms for precise and efficient control in demanding load-carrying tasks.
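
The exoskeleton model itself is not given in the abstract, so the sketch below solves the quadratic-cost special case for a stand-in double-integrator joint, using value iteration on the standard single-player Riccati recursion rather than the paper's game algebraic Riccati equation. All matrices are hypothetical.

    import numpy as np

    dt = 0.01
    A = np.array([[1.0, dt], [0.0, 1.0]])     # error dynamics of one joint
    B = np.array([[0.0], [dt]])               # hydraulic-cylinder input channel
    Q, R = np.diag([100.0, 1.0]), np.array([[0.1]])
    P = np.zeros((2, 2))                      # V_0(e) = e' P e with P = 0
    for _ in range(20000):                    # value iteration on the Riccati map
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_next - P)) < 1e-9:
            break
        P = P_next
    print("feedback gain K =", K)             # tracking control u = -K e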

6.
Wirel Pers Commun ; : 1-23, 2023 May 04.
Article in English | MEDLINE | ID: mdl-37360139

ABSTRACT

This work proposes a stochastic model of the coordinator unit of each wireless body area network (WBAN) in a multi-WBAN scenario. In a smart home environment, multiple patients wearing WBAN configurations for monitoring body vitals can come into each other's vicinity. When multiple WBANs coexist, the individual WBAN coordinators require adaptive transmission strategies to balance maximizing the likelihood of data transmission against minimizing packet loss due to inter-BAN interference. Accordingly, the proposed work is divided into two phases. In the offline phase, each WBAN coordinator is modeled stochastically and its transmission strategy is formulated as a Markov decision process (MDP), with the channel conditions and buffer status that influence the transmission decision taken as the state parameters. The formulation is solved offline, prior to network deployment, to determine the optimal transmission strategies for various input conditions. These inter-WBAN transmission policies are then incorporated into the coordinator nodes in the post-deployment phase. The work is simulated using Castalia, and the results demonstrate the robustness of the proposed scheme under both favorable and unfavorable operating conditions.
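
The offline phase in miniature: states pair a discretized channel quality with a buffer level, actions are defer or transmit, and value iteration produces the policy lookup table that would be shipped to the coordinator. Every probability and reward below is hypothetical.

    import numpy as np

    channels, buffers = 3, 4                  # e.g. bad/fair/good x buffer levels
    nS = channels * buffers
    def s_id(c, b):                           # flatten (channel, buffer) to a state id
        return c * buffers + b
    P = np.zeros((2, nS, nS))                 # actions: 0 = defer, 1 = transmit
    R = np.zeros((2, nS))
    for c in range(channels):
        for b in range(buffers):
            s = s_id(c, b)
            for c2 in range(channels):        # channel evolves on its own
                pc = 0.6 if c2 == c else 0.2
                P[0, s, s_id(c2, min(b + 1, buffers - 1))] += pc   # defer: buffer fills
                ok = 0.3 + 0.3 * c            # transmit success grows with quality
                P[1, s, s_id(c2, max(b - 1, 0))] += pc * ok
                P[1, s, s_id(c2, min(b + 1, buffers - 1))] += pc * (1 - ok)
            R[0, s] = -0.1 * b                # holding cost for buffered packets
            R[1, s] = 0.3 * c - 0.2           # payoff vs. energy/interference risk
    V, gamma = np.zeros(nS), 0.95
    for _ in range(1000):
        V = (R + gamma * P @ V).max(axis=0)   # offline value iteration
    policy = (R + gamma * P @ V).argmax(axis=0).reshape(channels, buffers)
    print(policy)                             # the lookup table deployed on the node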

7.
Neural Netw ; 154: 131-140, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35882081

ABSTRACT

In this paper, a critic learning structure based on a novel utility function is developed to solve the discounted optimal tracking control problem for affine nonlinear systems. The utility function is defined as the quadratic form of the tracking error at the next moment, which both avoids solving for the stable control input and effectively eliminates the tracking error. Next, the theoretical derivation of the method under value iteration is given in detail, with convergence and stability analysis. Then, a dual heuristic dynamic programming (DHP) algorithm using a single neural network is introduced to reduce computation. A polynomial is used to approximate the costate function during the DHP implementation, and the weight matrix is updated with the weighted residual method. In simulation, the convergence speed of the proposed strategy is compared with the heuristic dynamic programming (HDP) algorithm, and the results show that the proposed method converges faster. The proposed method is also compared with a traditional tracking control approach; the results show that it avoids solving for the stable control input and drives the tracking error closer to zero than the traditional strategy.
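
A minimal sketch of three ingredients the abstract names: a utility that is the quadratic form of the next-step tracking error, a polynomial costate model, and a weighted-residual (here, least-squares) weight update. For brevity, a fixed stabilizing gain on an invented scalar plant replaces the full policy-improvement loop, so this evaluates a costate rather than reproducing the complete DHP design.

    import numpy as np

    a, b, K, q, gamma = 0.9, 0.1, 2.0, 1.0, 0.95   # invented scalar plant and gain
    ac = a - b * K                                 # closed-loop factor (stable)
    def phi(e):                                    # polynomial basis for the costate
        return np.stack([e, e ** 3], axis=-1)
    es = np.linspace(-1.0, 1.0, 41)                # sampled tracking errors
    w = np.zeros(2)                                # costate model: lam(e) = phi(e) @ w
    for _ in range(200):
        e_next = ac * es                           # error at the next moment
        lam_next = phi(e_next) @ w
        # utility U(e+) = q * e+^2; the chain rule through e+ gives the target
        target = ac * (2 * q * e_next + gamma * lam_next)
        w, *_ = np.linalg.lstsq(phi(es), target, rcond=None)  # residual fit
    print("costate weights:", w)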


Subject(s)
Neural Networks, Computer , Nonlinear Dynamics , Algorithms , Computer Simulation , Learning
8.
Sensors (Basel) ; 21(24)2021 Dec 16.
Article in English | MEDLINE | ID: mdl-34960508

ABSTRACT

Path planning is significant for planetary rovers performing exploration missions in unfamiliar environments. In this work, we propose a novel global path planning algorithm based on the value iteration network (VIN), a differentiable planning module built on the value iteration (VI) algorithm that has emerged as an effective method for learning to plan. Despite its capacity for learning environment dynamics and performing long-range reasoning, the VIN suffers from several limitations, including sensitivity to initialization and poor performance in large-scale domains. We introduce the double value iteration network (dVIN), which decouples action selection and value estimation in the VI module, using the weighted double estimator method to approximate the maximum expected value instead of maximizing over the estimated action values. We also devise a simple yet effective two-stage training strategy for VI-based models to address their high computational cost and poor performance in large domains. We evaluate the dVIN on planning problems in grid-world domains and on realistic datasets generated from terrain images of a lunar landscape. Our dVIN empirically outperforms the baseline methods and generalizes better to large-scale environments.
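
The weighted double estimator can be shown in isolation on the problem of estimating max_a E[X_a] from samples: the single max-of-means estimator is biased upward, while splitting the data decouples action selection from value estimation. The fixed blending weight beta below is an assumption; the paper's exact weighting rule is not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)
    n_actions, n_samples, beta = 8, 20, 0.5   # beta: assumed fixed blending weight
    single, double_w = [], []
    for trial in range(5000):
        X = rng.normal(0.0, 1.0, size=(n_samples, n_actions))  # true means all 0
        single.append(X.mean(axis=0).max())        # max of means: biased upward
        mu_a = X[: n_samples // 2].mean(axis=0)    # set A selects the action
        mu_b = X[n_samples // 2 :].mean(axis=0)    # set B evaluates it
        a_star = mu_a.argmax()
        double_w.append(beta * mu_a[a_star] + (1 - beta) * mu_b[a_star])
    print("single-estimator bias :", np.mean(single))     # clearly positive
    print("weighted-double bias  :", np.mean(double_w))   # much closer to zero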


Subject(s)
Algorithms
9.
Neural Netw ; 144: 176-186, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34500256

ABSTRACT

A data-based value iteration algorithm with a bidirectional approximation feature is developed for discounted optimal control. The unknown nonlinear system dynamics are first identified by establishing a model neural network. To improve identification precision, biases are introduced into the model network, which is trained by gradient descent with the weights and biases of all layers updated. Uniform ultimate boundedness stability under a proper learning rate is analyzed using the Lyapunov approach. Moreover, an integrated value iteration with discounted cost is developed to guarantee the approximation accuracy of the optimal value function. The effectiveness of the proposed algorithm is demonstrated on two simulation examples with physical backgrounds.
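
A toy version of the identification step: a one-hidden-layer model network with biases in every layer, trained by plain gradient descent to fit invented dynamics. Sizes, rates, and the plant are all hypothetical.

    import numpy as np

    # Stand-in plant to identify: x+ = 0.8*sin(x) + 0.5*u.
    rng = np.random.default_rng(0)
    W1, b1 = 0.1 * rng.normal(size=(8, 2)), np.zeros((8, 1))
    W2, b2 = 0.1 * rng.normal(size=(1, 8)), np.zeros((1, 1))
    lr = 0.05
    for step in range(5000):
        x = rng.uniform(-1, 1, size=(1, 64))
        u = rng.uniform(-1, 1, size=(1, 64))
        target = 0.8 * np.sin(x) + 0.5 * u
        z = np.vstack([x, u])                    # network input (state, control)
        h = np.tanh(W1 @ z + b1)                 # hidden layer with bias b1
        y = W2 @ h + b2                          # output layer with bias b2
        err = y - target
        # backpropagation: weights AND biases of all layers are updated
        dW2, db2 = err @ h.T / 64, err.mean(axis=1, keepdims=True)
        dh = (W2.T @ err) * (1 - h ** 2)
        dW1, db1 = dh @ z.T / 64, dh.mean(axis=1, keepdims=True)
        W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1
    print("identification MSE:", float((err ** 2).mean()))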


Subject(s)
Neural Networks, Computer , Nonlinear Dynamics , Algorithms , Computer Simulation , Learning
10.
Neural Netw ; 124: 280-295, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32036226

ABSTRACT

In this paper, a novel value iteration adaptive dynamic programming (ADP) algorithm, called the improved value iteration ADP algorithm, is presented to obtain the optimal policy for discrete stochastic processes. For the first time, we propose a new criterion to verify whether an obtained policy is stable for stochastic processes. By analyzing the convergence properties of the proposed algorithm, it is shown that the iterative value functions converge to the optimum. In addition, our algorithm allows the initial value function to be an arbitrary positive semi-definite function. Finally, two simulation examples validate the effectiveness of the developed method.
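
The arbitrary-initialization claim is easy to visualize with ordinary discounted value iteration on a small random stochastic model: runs started from zero and from an arbitrary nonnegative initial value function reach the same optimum. The paper's new stability criterion for intermediate policies is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(3)
    nS, nA, gamma = 5, 2, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # random stochastic model
    C = rng.uniform(0, 1, size=(nA, nS))
    def vi(V):
        for _ in range(2000):
            V = (C + gamma * P @ V).min(axis=0)
        return V
    V_from_zero = vi(np.zeros(nS))
    V_from_arbitrary = vi(rng.uniform(0, 50, size=nS))     # arbitrary V_0 >= 0
    print(np.max(np.abs(V_from_zero - V_from_arbitrary)))  # ~0: same optimum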


Subject(s)
Neural Networks, Computer , Nonlinear Dynamics , Stochastic Processes , Time Factors