Results 1 - 7 of 7
1.
IEEE Trans Cybern ; 54(6): 3692-3704, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38669164

ABSTRACT

Offline reinforcement learning (offline RL) aims to find task-solving policies from prerecorded datasets without online environment interaction. However, extrapolation errors can cause over-optimistic Q-value estimates when learning from a fixed dataset, which limits the performance of the learned policy. To tackle this issue, this article proposes an offline actor-critic with behavior value regularization (OAC-BVR) method. In the policy evaluation stage, the difference between the Q-function and the value of the behavior policy is used as the regularization term, driving the learned value function toward the value of the behavior policy. Both the convergence of the proposed policy evaluation with behavior value regularization (PE-BVR) and the value function difference are analyzed. Compared with existing offline actor-critic methods, the proposed OAC-BVR method integrates the value of the behavior policy, thereby simultaneously alleviating over-optimistic Q-value estimates and reducing Q-function bias. Experimental results on the D4RL MuJoCo and Maze2d datasets demonstrate the validity of the proposed PE-BVR and the performance advantage of OAC-BVR over state-of-the-art offline RL algorithms. The code of OAC-BVR is available at https://github.com/LongyangHuang/OAC-BVR.
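
The regularization idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the regularizer, so the squared-difference penalty, the function name `bvr_critic_loss`, and the parameter `reg_weight` below are all assumptions made for illustration; this is not the authors' implementation, which is in the repository linked above.

```python
import numpy as np

def bvr_critic_loss(q_sa, td_target, q_behavior, reg_weight=1.0):
    """Hypothetical critic loss: TD error plus a behavior-value penalty.

    q_sa        -- learned Q-values for dataset (state, action) pairs
    td_target   -- bootstrapped TD targets for the same pairs
    q_behavior  -- estimated values of the behavior policy for the same pairs
    reg_weight  -- assumed trade-off coefficient (not taken from the paper)
    """
    # Standard mean squared temporal-difference error.
    td_loss = np.mean((q_sa - td_target) ** 2)
    # Assumed regularizer: pull the learned Q-values toward the behavior value.
    behavior_gap = np.mean((q_sa - q_behavior) ** 2)
    return td_loss + reg_weight * behavior_gap
```

With `reg_weight=0` the sketch reduces to ordinary TD learning; larger weights keep the learned Q-function closer to the behavior policy's value, which is the direction the abstract describes.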

2.
Article in English | MEDLINE | ID: mdl-38345962

ABSTRACT

Offline reinforcement learning (RL) aims to learn an optimal policy from a static offline dataset without interacting with the environment. However, the theoretical understanding of existing offline RL methods needs further study, and the conservatism of the learned Q-function and the learned policy is a major issue. In this article, we propose a simple and efficient offline RL with relaxed conservatism (ORL-RC) framework that addresses this concern by learning a Q-function that is close to the true Q-function under the learned policy. The conservatism of the learned Q-functions and policies of offline RL methods is analyzed, and the analysis indicates that conservatism can degrade policy performance. We establish convergence results for the proposed ORL-RC, as well as bounds on the learned Q-functions with and without sampling errors, suggesting that the gap between the learned Q-function and the true Q-function can be reduced by executing the conservative policy improvement. A practical implementation of ORL-RC is presented, and experimental results on the D4RL benchmark suggest that ORL-RC substantially outperforms existing state-of-the-art offline RL methods.

3.
Article in English | MEDLINE | ID: mdl-37676802

ABSTRACT

In offline actor-critic (AC) algorithms, the distributional shift between the training data and the target policy causes optimistic Q-value estimates for out-of-distribution (OOD) actions. This skews the learned policies toward OOD actions with falsely high Q-values. Existing value-regularized offline AC algorithms address this issue by learning a conservative value function, which leads to a performance drop. In this article, we propose a mild policy evaluation (MPE) that constrains the difference between the Q-values of actions supported by the target policy and those of actions contained in the offline dataset. The convergence of the proposed MPE, the gap between the learned value function and the true one, and the suboptimality of the offline AC with MPE are each analyzed. A mild offline AC (MOAC) algorithm is developed by integrating MPE into off-policy AC. Compared with existing offline AC algorithms, the value function gap of MOAC is bounded in the presence of sampling errors. Moreover, in the absence of sampling errors, the true state value function can be obtained. Experimental results on the D4RL benchmark dataset demonstrate the effectiveness of MPE and the performance superiority of MOAC over state-of-the-art offline reinforcement learning (RL) algorithms.
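
One way to picture the constraint described in this abstract is the sketch below. The abstract does not specify how the difference is constrained, so the hinge penalty, the function name `mild_evaluation_loss`, and the parameters `gap_limit` and `weight` are assumptions for illustration only, not the MPE operator from the article.

```python
import numpy as np

def mild_evaluation_loss(q_data, td_target, q_policy, gap_limit=0.0, weight=1.0):
    """Hypothetical critic loss with a penalty on the policy/dataset Q-value gap.

    q_data     -- Q-values of actions stored in the offline dataset
    td_target  -- bootstrapped TD targets for those dataset actions
    q_policy   -- Q-values of actions proposed by the target policy in the same states
    gap_limit  -- assumed tolerance on how far q_policy may exceed q_data
    weight     -- assumed penalty coefficient
    """
    # Standard mean squared temporal-difference error on dataset actions.
    td_loss = np.mean((q_data - td_target) ** 2)
    # Assumed hinge penalty: only the part of the gap above gap_limit is penalized.
    excess = np.maximum(q_policy - q_data - gap_limit, 0.0)
    return td_loss + weight * np.mean(excess)
```

Penalizing only the excess above a tolerance, rather than pushing policy-action Q-values down everywhere, is one reading of "mild" evaluation; other formulations are equally compatible with the abstract.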

4.
Polymers (Basel) ; 15(3)2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36771947

ABSTRACT

A polyimide (PI) molecular model was constructed to compare the performance of PIs with different structures. Specifically, the structure of the cross-linked PI resin, the prepolymer melt viscosity, and the glass-transition temperature (Tg) were investigated using molecular simulations. The results indicate that benzene ring and polyene-type cross-linked structures dominate the properties of the PIs. Moreover, the prepolymer melt viscosity simulations show that the 6FDA-APB and ODPA-APB systems have a low viscosity. The results for the Tg and the dihedral angle distribution reveal that the key factor affecting bond flexibility may be the formation of a new dihedral angle after cross-linking, which in turn affects the Tg. These results provide a reference for the design of PIs and can help improve the efficiency of new product development.

5.
Zhonghua Yi Xue Za Zhi ; 90(2): 96-9, 2010 Jan 12.
Article in Chinese | MEDLINE | ID: mdl-20356490

ABSTRACT

OBJECTIVE: To investigate the correlation between tumor vascular invasion and changes in cardiopulmonary exercise function in patients with lung cancer. METHODS: The cardiopulmonary exercise test was performed in 405 patients with lung cancer (293 with vascular invasion and 112 without). The peak load indices examined included maximal work power (measured value/predicted value, W%), maximal oxygen uptake per body weight (VO2/kg), anaerobic threshold (AT), maximal oxygen pulse (measured value/predicted value, VO2/HR%), maximal minute ventilation (VE), maximal breath reserve (BR), maximal breath frequency (BF) and maximal tidal volume during expiration (VTex). RESULTS: (1) W%, VO2/kg, AT and VO2/HR% were lower in patients with vascular invasion [(73 ± 18)%, (17 ± 5) ml·min⁻¹·kg⁻¹, (51 ± 14)% and (79 ± 18)%, respectively] than in those without vascular invasion [(86 ± 20)%, (19 ± 5) ml·min⁻¹·kg⁻¹, (55 ± 14)% and (88 ± 20)%, respectively; all P < 0.01], while BF was higher [(32.1 ± 6.1)/min vs (30.6 ± 5.1)/min, P < 0.05]. (2) The patients were grouped by TNM stage and by the number and type of invaded vessels and their relationship with the tumor. W% and VO2/HR% were lower in the 1-, 2- and ≥ 3-vessel invasion groups than in the control group (P < 0.01); AT was lower in the 1- and ≥ 3-vessel invasion groups than in the control group (P < 0.05, P < 0.01); VO2/kg was lower in the 2- and ≥ 3-vessel invasion groups than in the control group (P < 0.05, P < 0.01); VO2/kg was lower in the ≥ 3-vessel invasion group than in the 1- and 2-vessel invasion groups (P < 0.05 or P < 0.01); VO2/HR% was lower in the ≥ 3-vessel invasion group than in the 1-vessel invasion group (P < 0.01); and VTex was lower in the ≥ 3-vessel invasion group than in the control and 1-vessel invasion groups (P < 0.05). VO2/HR% correlated with the number of tumor-invaded vessels (r = 0.220, P < 0.01).
CONCLUSIONS: Oxygen uptake, exercise capacity and cardiac function during exercise are reduced in lung cancer patients with tumor vascular invasion. The main factor is the number of invaded vessels.


Subject(s)
Blood Vessels/pathology , Exercise Test , Heart/physiopathology , Lung Neoplasms/pathology , Lung/physiopathology , Adult , Aged , Female , Humans , Lung Neoplasms/physiopathology , Male , Middle Aged , Neoplasm Invasiveness , Neoplasm Staging
6.
Zhongguo Fei Ai Za Zhi ; 6(5): 367-70, 2003 Oct 20.
Article in Chinese | MEDLINE | ID: mdl-21306681

ABSTRACT

BACKGROUND: To evaluate the clinical significance of the cardiopulmonary exercise test (CPET) in predicting postoperative respiratory failure in patients with lung cancer. METHODS: Before surgery, 260 patients with lung cancer underwent CPET with an incremental protocol. W%, VO2%P, VO2/kg, AT, MET, O2 pulse, VTe, BF and VE were measured at the end of loaded exercise. RESULTS: (1) In patients after pneumonectomy, all of the above CPET indices except VTe were significantly lower in the respiratory failure group than in the non-respiratory failure group (P < 0.05 or P < 0.01). In patients after lobectomy, the 9 CPET indices in the respiratory failure group were similar to those in the non-respiratory failure group (P > 0.05). However, when the lobectomy patients were further divided into upper and lower lobectomy groups, W% after lower lobectomy was markedly lower in the respiratory failure group than in the non-respiratory failure group (P < 0.05). (2) The chi-square test showed that the degree of abnormality of the CPET indices was related to the incidence of respiratory failure after pneumonectomy. Logistic regression showed that O2 pulse < 80% and BF < 30/min correlated with the incidence of postoperative respiratory failure. (3) For predicting postoperative respiratory failure, the sensitivity and specificity of VO2%P < 60%, BF < 30/min and VE < 35 L/min were all above 60%, and their negative predictive values were all above 90%. CONCLUSIONS: CPET is suitable for predicting post-pneumonectomy respiratory failure. As a comprehensive index of cardiopulmonary function during exercise, VO2%P < 60% should be selected to predict respiratory failure and to evaluate the indication for lung resection in patients with lung cancer.
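
The three screening measures reported in result (3) follow from a 2x2 table of test outcome against respiratory failure. The sketch below uses the standard definitions; the function name and the counts in the usage lines are hypothetical and are not data from the study.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, negative predictive value).

    tp -- test positive, respiratory failure occurred
    fn -- test negative, respiratory failure occurred
    fp -- test positive, no respiratory failure
    tn -- test negative, no respiratory failure
    """
    sensitivity = tp / (tp + fn)   # share of failures flagged by the test
    specificity = tn / (tn + fp)   # share of non-failures cleared by the test
    npv = tn / (tn + fn)           # share of negative tests that were truly negative
    return sensitivity, specificity, npv

# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, npv = diagnostic_metrics(tp=7, fn=3, fp=50, tn=200)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} NPV={npv:.3f}")
```

When the outcome is uncommon, a test with moderate sensitivity and specificity can still have a high negative predictive value, which matches the pattern reported in the abstract.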

7.
Zhongguo Fei Ai Za Zhi ; 5(6): 454-7, 2002 Dec 20.
Article in Chinese | MEDLINE | ID: mdl-21333230

ABSTRACT

BACKGROUND: To explore the characteristics of exercise cardiopulmonary function and its possible influencing factors in patients with lung cancer. METHODS: Pulmonary function, ECG and exercise cardiopulmonary function were measured in 198 patients with lung cancer and 20 healthy controls. RESULTS: 1. Compared with the healthy group, VO2%P, VO2/kg, AT, VO2/HR%, VE and VT/VC were significantly lower in lung cancer patients with normal resting pulmonary ventilation, whereas BR was markedly higher (P < 0.05 or P < 0.01). 2. In patients with normal resting pulmonary ventilation, exercise cardiopulmonary function did not differ significantly between the central and peripheral lung cancer groups. 3. Exercise cardiopulmonary function was closely related to TNM stage (P < 0.05 or P < 0.01). 4. W%, VO2%P, AT and VO2/HR% were markedly lower in patients with great vessel invasion than in those without (P < 0.05 or P < 0.01). CONCLUSIONS: The results suggest that exercise ventilation is impaired in lung cancer patients with normal resting ventilation, and that the decrease in exercise cardiopulmonary function may be related to TNM stage and to great vessel involvement.
