Results 1 - 9 of 9
1.
Sci Rep ; 14(1): 15103, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956201

ABSTRACT

One of the long-term goals of reinforcement learning is to build intelligent agents capable of rapidly learning and flexibly transferring skills, similar to humans and animals. In this paper, we introduce an episodic control framework based on the temporal expansion of successor features to achieve these goals, which we refer to as Temporally Extended Successor Feature Neural Episodic Control (TESFNEC). This method significantly improves sample efficiency and reuses previously learned strategies. Crucially, the model enhances agent training by incorporating episodic memory, significantly reducing the number of iterations required to learn the optimal policy. Furthermore, we adopt the temporal expansion of successor features, a technique to capture the expected state transition dynamics of actions. This form of temporal abstraction does not entail learning a top-down hierarchy of task structures but focuses on the bottom-up combination of actions and action repetitions. Thus, our approach directly considers the temporal scope of sequences of temporally extended actions without requiring predefined or domain-specific options. Experimental results in the two-dimensional object collection environment demonstrate that the proposed method optimizes learning policies faster than baseline reinforcement learning approaches, leading to higher average returns.
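The skill reuse mentioned in this abstract rests on the standard successor-feature decomposition: if the reward is linear in some features, r = φ · w, then Q(s, a) = ψ(s, a) · w, where ψ accumulates expected discounted features. The sketch below only illustrates that decomposition; it is not the TESFNEC method itself, and the action names, feature vectors, and task weights are hypothetical values chosen for illustration.

```python
from typing import Dict, Sequence


def q_value(psi: Sequence[float], w: Sequence[float]) -> float:
    """Q(s, a) = psi(s, a) . w for a reward that is linear in the features."""
    return sum(p * wi for p, wi in zip(psi, w))


def greedy_action(psi_by_action: Dict[str, Sequence[float]],
                  w: Sequence[float]) -> str:
    """Pick the action with the highest Q-value under task weights w."""
    return max(psi_by_action, key=lambda a: q_value(psi_by_action[a], w))


# Hypothetical successor features for one state: expected discounted counts
# of two object types collected after taking each action.
psi_by_action = {"left": [2.0, 0.0], "right": [0.0, 1.0]}

# Two hypothetical tasks that differ only in how each object type is rewarded.
task_a = [1.0, 0.0]
task_b = [0.0, 3.0]

choice_a = greedy_action(psi_by_action, task_a)
choice_b = greedy_action(psi_by_action, task_b)
```

Because ψ does not depend on the reward weights, the same learned ψ can be re-evaluated under a new w without relearning the dynamics, which is the sense in which previously learned behavior can be reused.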

2.
Sensors (Basel) ; 24(14)2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39065911

ABSTRACT

Visual reinforcement learning is important in various practical applications, such as video games, robotic manipulation, and autonomous navigation. However, a major challenge in visual reinforcement learning is the generalization to unseen environments, that is, how agents manage environments with previously unseen backgrounds. This issue is triggered mainly by the high unpredictability inherent in high-dimensional observation space. To deal with this problem, techniques including domain randomization and data augmentation have been explored; nevertheless, these methods still cannot attain a satisfactory result. This paper proposes a new method named Internal States Simulation Auxiliary (ISSA), which uses internal states to improve generalization in visual reinforcement learning tasks. Our method contains two agents, a teacher agent and a student agent: the teacher agent has the ability to directly access the environment's internal states and is used to facilitate the student agent's training; the student agent receives initial guidance from the teacher agent and subsequently continues to learn independently. From another perspective, our method can be divided into two phases, the transfer learning phase and traditional visual reinforcement learning phase. In the first phase, the teacher agent interacts with environments and imparts knowledge to the vision-based student agent. With the guidance of the teacher agent, the student agent is able to discover more effective visual representations that address the high unpredictability of high-dimensional observation space. In the next phase, the student agent autonomously learns from the visual information in the environment, and ultimately, it becomes a vision-based reinforcement learning agent with enhanced generalization. The effectiveness of our method is evaluated using the DMControl Generalization Benchmark and the DrawerWorld with texture distortions. Preliminary results indicate that our method significantly improves generalization ability and performance in complex continuous control tasks.

3.
Neural Netw ; 176: 106342, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38692188

ABSTRACT

Reinforcement Learning (RL) is a significant machine learning subfield that emphasizes learning actions based on the environment to obtain an optimal behavior policy. RL agents can make decisions at variable time scales in the form of temporal abstractions, also known as options. The issue of discovering options has seen a considerable research effort. Most notably, the Interest Option Critic (IOC) algorithm first extends the initiation set to the interest function, providing a method for learning options specialized to certain state space regions. This approach offers a specific attention mechanism for action selection. Unfortunately, this method still suffers from the classic issues of poor data efficiency and lack of flexibility in RL when learning options end-to-end through backpropagation. This paper proposes a new approach called Salience Interest Option Critic (SIOC), which chooses subsets of existing initiation sets for RL. Specifically, these subsets are not learned by backpropagation, which is slow and tends to overfit, but through particle filters. This approach enables the rapid and flexible identification of critical subsets using only reward feedback. We conducted experiments in discrete and continuous domains, and our proposed method demonstrates higher efficiency and flexibility than other methods. The generated options are more valuable within a single task and exhibit greater interpretability and reusability in multi-task learning scenarios.


Subject(s)
Algorithms , Machine Learning , Neural Networks, Computer , Reinforcement, Psychology , Humans , Reward , Decision Making/physiology , Time Factors
4.
Sci Rep ; 13(1): 5157, 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-36991061

ABSTRACT

Quantum Architecture Search (QAS) is a process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based QAS (QAS-PPO) method, which used the Proximal Policy Optimization (PPO) algorithm to automatically generate the quantum circuit without any expert knowledge in physics. However, QAS-PPO can neither strictly limit the probability ratio between old and new policies nor enforce well-defined trust region constraints, resulting in poor performance. In this paper, we present a new deep reinforcement learning-based QAS method, called Trust Region-based PPO with Rollback for QAS (QAS-TR-PPO-RB), to automatically build the quantum gate sequence from the density matrix only. Specifically, inspired by the research work of Wang, we employ an improved clipping function to implement the rollback behavior to limit the probability ratio between the new policy and the old policy. In addition, we use a trust region-based triggering condition for the clipping to optimize the policy by restricting it to the trust region, which leads to guaranteed monotone improvement. Experiments on several multi-qubit circuits demonstrate that the proposed method achieves better policy performance and lower algorithm running time than the original deep reinforcement learning-based QAS method.
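To make the contrast between plain clipping and rollback concrete, the sketch below compares the standard PPO clipped surrogate with one commonly cited form of a rollback clipping function, in which the objective slopes back down once the probability ratio leaves the clipping range. Whether QAS-TR-PPO-RB uses exactly this form is an assumption; the function names and the values of `eps` and `alpha` are hypothetical.

```python
def clipped_ratio(ratio: float, eps: float) -> float:
    """Standard PPO clipping: the ratio is held flat outside [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(1.0 + eps, ratio))


def rollback_ratio(ratio: float, eps: float, alpha: float) -> float:
    """Assumed rollback form: outside the range the value decreases with slope -alpha."""
    if ratio <= 1.0 - eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    if ratio >= 1.0 + eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    return ratio


def surrogate(ratio: float, advantage: float, eps: float = 0.2,
              alpha: float = 0.3, rollback: bool = True) -> float:
    """Per-sample surrogate objective: min(r * A, shaped(r) * A)."""
    shaped = rollback_ratio(ratio, eps, alpha) if rollback else clipped_ratio(ratio, eps)
    return min(ratio * advantage, shaped * advantage)


# With a positive advantage and a ratio beyond 1 + eps, plain clipping yields a
# flat objective (zero gradient), while the rollback form yields a lower value
# that keeps decreasing as the ratio grows.
flat = surrogate(1.5, 1.0, rollback=False)
rolled = surrogate(1.5, 1.0, rollback=True)
```

Both shaping functions agree inside the clipping range and at its boundaries, so the rollback variant only changes the behavior for ratios that have already drifted too far from the old policy.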

5.
Antibiotics (Basel) ; 10(2)2021 Feb 01.
Article in English | MEDLINE | ID: mdl-33535705

ABSTRACT

Carbapenem-resistant Klebsiella pneumoniae (CRKP), one of the major nosocomial pathogens, is increasingly becoming a serious threat to global public health. There is an urgent need to develop effective therapeutic and preventive approaches to combat the pathogen. Here, we identified and characterized a novel capsule depolymerase (K64-ORF41) derived from Klebsiella phage SH-KP152410, which showed specific activity against the K. pneumoniae K64 serotype. We showed that this depolymerase could be used to identify K64 serotypes by capsular typing, and the results agreed well with those from the conventional serotyping method using antisera. In this study, we also identified K64 mutant strains, which showed typing discrepancies between wzi-sequencing-based genotyping and depolymerase-based or antiserum-based typing methods. Further investigation indicated that the mutant strain has an insertion sequence (IS) in wcaJ, which led to the alteration of the capsular serotype structure. We further demonstrated that K64-ORF41 depolymerase could sensitize the bacteria to serum or neutrophil killing by degrading the capsular polysaccharide. In summary, the identified K64 depolymerase proves to be an accurate and reliable tool for capsular typing, which will facilitate preventive interventions such as vaccine development. In addition, the depolymerase may represent a promising therapeutic biologic against CRKP-K64 infections.

6.
Front Microbiol ; 10: 2768, 2019.
Article in English | MEDLINE | ID: mdl-31849905

ABSTRACT

The increasing prevalence of infections caused by multidrug-resistant Klebsiella pneumoniae necessitates the development of alternative therapies. Here, we isolated, characterized, and sequenced a K. pneumoniae bacteriophage (SH-KP152226) that specifically infects and lyses K. pneumoniae capsular type K47. The phage SH-KP152226 contains a genome of 41,420 bp that encodes 48 predicted proteins. Among these proteins, Dep42, the gene product of ORF42, is a putative tail fiber protein and hypothetically possesses depolymerase activity. We demonstrated that recombinant Dep42 showed specific enzymatic activity in the depolymerization of the K47 capsule of K. pneumoniae and was able to significantly inhibit biofilm formation and/or degrade formed biofilms. We also showed that Dep42 could enhance polymyxin activity against K. pneumoniae biofilms when used in combination with antibiotics. These results suggest that the combination of the identified novel depolymerase Dep42, encoded by the phage SH-KP152226, with antibiotics may represent a promising strategy to combat infections caused by drug-resistant and biofilm-forming K. pneumoniae.

7.
BMC Bioinformatics ; 19(Suppl 20): 505, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577738

ABSTRACT

BACKGROUND: The traditional methods of visualizing high-dimensional data objects in low-dimensional metric spaces are subject to the basic limitations of metric space. These limitations result in multidimensional scaling that fails to faithfully represent non-metric similarity data. RESULTS: Multiple maps t-SNE (mm-tSNE) has drawn much attention due to the construction of multiple mappings in low-dimensional space to visualize the non-metric pairwise similarity to eliminate the limitations of a single metric map. mm-tSNE regularization combines the intrinsic geometry between data points in a high-dimensional space. The weight of data points on each map is used as the regularization parameter of the manifold, so the weights of similar data points on the same map are also as close as possible. However, these methods use standard momentum methods to calculate parameters of gradient at each iteration, which may lead to erroneous gradient search directions so that the target loss function fails to achieve a better local minimum. In this article, we use a Nesterov momentum method to learn the target loss function and correct each gradient update by looking back at the previous gradient in the candidate search direction. By using indirect second-order information, the algorithm obtains faster convergence than the original algorithm. To further evaluate our approach from a comparative perspective, we conducted experiments on several datasets including social network data, phenotype similarity data, and microbiomic data. CONCLUSIONS: The experimental results show that the proposed method achieves better results than several versions of mm-tSNE based on three evaluation indicators including the neighborhood preservation ratio (NPR), error rate and time complexity.


Subject(s)
Gene Expression Regulation , Genetic Diseases, Inborn/genetics , Microbiota/genetics , Nonlinear Dynamics , Algorithms , Databases, Genetic , Humans , Phenotype , Time Factors
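The Nesterov momentum update mentioned in this abstract differs from standard momentum in where the gradient is evaluated: at a look-ahead point along the current velocity rather than at the current parameters. The sketch below shows the generic update on a one-dimensional quadratic; it is not the mm-tSNE loss, and the function name, learning rate, momentum value, and step count are hypothetical choices for illustration.

```python
from typing import Callable


def nesterov_minimize(grad: Callable[[float], float], x0: float,
                      lr: float = 0.1, momentum: float = 0.9,
                      steps: int = 200) -> float:
    """Minimize a 1-D function with Nesterov momentum, given its gradient."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at x itself.
        lookahead = x + momentum * velocity
        velocity = momentum * velocity - lr * grad(lookahead)
        x = x + velocity
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = nesterov_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Standard momentum would call `grad(x)` instead of `grad(lookahead)`; evaluating at the look-ahead point lets the update correct for where the velocity is about to carry the parameters.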
8.
Pancreas ; 40(7): 1103-6, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21926546

ABSTRACT

OBJECTIVES: The aim of the present study was to investigate the d-dimer in acute pancreatitis and its association with triglyceride (TG). METHODS: The d-dimer was measured in 45 patients with mild acute pancreatitis, 43 patients with severe acute pancreatitis, and 45 healthy controls. The 88 patients were divided into high and low TG groups based on their TG levels. Twenty outpatients with serum TG levels higher than 5.65 mM were chosen as hypertriglyceridemia controls. We investigated whether there were any correlations between the d-dimer levels and serum TG in acute pancreatitis. RESULTS: In 45 patients with mild acute pancreatitis, the d-dimer increased to approximately 2 times over the reference value, whereas in 43 patients with severe acute pancreatitis, the d-dimer level increased to 6 times above the limit; the difference was significant. Both TG and acute pancreatitis could cause an elevation of the d-dimer level, with TG playing the more important role. The increase in the d-dimer was also directly related to the severity of acute pancreatitis. CONCLUSIONS: Plasma concentrations of the d-dimer increase in acute pancreatitis. The increase in TG is probably the main cause of the d-dimer elevation in patients with acute pancreatitis.


Subject(s)
Fibrin Fibrinogen Degradation Products/metabolism , Hypertriglyceridemia/blood , Pancreatitis/blood , Triglycerides/blood , Acute Disease , Adult , Aged , Aged, 80 and over , Analysis of Variance , Biomarkers/blood , Case-Control Studies , China , Female , Humans , Hypertriglyceridemia/diagnosis , Male , Middle Aged , Pancreatitis/diagnosis , Prognosis , Prospective Studies , Severity of Illness Index , Time Factors , Up-Regulation
9.
Zhonghua Nei Ke Za Zhi ; 46(12): 1011-3, 2007 Dec.
Article in Chinese | MEDLINE | ID: mdl-18478919

ABSTRACT

OBJECTIVE: To investigate the role of D-dimer in human acute pancreatitis (AP) and its relation to the severity of the disease. METHODS: Plasma concentration of D-dimer was measured in 31 patients with mild AP (MAP), 30 patients with severe AP (SAP) and 30 normal people as a control group. The results of routine laboratory tests, 48-hour Ranson and 24-hour APACHE II scores were all recorded. We attempted to find a relationship between D-dimer level and the results of routine laboratory tests, 48-hour Ranson scores and 24-hour APACHE II scores. RESULTS: (1) As compared with the control group, the plasma concentration of D-dimer was much higher in MAP (0.21 ± 0.21) mg/L (P = 0.029) and SAP patients (0.69 ± 0.32) mg/L (P = 0.000). The D-dimer level in the SAP group was higher than that in the MAP group (P = 0.000). (2) The rise in the D-dimer level was directly related to 48-hour Ranson (P = 0.000) and 24-hour APACHE II scores (P = 0.000). (3) The rise in the D-dimer level was directly related to leukocyte count, blood glucose, creatinine, prothrombin time and partial thromboplastin time (P < 0.05) and inversely related to hematocrit, albumin and calcium (P < 0.05). CONCLUSIONS: Plasma concentration of the D-dimer rises in AP patients; D-dimer level is related to the disease severity.


Subject(s)
Fibrin Fibrinogen Degradation Products/metabolism , Pancreatitis, Acute Necrotizing/blood , Pancreatitis/blood , APACHE , Acute Disease , Adult , Aged , Female , Humans , Male , Middle Aged , Pancreatitis/pathology , Pancreatitis, Acute Necrotizing/pathology , Severity of Illness Index