Results 1 - 20 of 36
1.
Article in English | MEDLINE | ID: mdl-37310820

ABSTRACT

Bayesian policy reuse (BPR) is a general policy transfer framework that selects a source policy from an offline library by inferring the task belief from observation signals through a trained observation model. In this article, we propose an improved BPR method for more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal, which carries limited information and is not available until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of a tabular observation model, which may be expensive or even infeasible to learn and maintain, especially when state transition samples serve as the signal. Hence, we propose a scalable observation model that fits the state transition functions of the source tasks from only a small number of samples and can generalize to any signal observed in the target task. Moreover, we extend offline-mode BPR to the continual learning setting by expanding the scalable observation model in a plug-and-play fashion, which avoids negative transfer when new, unknown tasks appear. Experimental results show that our method consistently facilitates faster and more efficient policy transfer.
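The belief update at the heart of BPR-style task inference can be sketched in a few lines. Everything concrete here is an illustrative assumption, not the paper's setup: a three-task library, scalar states, and Gaussian transition models (each task's fitted transition function reduced to a predicted next state with a fixed noise scale of 0.1).

```python
# Minimal sketch of Bayesian task inference from a state-transition signal.
# The Gaussian transition models and the three-task library are toy assumptions.
import math

def gaussian_pdf(x, mean, std):
    """Likelihood of observing next-state x under one task's fitted transition model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def update_belief(belief, observed_next_state, predicted_means, std=0.1):
    """One Bayesian update: weight each source task by how well its fitted
    transition function predicts the observed transition, then renormalize."""
    posterior = [b * gaussian_pdf(observed_next_state, m, std)
                 for b, m in zip(belief, predicted_means)]
    total = sum(posterior)
    return [p / total for p in posterior]

belief = [1 / 3, 1 / 3, 1 / 3]   # uniform prior over three source tasks
predicted = [0.0, 1.0, 2.0]      # each task model's predicted next state
belief = update_belief(belief, 1.05, predicted)   # one observed transition
best_task = max(range(3), key=lambda i: belief[i])
```

A single informative transition is enough here to concentrate the belief on task 1, which is the point of replacing the episodic return with instantaneous transition signals.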

2.
Article in English | MEDLINE | ID: mdl-37030784

ABSTRACT

Identifying the Hamiltonian of an unknown quantum system is a critical task in quantum information. In this article, we propose a systematic Hamiltonian identification approach via quantum ensemble multiclass classification (HI-QEMC). The approach is implemented as a three-step iterative refining process: parameter interval guess, verification, and judgment. In the guess step, the parameter interval is divided into several sub-intervals, and the true Hamiltonian parameter is conjectured to lie in one of them. In the verification step, cross verification is applied to check the accuracy of the guess. In the judgment step, an adaptive interval judgment (AIJ) algorithm determines the sub-interval containing the true Hamiltonian parameter. Numerical results on two typical quantum systems, two-level and three-level, demonstrate the effectiveness and superior performance of the proposed approach for quantum Hamiltonian identification.
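The interval-refinement loop behind the guess-verify-judge process can be sketched abstractly. Here a hypothetical `judge` oracle stands in for the quantum-ensemble multiclass classifier plus cross verification; it simply reports which sub-interval contains a toy parameter, so only the refinement bookkeeping is illustrated.

```python
# Sketch of iterative interval refinement; `judge` is a stand-in oracle for
# the classification/verification steps, and true_omega is a toy parameter.
def refine_interval(lo, hi, judge, n_sub=4, iterations=8):
    """Repeatedly split [lo, hi] into n_sub sub-intervals and keep the one
    the judgment step selects as containing the true parameter."""
    for _ in range(iterations):
        width = (hi - lo) / n_sub
        idx = judge(lo, width, n_sub)   # index of the selected sub-interval
        lo, hi = lo + idx * width, lo + (idx + 1) * width
    return lo, hi

true_omega = 0.731   # unknown Hamiltonian parameter to identify (toy value)

def judge(lo, width, n_sub):
    # Stand-in for the multiclass classifier: pick the sub-interval
    # that actually contains true_omega.
    return min(int((true_omega - lo) // width), n_sub - 1)

lo, hi = refine_interval(0.0, 1.0, judge)
```

Each iteration shrinks the interval by a factor of `n_sub`, so eight rounds of four-way splitting localize the parameter to within about 1.5e-5 of the unit interval.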

3.
Phys Rev Lett ; 130(4): 043604, 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36763416

ABSTRACT

We present a combined analytical and numerical study for coherent terahertz control of a single molecular polariton, formed by strongly coupling two rotational states of a molecule with a single-mode cavity. Compared to the bare molecules driven by a single terahertz pulse, the presence of a cavity strongly modifies the postpulse orientation of the polariton, making it difficult to obtain its maximal degree of orientation. To solve this challenging problem toward achieving complete quantum coherent control, we derive an analytical solution of a pulse-driven quantum Jaynes-Cummings model by expanding the wave function into entangled states and constructing an effective Hamiltonian. We utilize it to design a composite terahertz pulse and obtain the maximum degree of orientation of the polariton by exploiting photon blockade effects. This Letter offers a new strategy to study rotational dynamics in the strong-coupling regime and provides a method for complete quantum coherent control of a single molecular polariton. It, therefore, has direct applications in polariton chemistry and molecular polaritonics for exploring novel quantum optical phenomena.

4.
IEEE Trans Cybern ; 53(6): 3467-3478, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34910651

ABSTRACT

Quantum language models (QLMs), in which words are modeled as a quantum superposition of sememes, have demonstrated a high level of model transparency and good post-hoc interpretability. Nevertheless, in the current literature, word sequences are basically modeled as a classical mixture of word states, which cannot fully exploit the potential of a quantum probabilistic description. A quantum-inspired neural network (NN) module is yet to be developed to explicitly capture the nonclassical correlations within word sequences. We propose an NN model with a novel entanglement embedding (EE) module, whose function is to transform the word sequence into an entangled pure state representation. Strong quantum entanglement, which is the central concept of quantum information and an indication of parallelized correlations among the words, is observed within the word sequences. The proposed QLM with EE (QLM-EE) is implemented on classical computing devices with a quantum-inspired NN structure, and numerical experiments show that QLM-EE achieves superior performance compared with classical deep NN models and other QLMs on question answering (QA) datasets. In addition, the post-hoc interpretability of the model can be improved by quantifying the degree of entanglement among the word states.
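The interpretability measure mentioned at the end, quantifying the degree of entanglement among word states, can be illustrated with the standard entropy of entanglement of a bipartite pure state. The two-qubit dimensions and the test states below are toy choices, not the model's actual word-state representation.

```python
# Illustrative computation of the entropy of entanglement of a bipartite
# pure state via its Schmidt coefficients; states and dimensions are toys.
import numpy as np

def entanglement_entropy(psi, dim_a, dim_b):
    """Entropy of entanglement of a pure state |psi> in C^{dim_a x dim_b}."""
    m = psi.reshape(dim_a, dim_b)
    s = np.linalg.svd(m, compute_uv=False)   # Schmidt coefficients
    p = s ** 2                               # squared coefficients sum to 1
    p = p[p > 1e-12]                         # drop numerical zeros
    return float(-(p * np.log2(p)).sum())

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)    # maximally entangled two-qubit state
product = np.array([1.0, 0, 0, 0])            # separable (product) state
e_bell = entanglement_entropy(bell, 2, 2)     # 1 bit of entanglement
e_prod = entanglement_entropy(product, 2, 2)  # zero entanglement
```

A classical mixture of word states can never raise this quantity above what its components carry, which is why an entangled pure-state embedding is strictly more expressive in this sense.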

5.
IEEE Trans Neural Netw Learn Syst ; 34(12): 9742-9756, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35349452

ABSTRACT

Evolution strategies (ESs), a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this article, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust a previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism with ES to facilitate its learning adaptation while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move toward new promising areas of the parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, instance-weighted incremental evolution strategies (IW-IESs), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This article thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
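The instance-weighting idea can be sketched on a one-dimensional toy problem. This is not the paper's update rule: here each sampled instance is weighted by quality (softmax of fitness in the new environment) times novelty (distance from the previous optimum), and the parameter moves to the weighted mean; the quadratic fitness, noise scale, and iteration counts are illustrative assumptions.

```python
# Toy instance-weighted ES step: quality x novelty weights steer the search
# distribution toward the new optimum. All hyperparameters are illustrative.
import math
import random

random.seed(0)

def iw_es_step(theta, fitness, prev_optimum, sigma=0.3, n=100):
    """Sample perturbed instances, weight each by quality (exp of new-environment
    fitness) times novelty (distance from the previous optimum), and move the
    parameter to the weighted mean of the instances."""
    cands = [theta + sigma * random.gauss(0, 1) for _ in range(n)]
    weights = [math.exp(fitness(c)) * abs(c - prev_optimum) for c in cands]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, cands)) / total

# Environment change: the optimum moved from 0.0 (old) to 2.0 (new).
fitness = lambda x: -(x - 2.0) ** 2
theta = 0.0                       # start from the previously learned policy
for _ in range(30):
    theta = iw_es_step(theta, fitness, prev_optimum=0.0)
```

Instances far from the old optimum and fit in the new environment dominate the weighted mean, so the search distribution migrates to the new promising region instead of re-exploring around the stale solution.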

6.
IEEE Trans Cybern ; 53(12): 7509-7520, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35580095

ABSTRACT

While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information. In this article, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the nonstationary task distribution, which captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian nonparametric framework with an expectation maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use the domain randomization technique to train robust prior parameters for the initialization of each task model in the mixture; thus, the resulting model can better generalize and adapt to unseen tasks. With extensive experiments conducted on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
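The Chinese restaurant process prior that governs when a new mixture component is instantiated has a compact closed form. The cluster counts and the concentration parameter alpha below are illustrative numbers, not values from the paper.

```python
# CRP prior over task-to-cluster assignments: existing clusters attract new
# tasks in proportion to their size; alpha controls opening a new cluster.
def crp_probs(cluster_counts, alpha):
    """Prior probability that the next task joins each existing cluster,
    plus (last entry) the probability of instantiating a new cluster."""
    n = sum(cluster_counts)
    probs = [c / (n + alpha) for c in cluster_counts]
    probs.append(alpha / (n + alpha))
    return probs

# Two existing task clusters holding 3 and 1 tasks; alpha = 1 is a toy choice.
probs = crp_probs([3, 1], alpha=1.0)
```

This "rich get richer" prior is what lets the mixture grow only as needed: a new component appears with probability alpha / (n + alpha), without explicit task boundaries.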

7.
IEEE Trans Neural Netw Learn Syst ; 34(11): 8852-8865, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35263262

ABSTRACT

Deep reinforcement learning (DRL) has been recognized as an efficient technique to design optimal strategies for different complex systems without prior knowledge of the control landscape. To achieve fast and precise control for quantum systems, we propose a novel DRL approach by constructing a curriculum consisting of a set of intermediate tasks defined by fidelity thresholds, where the tasks in a curriculum can be statically determined before the learning process or dynamically generated during it. By transferring knowledge between two successive tasks and sequencing tasks according to their difficulties, the proposed curriculum-based DRL (CDRL) method enables the agent to focus on easy tasks in the early stage, then move on to difficult tasks, and eventually approach the final task. Numerical comparison with traditional methods [gradient method (GD), genetic algorithm (GA), and several other DRL methods] demonstrates that CDRL exhibits improved control performance for quantum systems and also provides an efficient way to identify optimal strategies with few control pulses.
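The static-curriculum variant can be sketched as a threshold schedule plus a gate that only advances when the current threshold is met. The linear spacing, the 0.80-0.99 range, and the stand-in `achieved_fidelity` function are illustrative assumptions; in the paper the agent's training, not a fixed function, determines the achieved fidelity.

```python
# Sketch of a static fidelity-threshold curriculum; the schedule and the
# stand-in fidelity function are toy assumptions.
def make_curriculum(start, final, n_tasks):
    """Linearly spaced fidelity thresholds from an easy start to the final goal."""
    step = (final - start) / (n_tasks - 1)
    return [start + i * step for i in range(n_tasks)]

def train_through_curriculum(curriculum, achieved_fidelity):
    """Advance to the next intermediate task only once the current threshold
    is met; knowledge transfer between tasks is abstracted away here."""
    completed = []
    for threshold in curriculum:
        if achieved_fidelity(threshold) >= threshold:
            completed.append(threshold)
        else:
            break   # agent keeps training on this task before moving on
    return completed

curriculum = make_curriculum(0.80, 0.99, 5)
# Stand-in learner: barely clears each threshold but saturates at fidelity 0.95.
completed = train_through_curriculum(curriculum, lambda t: min(t + 0.001, 0.95))
```

With this toy learner the first four tasks are cleared and training stalls on the final 0.99 threshold, which is exactly the situation where the dynamic task-generation variant would insert an extra intermediate task.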

8.
Front Physiol ; 13: 953702, 2022.
Article in English | MEDLINE | ID: mdl-36091404

ABSTRACT

A fast prediction of blood flow in stenosed arteries with a hybrid framework of machine learning and the immersed boundary-lattice Boltzmann method (IB-LBM) is presented. The integrated framework incorporates the immersed boundary method for its excellent capability in handling complex boundaries, the multi-relaxation-time LBM for its efficient modelling of unsteady flows, and the deep neural network (DNN) for its high efficiency in artificial learning. Specifically, the stenosed artery is modelled by a channel for two-dimensional (2D) cases or a tube for three-dimensional (3D) cases, with the stenosis approximated by a fifth-order polynomial. The IB-LBM is adopted to obtain the training data for the DNN, which is constructed to generate an approximate model for fast flow prediction. In the DNN, the inputs are the characteristic parameters of the stenosis and the fluid node coordinates, and the outputs are the mean velocity and pressure at each node. To characterise complex stenoses, a convolutional neural network (CNN) is built to extract the stenosis properties from the data generated by the aforementioned polynomial. Both 2D and 3D cases (including a 3D asymmetrical case) are constructed and examined to demonstrate the effectiveness of the proposed method. Once the DNN model is trained, the prediction of blood flow in stenosed arteries is much more efficient than direct computational fluid dynamics simulations. The proposed method has potential for applications in clinical diagnosis and treatment where real-time modelling results are desired.
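The surrogate-model workflow (expensive solver offline, cheap learned model online) can be sketched end to end. Everything here is a stand-in: a Poiseuille-like analytic profile replaces the IB-LBM solver, and a least-squares polynomial fit replaces the DNN; the point is only the train-then-predict structure.

```python
# Sketch of the solver -> surrogate workflow: sample an expensive "solver"
# offline, fit a cheap model, then predict online. Solver and model are toys.
import numpy as np

def solver_velocity(severity, y):
    """Stand-in for the IB-LBM solver: a parabolic velocity profile whose
    peak grows with stenosis severity (toy mass-conservation argument)."""
    return (1.0 + severity) * (1.0 - y ** 2)

# Offline phase: sample the "solver" to build training data.
sev = np.random.default_rng(0).uniform(0.0, 0.5, 200)   # stenosis parameter
y = np.random.default_rng(1).uniform(-1.0, 1.0, 200)    # node coordinate
X = np.column_stack([np.ones_like(sev), sev, y ** 2, sev * y ** 2])
u = solver_velocity(sev, y)
coef, *_ = np.linalg.lstsq(X, u, rcond=None)            # fit the surrogate

def surrogate(severity, yy):
    """Online phase: instant prediction from the fitted coefficients."""
    return coef @ np.array([1.0, severity, yy ** 2, severity * yy ** 2])

pred = surrogate(0.3, 0.5)
exact = solver_velocity(0.3, 0.5)
```

Because the toy profile lies exactly in the surrogate's feature span, the fit is essentially exact; a real DNN surrogate trades some accuracy for the same orders-of-magnitude speedup over rerunning the solver.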

9.
Article in English | MEDLINE | ID: mdl-37015645

ABSTRACT

Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to partial observability and the lack of accurate real-time interactions across agents. In this article, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge when a large number of agents coexist. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To more effectively coordinate the behaviors of neighboring agents, we enhance the mean-field approximation with a supervised policy rectification network (PRN) for rectifying real-time agent interactions and with a learnable compensation term for correcting the approximation bias. The proposed method enables efficient coordination and outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).
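The core of the mean-field approximation is replacing the joint action of a neighborhood with a single averaged action representation. The discrete action space and the particular neighbor actions below are toy values for illustration.

```python
# Mean-field approximation in its simplest discrete form: condition on the
# average (one-hot-encoded) action of the neighbors instead of their joint
# action. Action space size and neighbor actions are toy values.
def mean_field_action(neighbor_actions, n_actions):
    """Average the neighbors' one-hot discrete actions into a single
    mean-action distribution of fixed size n_actions."""
    dist = [0.0] * n_actions
    for a in neighbor_actions:
        dist[a] += 1.0 / len(neighbor_actions)
    return dist

# Four neighbors taking actions 0, 2, 2, 1 out of a 3-action space.
mf = mean_field_action([0, 2, 2, 1], n_actions=3)
```

This collapses an exponentially large joint-action input into a fixed-size vector, which is what makes the approach scale to many coexisting agents; the PRN and compensation term in the article then correct the bias this averaging introduces.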

10.
IEEE Trans Cybern ; 52(2): 1073-1085, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32386176

ABSTRACT

A hybrid quantum-classical filtering problem, where a qubit system is disturbed by a classical stochastic process, is investigated. The strategy is to model the classical disturbance by using an optical cavity. The relations between classical disturbances and the cavity analog system are analyzed. The dynamics of the enlarged quantum network system, which includes a qubit system and a cavity system, are derived. A stochastic master equation for the qubit-cavity hybrid system is given, based on which estimates for the state of the cavity system and the classical signal are obtained. The quantum-extended Kalman filter is employed to achieve efficient computation. Numerical results are presented to illustrate the effectiveness of our methods.
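The classical predict-correct recursion underlying any (extended) Kalman filter can be shown in scalar form. This is not the quantum-extended filter from the abstract, which propagates operator-valued quantities; the toy dynamics, noise covariances, and measurement sequence below are illustrative assumptions.

```python
# Scalar Kalman filter step: the classical predict/correct core that the
# quantum-extended Kalman filter generalizes. All system values are toys.
def kalman_step(x, P, z, A=1.0, Q=0.01, H=1.0, R=0.1):
    """Predict the state with dynamics A, then correct with measurement z."""
    # Predict
    x_pred = A * x
    P_pred = A * P * A + Q
    # Correct
    K = P_pred * H / (H * P_pred * H + R)     # Kalman gain
    x_new = x_pred + K * (z - H * x_pred)     # innovation update
    P_new = (1 - K * H) * P_pred
    return x_new, P_new

x, P = 0.0, 1.0                # vague prior on a signal that is actually ~1.0
for z in [0.9, 1.1, 1.0, 0.95]:
    x, P = kalman_step(x, P, z)
```

After a handful of noisy measurements the estimate locks onto the signal and the posterior variance P shrinks well below the prior, which is the behavior the quantum-extended version reproduces for the cavity state and the classical disturbance.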

11.
IEEE Trans Neural Netw Learn Syst ; 33(8): 4003-4016, 2022 08.
Article in English | MEDLINE | ID: mdl-33571098

ABSTRACT

A central capability of a long-lived reinforcement learning (RL) agent is to incrementally adapt its behavior as its environment changes and to incrementally build upon previous experiences to facilitate future learning in real-world scenarios. In this article, we propose lifelong incremental reinforcement learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic environments. We develop and maintain a library that contains an infinite mixture of parameterized environment models, which is equivalent to clustering environment parameters in a latent space. The prior distribution over the mixture is formulated as a Chinese restaurant process (CRP), which incrementally instantiates new environment models without any external information to signal environmental changes in advance. During lifelong learning, we employ the expectation-maximization (EM) algorithm with online Bayesian inference to update the mixture in a fully incremental manner. In EM, the E-step involves estimating the posterior expectation of environment-to-cluster assignments, whereas the M-step updates the environment parameters for future learning. This method allows for all environment models to be adapted as necessary, with new models instantiated for environmental changes and old models retrieved when previously seen environments are encountered again. Simulation experiments demonstrate that LLIRL outperforms relevant existing methods and enables effective incremental adaptation to various dynamic environments for lifelong learning.
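The E-step described above combines the CRP prior with each environment model's likelihood of the current data. The counts, concentration parameter, and likelihood values below are illustrative numbers; the last likelihood entry stands in for the base-measure term of a brand-new cluster.

```python
# Sketch of the E-step: CRP prior x per-model likelihood -> posterior
# responsibilities over environment clusters. All numbers are illustrative.
def e_step(prior_counts, alpha, likelihoods):
    """Posterior responsibility of each existing environment cluster and
    (last entry) of instantiating a new one."""
    n = sum(prior_counts)
    priors = [c / (n + alpha) for c in prior_counts] + [alpha / (n + alpha)]
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two existing environment models; the second explains the current data far
# better, so it should absorb the responsibility despite a smaller prior.
resp = e_step([5, 2], alpha=1.0, likelihoods=[0.01, 0.6, 0.05])
```

When no existing model explains the data well, the final entry dominates instead, which is precisely how a new environment model gets instantiated without any external change signal.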


Subject(s)
Algorithms , Neural Networks, Computer , Bayes Theorem , Education, Continuing , Reinforcement, Psychology
12.
IEEE Trans Cybern ; 52(9): 9326-9338, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33600343

ABSTRACT

In this article, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed DRL with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity and the replayed times of each experience (also called a transition), to achieve a balance between exploration and exploitation. In DRL-QER, transitions are first formulated in quantum representations, and then the preparation operation and the depreciation operation are performed on them. In this process, the preparation operation reflects the relationship between the temporal-difference errors (TD-errors) and the importance of the experiences, while the depreciation operation ensures the diversity of the transitions. Experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms such as DRL-PER and DCRL on most of these games with improved training efficiency, and that it is also applicable to memory-based DRL approaches such as the double network and the dueling network.
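One loose reading of the amplitude bookkeeping can be sketched as follows. This is not the paper's exact preparation and depreciation operators: here a transition's amplitude angle grows with its TD-error (preparation) and is geometrically damped per replay (depreciation), with a Born-rule-like squared-sine giving the sampling probability; the functional forms and constants are assumptions for illustration only.

```python
# Illustrative (not the paper's) amplitude bookkeeping for quantum-inspired
# replay: TD-error raises a transition's angle, replays depreciate it.
import math

def priority(td_error, replays, k=1.0, decay=0.8):
    """Quantum-style amplitude: angle set by |TD-error| (preparation),
    shrunk by a decay factor per replay (depreciation)."""
    theta = math.atan(k * abs(td_error)) * (decay ** replays)
    return math.sin(theta) ** 2          # Born-rule-like selection probability

def sampling_probs(td_errors, replay_counts):
    """Normalize the per-transition priorities into sampling probabilities."""
    pr = [priority(t, r) for t, r in zip(td_errors, replay_counts)]
    total = sum(pr)
    return [p / total for p in pr]

# Three transitions: high TD-error fresh, high TD-error replayed 3x, low TD-error.
probs = sampling_probs([2.0, 2.0, 0.1], [0, 3, 0])
```

The ordering shows both effects at once: of two equally surprising transitions, the less-replayed one is preferred (diversity), and both dominate the unsurprising one (importance).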


Subject(s)
Algorithms , Reinforcement, Psychology
13.
IEEE Trans Cybern ; 51(2): 889-899, 2021 Feb.
Article in English | MEDLINE | ID: mdl-30843816

ABSTRACT

This paper solves the problem of discrete-time fault-tolerant quantum filtering for a class of laser-atom open quantum systems subject to stochastic faults. We show that, by using discrete-time quantum measurements, optimal estimates of both the atomic observables and the classical fault process can be determined simultaneously in terms of recursive quantum stochastic difference equations. A dispersive interaction quantum system example is used to demonstrate the proposed filtering approach.

14.
Opt Lett ; 45(4): 960-963, 2020 Feb 15.
Article in English | MEDLINE | ID: mdl-32058517

ABSTRACT

Controlling coherence and interference of quantum states is one of the central goals in quantum science. Different from energetically discrete quantum states, however, it remains a demanding task to visualize coherent properties of degenerate states (e.g., magnetic sublevels). It becomes further inaccessible in the absence of an external perturbation (e.g., Zeeman effect). Here, we present a theoretical analysis of all-optical control of degenerate magnetic states in the molecular hydrogen ion, H2+, by using two time-delayed co- and counterrotating circularly polarized attosecond extreme-ultraviolet (XUV) pulses. We perform accurate simulations to examine this model by solving the three-dimensional time-dependent Schrödinger equation. A counterintuitive phenomenon of quantum interference between degenerate magnetic sublevels appears in the time-dependent electronic probability density, which is observable by using x-ray-induced transient angular and energy-resolved photoelectron spectra. This work provides an insight into quantum interference of electron dynamics inside molecules at the quantum degeneracy level.

15.
Nat Hum Behav ; 4(3): 294-307, 2020 03.
Article in English | MEDLINE | ID: mdl-31959921

ABSTRACT

Classical reinforcement learning (CRL) has been widely applied in neuroscience and psychology; however, quantum reinforcement learning (QRL), which shows superior performance in computer simulations, has never been empirically tested on human decision-making. Moreover, all current successful quantum models for human cognition lack connections to neuroscience. Here we studied whether QRL can properly explain value-based decision-making. We compared 2 QRL and 12 CRL models by using behavioural and functional magnetic resonance imaging data from healthy and cigarette-smoking subjects performing the Iowa Gambling Task. In all groups, the QRL models performed well when compared with the best CRL models and further revealed the representation of quantum-like internal-state-related variables in the medial frontal gyrus in both healthy subjects and smokers, suggesting that value-based decision-making can be illustrated by QRL at both the behavioural and neural levels.


Subject(s)
Brain Mapping , Cigarette Smoking/physiopathology , Decision Making/physiology , Executive Function/physiology , Models, Theoretical , Prefrontal Cortex/physiology , Reinforcement, Psychology , Adult , Humans , Magnetic Resonance Imaging , Prefrontal Cortex/diagnostic imaging , Prefrontal Cortex/physiopathology , Quantum Theory
16.
IEEE Trans Cybern ; 50(8): 3581-3593, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31295133

ABSTRACT

Robust control design for quantum systems has been recognized as a key task in quantum information technology, molecular chemistry, and atomic physics. In this paper, an improved differential evolution algorithm, referred to as multiple-samples and mixed-strategy DE (msMS_DE), is proposed to search robust fields for various quantum control problems. In msMS_DE, multiple samples are used for fitness evaluation and a mixed strategy is employed for the mutation operation. In particular, the msMS_DE algorithm is applied to the control problems of: 1) open inhomogeneous quantum ensembles and 2) the consensus goal of a quantum network with uncertainties. Numerical results are presented to demonstrate the excellent performance of the improved machine learning algorithm for these two classes of quantum robust control problems. Furthermore, msMS_DE is experimentally implemented on femtosecond (fs) laser control applications to optimize two-photon absorption and control fragmentation of the molecule CH2BrI. The experimental results confirm that msMS_DE can find effective fs laser pulses for various tasks.
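The two named ingredients, multiple-sample fitness evaluation and a mixed mutation strategy, can be sketched on a one-dimensional toy objective. The rand/1 and best/1 mutation variants, the 50/50 mixing, the noise model, and all hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
# Toy DE with two msMS_DE-flavored ingredients: fitness averaged over several
# noisy samples (robustness) and a mixed rand/1 + best/1 mutation strategy.
# Objective, mixing rule, and hyperparameters are illustrative assumptions.
import random

random.seed(1)

def robust_fitness(x, n_samples=5, noise=0.05):
    """Average the objective over several perturbed evaluations, mimicking
    fitness evaluation under field noise; optimum is at x = 0."""
    return sum(-(x + random.gauss(0, noise)) ** 2 for _ in range(n_samples)) / n_samples

def de_step(pop, F=0.6, CR=0.9):
    """One DE generation with a mixed mutation strategy."""
    fits = [robust_fitness(x) for x in pop]
    elite = pop[max(range(len(pop)), key=lambda i: fits[i])]
    new_pop = []
    for i, x in enumerate(pop):
        a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
        # Mixed strategy: alternate randomly between rand/1 and best/1.
        mutant = a + F * (b - c) if random.random() < 0.5 else elite + F * (b - c)
        trial = mutant if random.random() < CR else x
        new_pop.append(trial if robust_fitness(trial) > fits[i] else x)
    return new_pop

pop = [random.uniform(-5, 5) for _ in range(12)]
for _ in range(60):
    pop = de_step(pop)
best = max(pop, key=robust_fitness)
```

Averaging the fitness over perturbed evaluations biases the search toward solutions that stay good under noise, which is the sense in which the evolved fields are "robust".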

17.
Phys Rev Lett ; 123(22): 223202, 2019 Nov 29.
Article in English | MEDLINE | ID: mdl-31868398

ABSTRACT

The possibility to manipulate quantum coherence and interference, apart from its fundamental interest in quantum mechanics, is essential for controlling nonlinear optical processes such as high harmonic generation, multiphoton absorption, and stimulated Raman scattering. We show, analytically and numerically, how a nonlinear optical process via resonance Raman scattering (RRS) can be manipulated in a four-level double-Λ system by using pulsed laser fields. We find that two simultaneously excited RRS paths involved in the system can generate an ultimately destructive interference in the broad-bandwidth-limit regime. This, in turn, reduces the four-level system to an equivalent three-level system in a V configuration capable of naturally vanishing RRS effects. We further show that this counterintuitive phenomenon, i.e., the RRS vanishing, can be prevented by transferring a modulated phase of the laser pulse to the system at resonance frequencies. This work demonstrates a clear signature of both quantum destructive and constructive interference by actively controlling resonant multiphoton processes in multilevel quantum systems, and it therefore has potential applications in nonlinear optics, quantum control, and quantum information science.

18.
Opt Express ; 27(23): 34416-34433, 2019 Nov 11.
Article in English | MEDLINE | ID: mdl-31878489

ABSTRACT

Entangled measurement is a crucial tool in quantum technology. We propose a new entanglement measure of multi-mode detection, which estimates the amount of entanglement that can be created in a measurement. To illustrate the proposed measure, we perform quantum tomography of a two-mode detector that is comprised of two superconducting nanowire single photon detectors. Our method utilizes coherent states as probe states, which can be easily prepared with accuracy. Our work shows that a separable state such as a coherent state is enough to characterize a potentially entangled detector. We investigate the entangling capability of the detector in various settings. Our proposed measure verifies that the detector makes an entangled measurement under certain conditions, and reveals the nature of the entangling properties of the detector. Since the precise characterization of a detector is essential for applications in quantum information technology, the experimental reconstruction of detector properties along with the proposed measure will be key features in future quantum information processing.

19.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2216-2226, 2018 06.
Article in English | MEDLINE | ID: mdl-29771673

ABSTRACT

In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with a coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of a self-paced priority as well as a coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum, for sample efficiency. The coverage penalty is taken into account for sample diversity. In comparison with the deep Q-network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. Further results show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and the dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
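The two complexity criteria can be sketched as a single selection score. The Gaussian-shaped self-paced term (peaking where the TD-error matches the current curriculum difficulty) and the linear replay-count penalty are illustrative functional forms, not the paper's exact definitions.

```python
# Illustrative DCRL-style selection score: self-paced priority favors
# transitions matched to the current curriculum difficulty, and a coverage
# penalty discounts over-replayed transitions. Forms and constants are toys.
import math

def dcrl_score(td_error, replay_count, difficulty, lam=0.1):
    """Score = self-paced priority (peaks when |TD-error| ~ difficulty)
    minus a coverage penalty proportional to the replay count."""
    self_paced = math.exp(-(abs(td_error) - difficulty) ** 2)
    coverage_penalty = lam * replay_count
    return self_paced - coverage_penalty

# Early in training (difficulty 0.2) an easy, fresh transition scores highest...
early = dcrl_score(0.2, 0, difficulty=0.2)
# ...a hard transition is deferred to later curricula...
hard_early = dcrl_score(1.5, 0, difficulty=0.2)
# ...and an often-replayed easy transition is penalized for coverage.
stale = dcrl_score(0.2, 5, difficulty=0.2)
```

Raising `difficulty` over the course of training shifts the selection peak toward harder transitions, which is the self-paced curriculum effect; the penalty keeps the replay distribution from collapsing onto a few transitions.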

20.
Phys Chem Chem Phys ; 20(14): 9498-9506, 2018 Apr 04.
Article in English | MEDLINE | ID: mdl-29569663

ABSTRACT

Achieving fast and efficient quantum state transfer is a fundamental task in physics, chemistry, and quantum information science. However, successful implementation of perfect quantum state transfer also requires robustness under practically inevitable perturbative defects. Here, we demonstrate how optimal and robust quantum state transfer can be achieved by shaping the spectral phase of an ultrafast laser pulse in the framework of frequency-domain quantum optimal control theory. Our numerical simulations of a single dibenzoterrylene molecule, as well as of atomic rubidium, show that optimal and robust quantum state transfer via spectral-phase-modulated laser pulses can be achieved by incorporating a filtering function of the frequency into the optimization algorithm, which in turn has potential applications for ultrafast robust control of photochemical reactions.
