Results 1 - 9 of 9
1.
Int J Mol Sci ; 25(10)2024 May 18.
Article in English | MEDLINE | ID: mdl-38791544

ABSTRACT

Antimicrobial peptides (AMPs) are promising candidates for new antibiotics due to their broad-spectrum activity against pathogens and reduced susceptibility to resistance development. Deep-learning techniques, such as deep generative models, offer a promising avenue to expedite the discovery and optimization of AMPs. A remarkable example is the Feedback Generative Adversarial Network (FBGAN), a deep generative model that incorporates a classifier during its training phase. Our study aims to explore the impact of enhanced classifiers on the generative capabilities of FBGAN. To this end, we introduce two alternative classifiers for the FBGAN framework, both surpassing the accuracy of the original classifier. The first classifier utilizes the k-mers technique, while the second applies transfer learning from the large protein language model Evolutionary Scale Modeling 2 (ESM2). Integrating these classifiers into FBGAN not only yields notable performance enhancements compared to the original FBGAN but also enables the proposed generative models to achieve comparable or even superior performance to established methods such as AMPGAN and HydrAMP. This achievement underscores the effectiveness of leveraging advanced classifiers within the FBGAN framework, enhancing its computational robustness for AMP de novo design and making it comparable to existing literature.
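
As a rough illustration of the k-mers technique mentioned in the abstract, the sketch below turns a peptide sequence into counts of overlapping k-mers. The function name `kmer_counts`, the choice k = 2, and the example sequence are assumptions made for illustration; the abstract does not state the value of k or how the features are fed to the classifier.

```python
from collections import Counter

def kmer_counts(sequence: str, k: int = 2) -> Counter:
    """Count overlapping substrings of length k in a peptide sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Hypothetical peptide used only to exercise the function.
counts = kmer_counts("KLAKLAK", k=2)
```

Such counts can be arranged into a fixed-length vector (one slot per possible k-mer) before being passed to a classifier.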


Subject(s)
Antimicrobial Peptides , Antimicrobial Peptides/chemistry , Antimicrobial Peptides/pharmacology , Drug Design/methods , Neural Networks, Computer , Deep Learning , Algorithms
2.
Mach Learn ; 112(11): 4257-4287, 2023.
Article in English | MEDLINE | ID: mdl-37900054

ABSTRACT

Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (i.e., high dimensional data). However, lower-dimensional representations that retain the useful biological information do exist. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways (genesets in general) and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL's latent space has a fairly straightforward biological interpretation. PASL is shown to outperform the state-of-the-art method (PLIER) in predictive performance on two collections of breast cancer and leukemia gene expression datasets. PASL is also trained on a large corpus of 50000 gene expression samples to construct a universal dictionary of features across different tissues and pathologies. The dictionary is validated on 35643 held-out samples for reconstruction error. It is then applied to 165 held-out datasets spanning a diverse range of diseases. The AutoML tool JADBio is employed to show that the predictive information in the PASL-created feature space is retained after the transformation. The code is available at https://github.com/mensxmachina/PASL.

3.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9439-9450, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35385390

ABSTRACT

In this article, we propose a novel loss function for training generative adversarial networks (GANs) aiming toward deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions (CGFs) giving rise to Cumulant GAN. Relying on a recently derived variational formula, we show that the corresponding optimization problem is equivalent to Rényi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the Rényi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance, and χ²-divergence. Wasserstein GAN is also a member of cumulant GAN. In terms of stability, we rigorously prove the linear convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions, and the standard gradient descent ascent algorithm. Finally, we experimentally demonstrate that image generation is more robust relative to Wasserstein GAN and it is substantially improved in terms of both inception score (IS) and Fréchet inception distance (FID) when both weaker and stronger discriminators are considered.
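
For readers unfamiliar with cumulant generating functions, the sketch below evaluates the scaled CGF, (1/β) log E[exp(β f)], on a finite sample. It is not the Cumulant GAN loss itself, whose exact form the abstract does not give; the function name `scaled_cgf` and the sample values are assumptions. The one property it illustrates is standard: as β tends to 0 the scaled CGF tends to the plain expectation E[f], the kind of term that appears in Wasserstein-type objectives.

```python
import numpy as np

def scaled_cgf(values, beta):
    """Return (1/beta) * log(mean(exp(beta * values))), computed stably."""
    v = np.asarray(values, dtype=float)
    if beta == 0.0:
        return float(v.mean())  # limiting value as beta -> 0
    z = beta * v
    m = z.max()  # shift by the maximum to avoid overflow in exp
    return float((m + np.log(np.mean(np.exp(z - m)))) / beta)

# Hypothetical discriminator outputs used only for illustration.
sample = [0.0, 1.0, 2.0, 3.0]
near_zero = scaled_cgf(sample, 1e-6)   # close to the sample mean
positive = scaled_cgf(sample, 1.0)     # at least the mean (Jensen's inequality)
negative = scaled_cgf(sample, -1.0)    # at most the mean
```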

4.
Bioinformatics ; 35(18): 3387-3396, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30715136

ABSTRACT

MOTIVATION: Temporal variations in biological systems and more generally in natural sciences are typically modeled as a set of ordinary, partial or stochastic differential or difference equations. Algorithms for learning the structure and the parameters of a dynamical system are distinguished based on whether time is discrete or continuous, whether observations are time-series or time-course, and whether the system is deterministic or stochastic; however, there is no approach able to handle the various types of dynamical systems simultaneously. RESULTS: In this paper, we present a unified approach to infer both the structure and the parameters of non-linear dynamical systems of any type under the restriction of being linear with respect to the unknown parameters. Our approach, which is named Unified Sparse Dynamics Learning (USDL), consists of two steps. First, an atemporal system of equations is derived through the application of the weak formulation. Then, assuming a sparse representation for the dynamical system, we show that the inference problem can be expressed as a sparse signal recovery problem, allowing the application of an extensive body of algorithms and theoretical results. Results on simulated data demonstrate the efficacy and superiority of the USDL algorithm under multiple interventions and/or stochasticity. Additionally, USDL's accuracy significantly correlates with theoretical metrics such as the exact recovery coefficient. On real single-cell data, the proposed approach is able to induce high-confidence subgraphs of the signaling pathway. AVAILABILITY AND IMPLEMENTATION: Source code is available at Bioinformatics online. The USDL algorithm has also been integrated in SCENERY (http://scenery.csd.uoc.gr/), an online tool for single-cell mass cytometry analytics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
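
The two steps described in the abstract can be sketched on a toy problem. This is not the USDL implementation: the test functions (sines vanishing at both ends of the time window), the candidate library, and the use of sequentially thresholded least squares as a stand-in for a sparse-recovery algorithm are all assumptions chosen for brevity. The data come from dx/dt = -2x, so the expected sparse solution has a single nonzero coefficient on the term x.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule for samples y on grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def weak_form_system(t, x, library, n_test=5):
    """Step 1: build A @ theta = b by integrating against test functions."""
    span = t[-1] - t[0]
    A = np.zeros((n_test, len(library)))
    b = np.zeros(n_test)
    for k in range(1, n_test + 1):
        w = k * np.pi / span
        phi = np.sin(w * (t - t[0]))        # vanishes at both endpoints
        dphi = w * np.cos(w * (t - t[0]))
        # Integration by parts moves the time derivative onto phi.
        b[k - 1] = -trapezoid(dphi * x, t)
        for j, g in enumerate(library):
            A[k - 1, j] = trapezoid(phi * g(x), t)
    return A, b

def thresholded_lstsq(A, b, threshold=0.1, n_iter=10):
    """Step 2 (stand-in): zero small coefficients, refit the rest."""
    theta = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(theta) < threshold
        theta[small] = 0.0
        active = ~small
        if not active.any():
            break
        theta[active] = np.linalg.lstsq(A[:, active], b, rcond=None)[0]
    return theta

t = np.linspace(0.0, 2.0, 4001)
x = np.exp(-2.0 * t)                         # trajectory of dx/dt = -2x
library = [lambda v: np.ones_like(v), lambda v: v, lambda v: v ** 2]
A, b = weak_form_system(t, x, library)
theta = thresholded_lstsq(A, b)              # coefficients for [1, x, x^2]
```

Because the derivative is transferred to the test functions, no numerical differentiation of the data is needed, which is the practical appeal of a weak formulation when observations are noisy.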


Subject(s)
Algorithms , Nonlinear Dynamics , Signal Transduction , Software
5.
BMC Infect Dis ; 18(1): 245, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29843621

ABSTRACT

BACKGROUND: Emerging pathogens such as Zika, chikungunya, Ebola, and dengue viruses are serious threats to national and global health security. Accurate forecasts of emerging epidemics and their severity are critical to minimizing subsequent mortality, morbidity, and economic loss. The recent introduction of chikungunya and Zika virus to the Americas underscores the need for better methods for disease surveillance and forecasting. METHODS: To explore the suitability of current approaches to forecasting emerging diseases, the Defense Advanced Research Projects Agency (DARPA) launched the 2014-2015 DARPA Chikungunya Challenge to forecast the number of cases and spread of chikungunya disease in the Americas. Challenge participants (n=38 during final evaluation) provided predictions of chikungunya epidemics across the Americas for a six-month period, from September 1, 2014 to February 16, 2015, to be evaluated by comparison with incidence data reported to the Pan American Health Organization (PAHO). This manuscript presents an overview of the challenge and a summary of the approaches used by the winners. RESULTS: Participant submissions were evaluated by a team of non-competing government subject matter experts based on numerical accuracy and methodology. Although this manuscript does not include in-depth analyses of the results, cursory analyses suggest that simpler models appear to outperform more complex approaches that included, for example, demographic information and transportation dynamics, due to the reporting biases, which can be implicitly captured in statistical models. Mosquito-dynamics, population specific information, and dengue-specific information correlated best with prediction accuracy. CONCLUSION: We conclude that with careful consideration and understanding of the relative advantages and disadvantages of particular methods, implementation of an effective prediction system is feasible. 
However, there is a need to improve the quality of the data in order to more accurately predict the course of epidemics.


Subject(s)
Chikungunya Fever/epidemiology , Chikungunya Fever/prevention & control , Disease Outbreaks/prevention & control , Infection Control/organization & administration , Infection Control/trends , Security Measures/organization & administration , United States Department of Defense/organization & administration , Demography , Dengue/epidemiology , Dengue/prevention & control , Forecasting/methods , Humans , Infection Control/standards , Organizational Innovation , Research Design , Security Measures/standards , Security Measures/trends , United States/epidemiology , United States Department of Defense/trends , Zika Virus Infection/epidemiology , Zika Virus Infection/prevention & control
6.
PLoS One ; 10(7): e0130825, 2015.
Article in English | MEDLINE | ID: mdl-26161544

ABSTRACT

Existing sensitivity analysis approaches cannot efficiently handle stochastic reaction networks with a large number of parameters and species, which are typical in the modeling and simulation of complex biochemical phenomena. In this paper, a two-step strategy for parametric sensitivity analysis for such systems is proposed, exploiting advantages and synergies between two recently proposed sensitivity analysis methodologies for stochastic dynamics. The first method performs sensitivity analysis of the stochastic dynamics by means of the Fisher Information Matrix on the underlying distribution of the trajectories; the second method is a reduced-variance, finite-difference, gradient-type sensitivity approach relying on stochastic coupling techniques for variance reduction. Here we demonstrate that these two methods can be combined and deployed together by means of a new sensitivity bound which incorporates the variance of the quantity of interest as well as the Fisher Information Matrix estimated from the first method. The first step of the proposed strategy labels sensitivities using the bound and screens out the insensitive parameters in a controlled manner. In the second step of the proposed strategy, a finite-difference method is applied only for the sensitivity estimation of the (potentially) sensitive parameters that have not been screened out in the first step. Results on an epidermal growth factor network with fifty parameters and on a protein homeostasis with eighty parameters demonstrate that the proposed strategy is able to quickly discover and discard the insensitive parameters and that it accurately estimates the sensitivities of the remaining, potentially sensitive parameters. The new sensitivity strategy can be several times faster than current state-of-the-art approaches that test all parameters, especially in "sloppy" systems. In particular, the computational acceleration is quantified by the ratio of the total number of parameters to the number of sensitive parameters.
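
The screening step can be sketched as follows, assuming the bound takes the Cauchy-Schwarz form |S_k| ≤ sqrt(Var(f)) · sqrt(F_kk), where S_k is the sensitivity of the mean of f to parameter k and F_kk is the corresponding diagonal entry of the Fisher Information Matrix. The abstract does not give the exact expression, and the function name, tolerance, and FIM values below are hypothetical.

```python
import math

def screen_parameters(var_f, fim_diag, tolerance):
    """Split parameter indices into (kept, screened_out) using the bound."""
    kept, screened = [], []
    for k, fim_kk in enumerate(fim_diag):
        bound = math.sqrt(var_f * fim_kk)  # upper bound on |S_k|
        (kept if bound >= tolerance else screened).append(k)
    return kept, screened

# Hypothetical diagonal FIM entries for six parameters.
fim_diag = [4.0, 1e-6, 0.25, 1e-8, 9.0, 1e-7]
kept, screened = screen_parameters(var_f=1.0, fim_diag=fim_diag, tolerance=0.1)

# Acceleration as described in the abstract: total parameters over sensitive ones.
speedup = len(fim_diag) / len(kept)
```

Only the parameters in `kept` would then go on to the more expensive finite-difference estimation in the second step.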


Subject(s)
Algorithms , Biostatistics/methods , Models, Biological , Stochastic Processes , ErbB Receptors/metabolism , Feedback, Physiological , Heat-Shock Proteins/metabolism , Homeostasis , Humans , Mathematical Computing , Reproducibility of Results , Tumor Suppressor Protein p53/metabolism
7.
J Chem Phys ; 143(1): 014116, 2015 Jul 07.
Article in English | MEDLINE | ID: mdl-26156474

ABSTRACT

In this paper, we present a parametric sensitivity analysis (SA) methodology for continuous time and continuous space Markov processes represented by stochastic differential equations. Particularly, we focus on stochastic molecular dynamics as described by the Langevin equation. The utilized SA method is based on the computation of the information-theoretic (and thermodynamic) quantity of relative entropy rate (RER) and the associated Fisher information matrix (FIM) between path distributions, and it is an extension of the work proposed by Y. Pantazis and M. A. Katsoulakis [J. Chem. Phys. 138, 054115 (2013)]. A major advantage of the pathwise SA method is that both RER and pathwise FIM depend only on averages of the force field; therefore, they are tractable and computable as ergodic averages from a single run of the molecular dynamics simulation both in equilibrium and in non-equilibrium steady state regimes. We validate the performance of the extended SA method on two different molecular stochastic systems, a standard Lennard-Jones fluid and an all-atom methane liquid, and compare the obtained parameter sensitivities with parameter sensitivities on three popular and well-studied observable functions, namely, the radial distribution function, the mean squared displacement, and the pressure. Results show that the RER-based sensitivities are highly correlated with the observable-based sensitivities.

8.
BMC Bioinformatics ; 14: 311, 2013 Oct 22.
Article in English | MEDLINE | ID: mdl-24148216

ABSTRACT

BACKGROUND: Stochastic modeling and simulation provide powerful predictive methods for the intrinsic understanding of fundamental mechanisms in complex biochemical networks. Typically, such mathematical models involve networks of coupled jump stochastic processes with a large number of parameters that need to be suitably calibrated against experimental data. In this direction, the parameter sensitivity analysis of reaction networks is an essential mathematical and computational tool, yielding information regarding the robustness and the identifiability of model parameters. However, existing sensitivity analysis approaches such as variants of the finite difference method can have an overwhelming computational cost in models with a high-dimensional parameter space. RESULTS: We develop a sensitivity analysis methodology suitable for complex stochastic reaction networks with a large number of parameters. The proposed approach is based on Information Theory methods and relies on the quantification of information loss due to parameter perturbations between time-series distributions. For this reason, we need to work on path-space, i.e., the set consisting of all stochastic trajectories, hence the proposed approach is referred to as "pathwise". The pathwise sensitivity analysis method is realized by employing the rigorously-derived Relative Entropy Rate, which is directly computable from the propensity functions. A key aspect of the method is that an associated pathwise Fisher Information Matrix (FIM) is defined, which in turn constitutes a gradient-free approach to quantifying parameter sensitivities. The structure of the FIM turns out to be block-diagonal, revealing hidden parameter dependencies and sensitivities in reaction networks. CONCLUSIONS: As a gradient-free method, the proposed sensitivity analysis provides a significant advantage when dealing with complex stochastic systems with a large number of parameters. 
In addition, knowledge of the structure of the FIM makes it possible to efficiently address questions of parameter identifiability, estimation and robustness. The proposed method is tested and validated on three biochemical systems, namely: (a) a protein production/degradation model where explicit solutions are available, permitting a careful assessment of the method, (b) the p53 reaction network where quasi-steady stochastic oscillations of the concentrations are observed, and for which continuum approximations (e.g. mean field, stochastic Langevin, etc.) break down due to persistent oscillations between high and low populations, and (c) an Epidermal Growth Factor Receptor model which is an example of a high-dimensional stochastic reaction network with more than 200 reactions and a corresponding number of parameters.
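
A minimal sketch of a pathwise FIM for a protein production/degradation model follows. It assumes the FIM is the stationary time average of the sum over reactions of a_j(x) ∇log a_j(x) ∇log a_j(x)^T, with propensities a_1 = k1 (production) and a_2 = k2·x (degradation); the abstract only states that the quantity is computable from the propensity functions, so this formula, the function name, and the rate values are assumptions. Under them the matrix is diagonal, with entries close to 1/k1 and k1/k2².

```python
import numpy as np

def pathwise_fim_birth_death(k1, k2, t_end=500.0, seed=0):
    """Estimate the 2x2 pathwise FIM from one stochastic simulation run."""
    rng = np.random.default_rng(seed)
    # Gradients of log-propensities with respect to (k1, k2).
    grad_log = np.array([[1.0 / k1, 0.0], [0.0, 1.0 / k2]])
    x, t = 0, 0.0
    acc = np.zeros((2, 2))
    while t < t_end:
        a = np.array([k1, k2 * x])
        a0 = a.sum()
        # Holding time in the current state, truncated at the end of the run.
        tau = min(rng.exponential(1.0 / a0), t_end - t)
        for j in range(2):
            acc += tau * a[j] * np.outer(grad_log[j], grad_log[j])
        t += tau
        # Pick the next reaction; a jump at the truncated final step is unused.
        x += 1 if rng.random() < a[0] / a0 else -1
    return acc / t_end

# Hypothetical rate constants chosen for illustration.
fim = pathwise_fim_birth_death(k1=10.0, k2=1.0)
```

The zero off-diagonal entries reflect that each parameter enters a single propensity, a simple instance of the block-diagonal structure described in the abstract.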


Subject(s)
Information Theory , Models, Biological , Systems Biology/methods , Entropy , Proteins/genetics , Proteins/physiology , Stochastic Processes
9.
J Chem Phys ; 138(5): 054115, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23406106

ABSTRACT

We propose a new sensitivity analysis methodology for complex stochastic dynamics based on the relative entropy rate. The method becomes computationally feasible at the stationary regime of the process and involves the calculation of suitable observables in path space for the relative entropy rate and the corresponding Fisher information matrix. The stationary regime is crucial for stochastic dynamics and here allows us to address the sensitivity analysis of complex systems, including examples of processes with complex landscapes that exhibit metastability, non-reversible systems from a statistical mechanics perspective, and high-dimensional, spatially distributed models. All these systems typically exhibit non-Gaussian stationary probability distributions, while in the high-dimensional case histograms are impossible to construct directly. Our proposed methods bypass these challenges by relying on the direct Monte Carlo simulation of rigorously derived observables for the relative entropy rate and Fisher information in path space rather than on the stationary probability distribution itself. We demonstrate the capabilities of the proposed methodology by focusing here on two classes of problems: (a) Langevin particle systems with either reversible (gradient) or non-reversible (non-gradient) forcing, highlighting the ability of the method to carry out sensitivity analysis in non-equilibrium systems; and, (b) spatially extended kinetic Monte Carlo models, showing that the method can handle high-dimensional problems.
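
To make the idea of estimating the path-space Fisher information by direct simulation concrete, here is a sketch for a one-dimensional Ornstein-Uhlenbeck process dX = -θ X dt + σ dW. It assumes the pathwise Fisher information for a diffusion is the stationary average of (∂b/∂θ)² / σ² with drift b = -θx, which follows from Girsanov's theorem; the process, the Euler-Maruyama discretization, and all parameter values are illustrative choices, not taken from the paper. For this process the exact value is 1/(2θ).

```python
import numpy as np

def ou_pathwise_fim(theta=1.0, sigma=1.0, dt=0.01, n_steps=200_000, seed=0):
    """Ergodic-average estimate of the pathwise Fisher information for theta."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps) * sigma * np.sqrt(dt)
    x = 0.0
    acc = 0.0
    for xi in noise:
        acc += x * x                   # (d b / d theta)^2 with b = -theta * x
        x += -theta * x * dt + xi      # Euler-Maruyama step
    return acc / (n_steps * sigma ** 2)

fim_estimate = ou_pathwise_fim()       # close to 1 / (2 * theta) = 0.5
```

The estimate uses only a running average along a single trajectory, so no histogram of the stationary distribution is ever built.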


Subject(s)
Entropy , Molecular Dynamics Simulation , Kinetics , Monte Carlo Method , Stochastic Processes