Results 1 - 20 of 34
1.
PLoS Comput Biol ; 19(11): e1011655, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38011273

ABSTRACT

Generative models of protein sequence families are an important tool in the repertoire of protein scientists and engineers alike. However, state-of-the-art generative approaches face inference, accuracy, and overfitting-related obstacles when modeling moderately sized to large proteins and/or protein families with low sequence coverage. Here, we present a simple-to-learn, tunable, and accurate generative model, GENERALIST: GENERAtive nonLInear tenSor-factorizaTion for protein sequences. GENERALIST accurately captures several high-order summary statistics of amino acid covariation. GENERALIST also predicts conservative local optimal sequences that are likely to fold into stable 3D structures. Importantly, unlike current methods, the density of sequences in GENERALIST-modeled sequence ensembles closely resembles that of the corresponding natural ensembles. Finally, GENERALIST embeds protein sequences in an informative latent space. GENERALIST will be an important tool to study protein sequence variability.


Subject(s)
Amino Acids , Proteins , Proteins/chemistry , Amino Acid Sequence
2.
NPJ Syst Biol Appl ; 9(1): 26, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37339950

ABSTRACT

Dimensionality reduction offers unique insights into high-dimensional microbiome dynamics by leveraging collective abundance fluctuations of multiple bacteria driven by similar ecological perturbations. However, methods providing lower-dimensional representations of microbiome dynamics both at the community and individual taxa levels are not currently available. To that end, we present EMBED: Essential MicroBiomE Dynamics, a probabilistic nonlinear tensor factorization approach. Like normal mode analysis in structural biophysics, EMBED infers ecological normal modes (ECNs), which represent the unique orthogonal modes capturing the collective behavior of microbial communities. Using multiple real and synthetic datasets, we show that a very small number of ECNs can accurately approximate microbiome dynamics. Inferred ECNs reflect specific ecological behaviors, providing natural templates along which the dynamics of individual bacteria may be partitioned. Moreover, the multi-subject treatment in EMBED systematically identifies subject-specific and universal abundance dynamics that are not detected by traditional approaches. Collectively, these results highlight the utility of EMBED as a versatile dimensionality reduction tool for studies of microbiome dynamics.


Subject(s)
Microbiota , Microbiota/genetics , Bacteria/genetics
3.
Nat Metab ; 4(6): 711-723, 2022 06.
Article in English | MEDLINE | ID: mdl-35739397

ABSTRACT

Production of oxidized biomass, which requires regeneration of the cofactor NAD+, can be a proliferation bottleneck that is influenced by environmental conditions. However, a comprehensive quantitative understanding of metabolic processes that may be affected by NAD+ deficiency is currently missing. Here, we show that de novo lipid biosynthesis can impose a substantial NAD+ consumption cost in proliferating cancer cells. When electron acceptors are limited, environmental lipids become crucial for proliferation because NAD+ is required to generate precursors for fatty acid biosynthesis. We find that both oxidative and even net reductive pathways for lipogenic citrate synthesis are gated by reactions that depend on NAD+ availability. We also show that access to acetate can relieve lipid auxotrophy by bypassing the NAD+ consuming reactions. Gene expression analysis demonstrates that lipid biosynthesis strongly anti-correlates with expression of hypoxia markers across tumor types. Overall, our results define a requirement for oxidative metabolism to support biosynthetic reactions and provide a mechanistic explanation for cancer cell dependence on lipid uptake in electron acceptor-limited conditions, such as hypoxia.


Subject(s)
NAD , Neoplasms , Cell Proliferation , Electrons , Humans , Hypoxia , Lipids , NAD/metabolism
4.
PLoS Comput Biol ; 17(8): e1009275, 2021 08.
Article in English | MEDLINE | ID: mdl-34358223

ABSTRACT

In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to build generative models rely on modelers' identification of constraints and are computationally expensive to infer when the number of variables is large (N~100). Here, we address both these issues with the Super-statistical Generative Model for binary Data (SiGMoiD). SiGMoiD is a maximum entropy-based framework in which we imagine the data as arising from a super-statistical system; individual binary variables in a given sample are coupled to the same 'bath' whose intensive variables vary from sample to sample. Importantly, unlike standard maximum entropy approaches where the modeler specifies the constraints, the SiGMoiD algorithm infers them directly from the data. Due to this optimal choice of constraints, SiGMoiD allows us to model collections of a very large number (N>1000) of binary variables. Finally, SiGMoiD offers a reduced dimensional description of the data, allowing us to identify clusters of similar data points as well as binary variables. We illustrate the versatility of SiGMoiD using multiple datasets spanning several time- and length-scales.
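A minimal sketch of the super-statistical idea, assuming that, once a sample's bath variables β are fixed, maximum-entropy constraints that are linear in each binary variable make the variables conditionally independent Bernoulli draws with logistic probabilities. The names `alpha`, `beta`, and `sample_superstatistical_binary` are hypothetical; the code only samples from such a model and is not the SiGMoiD inference algorithm.

```python
import numpy as np

def sample_superstatistical_binary(alpha, beta, rng):
    """Sample binary data from a factorized super-statistical model (illustrative).

    alpha: (N, K) hypothetical couplings of N binary variables to K bath variables.
    beta:  (M, K) hypothetical intensive bath variables, one row per sample.
    Within a sample the variables are conditionally independent, with
    P(x_i = 1 | beta) = 1 / (1 + exp(sum_k beta_k * alpha_ik)).
    """
    field = beta @ alpha.T                      # (M, N): effective field per sample and variable
    p_on = 1.0 / (1.0 + np.exp(field))          # logistic probability that x_i = 1
    x = (rng.random(p_on.shape) < p_on).astype(np.int8)
    return x, p_on

rng = np.random.default_rng(0)
N, K, M = 50, 3, 200                            # variables, bath dimensions, samples
alpha = rng.normal(size=(N, K))
beta = rng.normal(size=(M, K))
x, p_on = sample_superstatistical_binary(alpha, beta, rng)
```

Although the variables are independent within a sample, they become correlated across samples because every variable responds to the same sample-specific β; K much smaller than N is what gives the reduced-dimensional description.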


Subject(s)
Computational Biology/methods , Models, Statistical , Algorithms , Entropy
5.
Nat Microbiol ; 5(5): 768-775, 2020 05.
Article in English | MEDLINE | ID: mdl-32284567

ABSTRACT

The gut microbiota is now widely recognized as a dynamic ecosystem that plays an important role in health and disease. Although current sequencing technologies make it possible to explore how relative abundances of host-associated bacteria change over time, the biological processes governing microbial dynamics remain poorly understood. Therefore, as in other ecological systems, it is important to identify quantitative relationships describing various aspects of gut microbiota dynamics. In the present study, we use multiple high-resolution time series data obtained from humans and mice to demonstrate that, despite their inherent complexity, gut microbiota dynamics can be characterized by several robust scaling relationships. Interestingly, the observed patterns are highly similar to those previously identified across diverse ecological communities and economic systems, including the temporal fluctuations of animal and plant populations and the performance of publicly traded companies. Specifically, we find power-law relationships describing short- and long-term changes in gut microbiota abundances, species residence and return times, and the correlation between the mean and the temporal variance of species abundances. The observed scaling laws are altered in mice receiving different diets and are affected by context-specific perturbations in humans. We use the macroecological relationships to reveal specific bacterial taxa, the dynamics of which are substantially perturbed by dietary and environmental changes. Overall, our results suggest that a quantitative macroecological framework will be important for characterizing and understanding the complex dynamics of diverse microbial communities.
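A power-law relationship between the mean and the temporal variance of species abundances (Taylor's law) is commonly estimated by a straight-line fit in log-log space. The sketch below assumes abundances arranged as a time-by-taxa array; the function name and the synthetic gamma-distributed data are illustrative and not the paper's analysis pipeline.

```python
import numpy as np

def fit_mean_variance_power_law(abundances):
    """Fit variance = a * mean**b across taxa by least squares in log-log space.

    abundances: (T, S) array, T time points by S taxa (hypothetical layout).
    Taxa with zero mean or zero variance are dropped, since their log is undefined.
    Returns (a, b).
    """
    mean = abundances.mean(axis=0)
    var = abundances.var(axis=0, ddof=1)
    keep = (mean > 0) & (var > 0)
    b, log_a = np.polyfit(np.log(mean[keep]), np.log(var[keep]), deg=1)
    return np.exp(log_a), b

# Synthetic check: gamma-distributed abundances with mean m and variance 2 * m**1.5.
rng = np.random.default_rng(1)
m = np.logspace(0, 3, 40)                 # 40 taxa with means from 1 to 1000
scale = 2.0 * np.sqrt(m)                  # gamma scale = variance / mean
shape = m / scale                         # gamma shape = mean / scale
abund = rng.gamma(shape, scale, size=(5000, m.size))
a_hat, b_hat = fit_mean_variance_power_law(abund)
```

With this synthetic input the fitted exponent `b_hat` should land near 1.5; comparing such exponents between conditions (for example, diets) is one way to ask whether a scaling law has been altered.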


Subject(s)
Bacteria/classification , Gastrointestinal Microbiome/physiology , Gastrointestinal Tract/microbiology , Animals , Bacteria/genetics , Biodiversity , Computer Simulation , Diet , Gastrointestinal Microbiome/genetics , Humans , Mice , Microbiota , Models, Theoretical , RNA, Ribosomal, 16S
6.
Annu Rev Phys Chem ; 71: 213-238, 2020 04 20.
Article in English | MEDLINE | ID: mdl-32075515

ABSTRACT

Ever since Clausius in 1865 and Boltzmann in 1877, the concepts of entropy and of its maximization have been the foundations for predicting how material equilibria derive from microscopic properties. But, despite much work, there has been no equally satisfactory general variational principle for nonequilibrium situations. However, in 1980, a new avenue was opened by E.T. Jaynes and by Shore and Johnson. We review here maximum caliber, which is a maximum-entropy-like principle that can infer distributions of flows over pathways, given dynamical constraints. This approach is providing new insights, particularly into few-particle complex systems, such as gene circuits, protein conformational reaction coordinates, network traffic, bird flocking, cell motility, and neuronal firing.


Subject(s)
DNA/chemistry , Gene Regulatory Networks , Models, Theoretical , Proteins/chemistry , DNA/genetics , Entropy , Kinetics , Models, Chemical , Models, Genetic , Molecular Dynamics Simulation , Nucleic Acid Conformation , Protein Conformation , Proteins/genetics
7.
Elife ; 9, 2020 01 21.
Article in English | MEDLINE | ID: mdl-31961323

ABSTRACT

Detecting relative rather than absolute changes in extracellular signals enables cells to make decisions in constantly fluctuating environments. It is currently not well understood how mammalian signaling networks store memories of past stimuli and subsequently use them to compute relative signals, that is, perform fold change detection. Using the growth factor-activated PI3K-Akt signaling pathway, we develop here computational and analytical models, and experimentally validate a novel non-transcriptional mechanism of relative sensing in mammalian cells. This mechanism relies on a new form of cellular memory, where cells effectively encode past stimulation levels in the abundance of cognate receptors on the cell surface. The surface receptor abundance is regulated by background signal-dependent receptor endocytosis and down-regulation. We show the robustness and specificity of relative sensing for two physiologically important ligands, epidermal growth factor (EGF) and hepatocyte growth factor (HGF), and across wide ranges of background stimuli. Our results suggest that similar mechanisms of cell memory and fold change detection may be important in diverse signaling cascades and multiple biological contexts.


Subject(s)
Cell Physiological Phenomena/physiology , Extracellular Space/metabolism , Receptors, Cell Surface/metabolism , Signal Transduction/physiology , Cell Line , Cell Membrane/metabolism , Class I Phosphatidylinositol 3-Kinases/metabolism , Endocytosis/physiology , Epidermal Growth Factor/metabolism , Hepatocyte Growth Factor/metabolism , Humans , Models, Biological , Proto-Oncogene Proteins c-akt/metabolism
8.
Cell Syst ; 10(2): 204-212.e8, 2020 02 26.
Article in English | MEDLINE | ID: mdl-31864963

ABSTRACT

Predictive models of signaling networks are essential for understanding cell population heterogeneity and designing rational interventions in disease. However, using computational models to predict heterogeneity of signaling dynamics is often challenging because of the extensive variability of biochemical parameters across cell populations. Here, we describe a maximum entropy-based framework for inference of heterogeneity in dynamics of signaling networks (MERIDIAN). MERIDIAN estimates the joint probability distribution over signaling network parameters that is consistent with experimentally measured cell-to-cell variability of biochemical species. We apply the developed approach to investigate the response heterogeneity in the EGFR/Akt signaling network. Our analysis demonstrates that a significant fraction of cells exhibits high phosphorylated Akt (pAkt) levels hours after EGF stimulation. Our findings also suggest that cells with high EGFR levels predominantly contribute to the subpopulation of cells with high pAkt activity. We also discuss how MERIDIAN can be extended to accommodate various experimental measurements.


Subject(s)
Cells/metabolism , Entropy , Genetic Heterogeneity , Humans , Signal Transduction
9.
Nat Methods ; 16(8): 731-736, 2019 08.
Article in English | MEDLINE | ID: mdl-31308552

ABSTRACT

Metagenomic sequencing has enabled detailed investigation of diverse microbial communities, but understanding their spatiotemporal variability remains an important challenge. Here, we present decomposition of variance using replicate sampling (DIVERS), a method based on replicate sampling and spike-in sequencing. The method quantifies the contributions of temporal dynamics, spatial sampling variability, and technical noise to the variances and covariances of absolute bacterial abundances. We applied DIVERS to investigate a high-resolution time series of the human gut microbiome and a spatial survey of a soil bacterial community in Manhattan's Central Park. Our analysis showed that in the gut, technical noise dominated the abundance variability for nearly half of the detected taxa. DIVERS also revealed substantial spatial heterogeneity of gut microbiota, and high temporal covariances of taxa within the Bacteroidetes phylum. In the soil community, spatial variability primarily contributed to abundance fluctuations at short time scales (weeks), while temporal variability dominated at longer time scales (several months).
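The idea of separating variance components with replicates can be sketched under an assumed additive model of independent temporal, spatial, and technical contributions. In this reading, a spatial replicate shares only the temporal component with the original sample, while a technical replicate shares both the temporal and spatial components, so covariances between replicate types isolate each term. The function below is a hypothetical illustration; the published method also uses spike-in sequencing to obtain absolute abundances, which is not modeled here.

```python
import numpy as np

def divers_like_decomposition(x_a, x_b, x_c):
    """Split the variance of one taxon's abundance into three parts (illustrative).

    Assumed additive model for this sketch:
        x = temporal(t) + spatial(t, s) + technical(t, s, r)
    x_a: abundance in sample A at each time point
    x_b: technical replicate of A (same spatial sample, processed again)
    x_c: spatial replicate (different spatial sample, same time point)
    """
    def cov(u, v):
        return np.cov(u, v)[0, 1]                        # sample covariance, ddof=1

    temporal = cov(x_a, x_c)                             # A and C share only the time component
    spatial = cov(x_a, x_b) - temporal                   # A and B also share the spatial component
    technical = np.var(x_a, ddof=1) - cov(x_a, x_b)      # the part A and B do not share
    return temporal, spatial, technical

# Synthetic data with component variances 4 (temporal), 1 (spatial), 0.25 (technical).
rng = np.random.default_rng(2)
T = 200_000
temporal_part = rng.normal(0.0, 2.0, T)
spatial_a, spatial_c = rng.normal(0.0, 1.0, (2, T))
x_a = 10 + temporal_part + spatial_a + rng.normal(0.0, 0.5, T)
x_b = 10 + temporal_part + spatial_a + rng.normal(0.0, 0.5, T)
x_c = 10 + temporal_part + spatial_c + rng.normal(0.0, 0.5, T)
temporal_var, spatial_var, technical_var = divers_like_decomposition(x_a, x_b, x_c)
```

The same differences of covariances extend to pairs of taxa, which is how covariances (not only variances) can be partitioned.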


Subject(s)
Algorithms , Bacteria/genetics , Feces/microbiology , Gastrointestinal Microbiome , Metagenomics/methods , Soil Microbiology , Spatio-Temporal Analysis , Bacteria/classification , Humans , RNA, Ribosomal, 16S , Sequence Analysis, DNA , Specimen Handling
10.
Neural Comput ; 31(5): 980-997, 2019 05.
Article in English | MEDLINE | ID: mdl-30883279

ABSTRACT

Stochastic kernel-based dimensionality-reduction approaches have become popular in the past decade. The central component of many of these methods is a symmetric kernel that quantifies the vicinity between pairs of data points and a kernel-induced Markov chain on the data. Typically, the Markov chain is fully specified by the kernel through row normalization. However, in many cases, it is desirable to impose user-specified stationary-state and dynamical constraints on the Markov chain. Unfortunately, no systematic framework exists to impose such user-defined constraints. Here, based on our previous work on inference of Markov models, we introduce a path entropy maximization based approach to derive the transition probabilities of Markov chains using a kernel and additional user-specified constraints. We illustrate the usefulness of these Markov chains with examples.
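The contrast between plain row normalization and a constrained chain can be sketched for one simple case: a symmetric positive kernel, detailed balance, and a prescribed stationary distribution π as the only user constraint. Assuming the kernel plays the role of the weight exp(-γ c_ab) for a path cost c, maximizing path entropy gives a joint distribution π_a P_ab = ρ_a K_ab ρ_b, with ρ fixed by ρ_a (Kρ)_a = π_a; below, ρ is found by a damped symmetric Sinkhorn iteration. This is an illustrative special case, not the paper's general framework (which also admits dynamical constraints), and the function names are hypothetical.

```python
import numpy as np

def row_normalized_chain(K):
    """Usual construction: P_ab = K_ab / sum_c K_ac."""
    return K / K.sum(axis=1, keepdims=True)

def stationary_constrained_chain(K, pi, n_iter=10_000, tol=1e-12):
    """Reversible chain from a symmetric positive kernel K with stationary distribution pi.

    Assumes detailed balance, so the joint pi_a * P_ab = rho_a * K_ab * rho_b.
    rho solves rho_a * (K @ rho)_a = pi_a; the damped (square-root) symmetric
    Sinkhorn update below has that equation as its fixed point.
    """
    rho = np.sqrt(pi)
    for _ in range(n_iter):
        new_rho = np.sqrt(rho * pi / (K @ rho))
        converged = np.max(np.abs(new_rho - rho)) < tol
        rho = new_rho
        if converged:
            break
    return rho[:, None] * K * rho[None, :] / pi[:, None]

# Gaussian kernel on six points of a line, and a user-specified target distribution.
x = np.linspace(0.0, 1.0, 6)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.3 ** 2))
pi = np.array([0.05, 0.10, 0.20, 0.30, 0.25, 0.10])

P_plain = row_normalized_chain(K)         # stationary distribution is proportional to K's row sums
P_constrained = stationary_constrained_chain(K, pi)
```

Row normalization leaves no freedom: for a symmetric kernel its stationary distribution is fixed by the kernel's row sums. The constrained chain keeps the kernel's notion of vicinity while reweighting rows and columns so that π is stationary.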

11.
J Chem Phys ; 150(5): 054105, 2019 Feb 07.
Article in English | MEDLINE | ID: mdl-30736685

ABSTRACT

Markov State Models (MSMs) describe the rates and routes in conformational dynamics of biomolecules. Computational estimation of MSMs can be expensive because molecular simulations are slow to find and sample the rare transient events. We describe here an efficient approximate way to determine MSM rate matrices by combining maximum caliber (maximizing path entropies) with optimal transport theory (minimizing some path cost function, as when routing trucks on transportation networks) to patch together transient dynamical information from multiple non-equilibrium simulations. We give toy examples.

12.
J Chem Phys ; 148(1): 010901, 2018 Jan 07.
Article in English | MEDLINE | ID: mdl-29306272

ABSTRACT

We review here Maximum Caliber (Max Cal), a general variational principle for inferring distributions of paths in dynamical processes and networks. Max Cal is to dynamical trajectories what the principle of maximum entropy is to equilibrium states or stationary populations. In Max Cal, you maximize a path entropy over all possible pathways, subject to dynamical constraints, in order to predict relative path weights. Many well-known relationships of non-equilibrium statistical physics, such as the Green-Kubo fluctuation-dissipation relations, Onsager's reciprocal relations, and Prigogine's minimum entropy production, are limited to near-equilibrium processes. Max Cal is more general. While it can readily derive these results under those limits, Max Cal is also applicable far from equilibrium. We give examples of Max Cal as a method of inference about trajectory distributions from limited data, finding reaction coordinates in bio-molecular simulations, and modeling the complex dynamics of non-thermal systems such as gene regulatory networks or the collective firing of neurons. We also survey its basis in principle and some limitations.
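The step "maximize a path entropy subject to dynamical constraints" can be made concrete with a standard Lagrange-multiplier calculation over a discrete set of trajectories Γ, with one path observable A(Γ) whose average ⟨A⟩ is constrained (the notation here is ours, for illustration):

$$
\mathcal{C} = -\sum_{\Gamma} p_\Gamma \ln p_\Gamma - \lambda\Big(\sum_{\Gamma} p_\Gamma A(\Gamma) - \langle A \rangle\Big) - \mu\Big(\sum_{\Gamma} p_\Gamma - 1\Big)
$$

$$
\frac{\partial \mathcal{C}}{\partial p_\Gamma} = -\ln p_\Gamma - 1 - \lambda A(\Gamma) - \mu = 0
\quad\Longrightarrow\quad
p_\Gamma = \frac{e^{-\lambda A(\Gamma)}}{Q}, \qquad Q = \sum_{\Gamma} e^{-\lambda A(\Gamma)}
$$

The multiplier λ is chosen so that the model reproduces the measured ⟨A⟩; with several constraints, λA(Γ) becomes a sum of terms λ_i A_i(Γ). Relative path weights then follow directly as p_Γ / p_Γ' = exp(-λ[A(Γ) - A(Γ')]).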

13.
J Chem Theory Comput ; 14(2): 1111-1119, 2018 Feb 13.
Article in English | MEDLINE | ID: mdl-29323898

ABSTRACT

Rate processes are often modeled using Markov State Models (MSMs). Suppose you know a prior MSM and then learn that your prediction of some particular observable rate is wrong. What is the best way to correct the whole MSM? For example, molecular dynamics simulations of protein folding may sample many microstates, possibly giving correct pathways through them while also giving the wrong overall folding rate when compared to experiment. Here, we describe Caliber Corrected Markov Modeling (C2M2), an approach based on the principle of maximum entropy for updating a Markov model by imposing state- and trajectory-based constraints. We show that such corrections are equivalent to asserting position-dependent diffusion coefficients in continuous-time continuous-space Markov processes modeled by a Smoluchowski equation. We derive the functional form of the diffusion coefficient explicitly in terms of the trajectory-based constraints. We illustrate with examples of 2D particle diffusion and an overdamped harmonic oscillator.


Subject(s)
Markov Chains , Molecular Dynamics Simulation
14.
J Chem Phys ; 147(16): 164901, 2017 Oct 28.
Article in English | MEDLINE | ID: mdl-29096517

ABSTRACT

Quantifying the statistics of occupancy of solvent molecules in the vicinity of solutes is central to our understanding of solvation phenomena. Number fluctuations in small solvation shells around solutes cannot be described within the macroscopic grand canonical framework using a single chemical potential that represents the solvent bath. In this communication, we hypothesize that molecular-sized observation volumes such as solvation shells are best described by coupling the solvation shell with a mixture of particle baths each with its own chemical potential. We confirm our hypotheses by studying the enhanced fluctuations in the occupancy statistics of hard sphere solvent particles around a distinguished hard sphere solute particle. Connections with established theories of solvation are also discussed.

15.
Genetics ; 207(1): 281-295, 2017 09.
Article in English | MEDLINE | ID: mdl-28751420

ABSTRACT

While bacteria divide clonally, horizontal gene transfer followed by homologous recombination is now recognized as an important contributor to their evolution. However, the details of how the competition between clonality and recombination shapes genome diversity remain poorly understood. Using a computational model, we find two principal regimes in bacterial evolution and identify two composite parameters that dictate the evolutionary fate of bacterial species. In the divergent regime, characterized by either a low recombination frequency or strict barriers to recombination, cohesion due to recombination is not sufficient to overcome the mutational drift. As a consequence, the divergence between pairs of genomes in the population steadily increases in the course of their evolution. The species lacks genetic coherence, with sexually isolated clonal subpopulations continuously formed and dissolved. In contrast, in the metastable regime, characterized by a high recombination frequency combined with low barriers to recombination, genomes continuously recombine with the rest of the population. The population remains genetically cohesive and temporally stable. Notably, the transition between these two regimes can be affected by relatively small changes in evolutionary parameters. Using Multi Locus Sequence Typing (MLST) data, we classify a number of bacterial species as either the divergent or the metastable type. Generalizations of our framework to include selection, ecologically structured populations, and horizontal gene transfer of nonhomologous regions are discussed as well.


Subject(s)
Bacteria/genetics , Evolution, Molecular , Genomic Instability , Models, Genetic , Recombination, Genetic , Gene Frequency , Genetic Drift , Genome, Bacterial , Reproductive Isolation
16.
Science ; 353(6304): 1161-5, 2016 09 09.
Article in English | MEDLINE | ID: mdl-27609895

ABSTRACT

Tumor genetics guides patient selection for many new therapies, and cell culture studies have demonstrated that specific mutations can promote metabolic phenotypes. However, whether tissue context defines cancer dependence on specific metabolic pathways is unknown. Kras activation and Trp53 deletion in the pancreas or the lung result in pancreatic ductal adenocarcinoma (PDAC) or non-small cell lung carcinoma (NSCLC), respectively, but despite the same initiating events, these tumors use branched-chain amino acids (BCAAs) differently. NSCLC tumors incorporate free BCAAs into tissue protein and use BCAAs as a nitrogen source, whereas PDAC tumors have decreased BCAA uptake. These differences are reflected in expression levels of BCAA catabolic enzymes in both mice and humans. Loss of Bcat1 and Bcat2, the enzymes responsible for BCAA use, impairs NSCLC tumor formation, but these enzymes are not required for PDAC tumor formation, arguing that tissue of origin is an important determinant of how cancers satisfy their metabolic requirements.


Subject(s)
Amino Acids, Branched-Chain/metabolism , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/metabolism , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/metabolism , Proto-Oncogene Proteins p21(ras)/genetics , Animals , Gene Expression Regulation, Neoplastic , Humans , Male , Metabolic Networks and Pathways , Mice , Mice, Inbred C57BL , Minor Histocompatibility Antigens/genetics , Mutation , Nitrogen/metabolism , Organ Specificity , Pregnancy Proteins/genetics , Transaminases/genetics
17.
Article in English | MEDLINE | ID: mdl-26565210

ABSTRACT

Maximum-entropy (ME) inference of state probabilities using state-dependent constraints is popular in the study of complex systems. In stochastic systems, how state space topology and path-dependent constraints affect ME-inferred state probabilities remains unknown. To that end, we derive the transition probabilities and the stationary distribution of a maximum path entropy Markov process subject to state- and path-dependent constraints. A main finding is that the stationary distribution over states differs significantly from the Boltzmann distribution and reflects a competition between path multiplicity and imposed constraints. We illustrate our results with particle diffusion on a two-dimensional landscape. Connections with the path integral approach to diffusion are discussed.


Subject(s)
Entropy , Models, Theoretical , Diffusion , Markov Chains
18.
J Chem Theory Comput ; 11(11): 5464-72, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26574334

ABSTRACT

We are interested in inferring rate processes on networks. In particular, given a network's topology, the stationary populations on its nodes, and a few global dynamical observables, can we infer all the transition rates between nodes? We draw inferences using the principle of maximum caliber (maximum path entropy). We have previously derived results for discrete-time Markov processes. Here, we treat continuous-time processes, such as dynamics among metastable states of proteins. The present work leads to a particularly important analytical result: namely, that when the network is constrained only by a mean jump rate, the rate matrix is given by a square-root dependence of the rate, $k_{ab} \propto (\pi_b/\pi_a)^{1/2}$, on $\pi_a$ and $\pi_b$, the stationary-state populations at nodes a and b. This leads to a fast way to estimate all of the microscopic rates in the system. As an illustration, we show that the method accurately predicts the nonequilibrium transition rates in an in silico gene expression network and transition probabilities among the metastable states of a small peptide at equilibrium. We note also that the method makes sensible predictions for so-called extra-thermodynamic relationships, such as those of Bronsted, Hammond, and others.
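The square-root result can be turned into a rate matrix in a few lines. This sketch assumes a symmetric adjacency matrix of allowed transitions and takes the mean jump rate to be the sum over a of π_a times the total exit rate from a, which we use to fix the proportionality constant; the function name, network, and populations are hypothetical.

```python
import numpy as np

def sqrt_rate_matrix(pi, adjacency, mean_jump_rate):
    """Rate matrix with k_ab proportional to sqrt(pi_b / pi_a) on the edges of a network.

    pi: stationary populations (sum to 1).
    adjacency: symmetric 0/1 matrix of allowed a -> b transitions (hypothetical input).
    mean_jump_rate: target value of sum_a pi_a * sum_b k_ab, used here to fix the
    proportionality constant (our assumed definition of the mean jump rate).
    """
    A = adjacency.astype(float).copy()
    np.fill_diagonal(A, 0.0)                          # no self-jumps
    shape = A * np.sqrt(pi[None, :] / pi[:, None])    # shape[a, b] = sqrt(pi_b / pi_a)
    c = mean_jump_rate / np.sum(pi[:, None] * shape)  # prefactor matching the mean jump rate
    k = c * shape
    return k - np.diag(k.sum(axis=1))                 # generator Q: rows sum to zero

pi = np.array([0.5, 0.3, 0.15, 0.05])
adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 1],
                      [1, 1, 0, 1],
                      [0, 1, 1, 0]])
Q = sqrt_rate_matrix(pi, adjacency, mean_jump_rate=2.0)
```

Because π_a k_ab = c (π_a π_b)^{1/2} is symmetric in a and b, the resulting chain satisfies detailed balance and π is its stationary distribution, so only the single prefactor c remains to be set by the mean jump rate.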


Subject(s)
Peptides/chemistry , Promoter Regions, Genetic , Thermodynamics , Gene Expression , Promoter Regions, Genetic/physiology , RNA, Messenger/genetics
19.
J Chem Phys ; 143(5): 051104, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26254635

ABSTRACT

There has been interest in finding a general variational principle for non-equilibrium statistical mechanics. We give evidence that Maximum Caliber (Max Cal) is such a principle. Max Cal, a variant of maximum entropy, predicts dynamical distribution functions by maximizing a path entropy subject to dynamical constraints, such as average fluxes. We first show that Max Cal leads to standard near-equilibrium results, including the Green-Kubo relations, Onsager's reciprocal relations of coupled flows, and Prigogine's principle of minimum entropy production, in a way that is particularly simple. We develop some generalizations of the Onsager and Prigogine results that apply arbitrarily far from equilibrium. Because Max Cal does not require any notion of "local equilibrium," or any notion of entropy dissipation, or temperature, or even any restriction to material physics, it is more general than many traditional approaches. It is also applicable to flows and traffic on networks, for example.


Subject(s)
Entropy , Models, Theoretical , Probability
20.
Proc Natl Acad Sci U S A ; 112(29): 9070-5, 2015 Jul 21.
Article in English | MEDLINE | ID: mdl-26153419

ABSTRACT

An approximation to the ∼4-Mbp basic genome shared by 32 strains of Escherichia coli representing six evolutionary groups has been derived and analyzed computationally. A multiple alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ∼90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single base-pair mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome pairs have one or two recombinant transfers of length ∼40-115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome pairs (0.4-1% SNPs) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kilobase pairs. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. Most recombinant transfers seem likely to be due to generalized transduction by coevolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.


Subject(s)
Escherichia coli/genetics , Genome, Bacterial , Recombination, Genetic/genetics , Transformation, Genetic , Bacteriophages/genetics , Base Pairing/genetics , Biological Evolution , Clone Cells , Escherichia coli/virology , Genetic Vectors , Models, Genetic , Molecular Sequence Annotation , Mosaicism , Phylogeny , Polymorphism, Single Nucleotide/genetics , Restriction Mapping , Transduction, Genetic