Results 1 - 13 of 13
1.
Nature ; 623(7989): 1070-1078, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37968394

ABSTRACT

Three billion years of evolution has produced a tremendous diversity of protein molecules¹, but the full potential of proteins is likely to be much greater. Accessing this potential has been challenging for both computation and experiments because the space of possible protein molecules is much larger than the space of those likely to have functions. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences, and that can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects the conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems that enables long-range reasoning with sub-quadratic scaling, layers for efficiently synthesizing three-dimensional structures of proteins from predicted inter-residue geometries and a general low-temperature sampling algorithm for diffusion models. Chroma achieves protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics and even natural-language prompts. The experimental characterization of 310 proteins shows that sampling from Chroma results in proteins that are highly expressed, fold and have favourable biophysical properties. The crystal structures of two designed proteins exhibit atomistic agreement with Chroma samples (a backbone root-mean-square deviation of around 1.0 Å). With this unified approach to protein design, we hope to accelerate the programming of protein matter to benefit human health, materials science and synthetic biology.


Subject(s)
Algorithms , Computer Simulation , Protein Conformation , Proteins , Humans , Bayes Theorem , Directed Molecular Evolution , Machine Learning , Models, Molecular , Protein Folding , Proteins/chemistry , Proteins/metabolism , Semantics , Synthetic Biology/methods , Synthetic Biology/trends
2.
PLoS Comput Biol ; 19(11): e1011111, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37948450

ABSTRACT

Metabolic fluxes, the number of metabolites traversing each biochemical reaction in a cell per unit time, are crucial for assessing and understanding cell function. ¹³C Metabolic Flux Analysis (¹³C MFA) is considered to be the gold standard for measuring metabolic fluxes. ¹³C MFA typically works by leveraging extracellular exchange fluxes, as well as data from ¹³C labeling experiments, to calculate the flux profile that best fits the data for a small central carbon metabolism model. However, the nonlinear nature of the ¹³C MFA fitting procedure means that several flux profiles fit the experimental data within the experimental error, and traditional optimization methods offer only a partial or skewed picture, especially in "non-gaussian" situations where multiple very distinct flux regions fit the data equally well. Here, we present a method for flux space sampling through Bayesian inference (BayFlux) that identifies the full distribution of fluxes compatible with experimental data for a comprehensive genome-scale model. This Bayesian approach allows us to accurately quantify uncertainty in calculated fluxes. We also find that, surprisingly, the genome-scale model of metabolism produces narrower flux distributions (reduced uncertainty) than the small core metabolic models traditionally used in ¹³C MFA. The differing results for some reactions between genome-scale and core metabolic models advise caution in drawing strong inferences from ¹³C MFA, since the results may depend significantly on the completeness of the model used. Based on BayFlux, we developed and evaluated novel methods (P-¹³C MOMA and P-¹³C ROOM) to predict the biological results of a gene knockout, which improve on the traditional MOMA and ROOM methods by quantifying prediction uncertainty.
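The point about "non-gaussian" situations can be seen in a deliberately tiny, hypothetical example: one free split ratio `f` and one labeling measurement whose predicted value, `4*f*(1 - f)`, is not monotone in `f`. This is a sketch under stated assumptions (the toy model, the measured value 0.64 and the noise level 0.05 are all made up), and it evaluates the posterior on a grid rather than sampling it, which is only practical because there is a single free flux.

```python
import math

def posterior_on_grid(measured, sigma, n_points=1001):
    """Posterior over a split ratio f in [0, 1] under a flat prior.

    Hypothetical model: the labeling measurement is predicted as 4*f*(1 - f)
    and observed with Gaussian noise of standard deviation sigma.
    """
    grid = [i / (n_points - 1) for i in range(n_points)]
    # Unnormalized posterior = flat prior * Gaussian likelihood
    weights = [math.exp(-0.5 * ((4 * f * (1 - f) - measured) / sigma) ** 2)
               for f in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]

grid, post = posterior_on_grid(measured=0.64, sigma=0.05)

# A single best fit reports one value of f ...
best_index = max(range(len(post)), key=lambda i: post[i])
best_f = grid[best_index]
# ... but the posterior splits its mass between two distinct regions.
mass_below_half = sum(p for f, p in zip(grid, post) if f < 0.5)
print(f"best fit f = {best_f:.3f}, P(f < 0.5) = {mass_below_half:.2f}")
```

Both `f = 0.2` and `f = 0.8` reproduce the measurement exactly, so an optimizer reports whichever one it happens to reach, while the posterior keeps about half of its mass on each.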


Subject(s)
Metabolic Flux Analysis , Models, Biological , Bayes Theorem , Uncertainty , Metabolic Flux Analysis/methods , Carbon Isotopes/metabolism
3.
Nat Commun ; 11(1): 4880, 2020 09 25.
Article in English | MEDLINE | ID: mdl-32978375

ABSTRACT

Through advanced mechanistic modeling and the generation of large high-quality datasets, machine learning is becoming an integral part of understanding and engineering living systems. Here we show that mechanistic and machine learning models can be combined to enable accurate genotype-to-phenotype predictions. We use a genome-scale model to pinpoint engineering targets, and combine efficient library construction of metabolic pathway designs with high-throughput biosensor-enabled screening to train diverse machine learning algorithms. From a single data-generation cycle, this enables successful forward engineering of complex aromatic amino acid metabolism in yeast, with the best machine learning-guided design recommendations improving tryptophan titer and productivity by up to 74% and 43%, respectively, compared to the best designs used for algorithm training. Thus, this study highlights the power of combining mechanistic and machine learning models to effectively direct metabolic engineering efforts.


Subject(s)
Machine Learning , Metabolic Engineering/methods , Saccharomyces cerevisiae/metabolism , Tryptophan/metabolism , Algorithms , Amino Acids/metabolism , Biochemical Phenomena , Biosensing Techniques , Genotype , Metabolic Networks and Pathways , Models, Biological , Phenotype , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development
4.
Nat Commun ; 11(1): 4879, 2020 09 25.
Article in English | MEDLINE | ID: mdl-32978379

ABSTRACT

Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as on experimental data from real metabolic engineering projects producing renewable biofuels, hoppy-flavored beer brewed without hops, fatty acids, and tryptophan. Finally, we discuss the limitations of this approach and the practical consequences when its underlying assumptions fail.
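As a rough illustration of "sampling-based optimization" paired with "probabilistic predictions", the sketch below fits a bootstrap ensemble of one-variable linear models, samples random candidate designs, and ranks them by predicted mean plus one ensemble standard deviation. This is not ART's implementation; the single design variable, the training values and the scoring rule are all hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # resample drew a single x value: fall back to a flat model
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def bootstrap_ensemble(xs, ys, n_models=50, seed=0):
    """Fit one line per bootstrap resample of the training strains."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict(models, x):
    """Mean and spread of the ensemble's predictions at design x."""
    preds = [a + b * x for a, b in models]
    return statistics.fmean(preds), statistics.pstdev(preds)

def recommend(models, lo, hi, n_candidates=200, k=3, exploration=1.0, seed=1):
    """Sample candidate designs uniformly and keep the k best by
    predicted mean + exploration * predicted standard deviation."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        x = rng.uniform(lo, hi)
        mean, sd = predict(models, x)
        scored.append((mean + exploration * sd, x, mean, sd))
    scored.sort(reverse=True)
    return scored[:k]

# Hypothetical training data: one normalized design variable and the
# measured production of five strains.
xs = [0.1, 0.2, 0.3, 0.4, 0.5]
ys = [0.30, 0.50, 0.62, 0.85, 1.00]
models = bootstrap_ensemble(xs, ys)
recs = recommend(models, lo=0.0, hi=1.0)
for score, x, mean, sd in recs:
    print(f"design x = {x:.2f}: predicted {mean:.2f} +/- {sd:.2f}")
```

Raising `exploration` favors designs the ensemble is unsure about, while lowering it favors the current best prediction. Here the recommendations extrapolate beyond the training range, which is exactly where such predictions deserve the least trust.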


Subject(s)
Machine Learning , Metabolic Engineering/methods , Synthetic Biology/methods , Bayes Theorem , Beer , Biofuels , Dodecanol/metabolism , Escherichia coli/metabolism , Fatty Acids/metabolism , Saccharomyces cerevisiae/metabolism
5.
Microb Cell Fact ; 19(1): 167, 2020 Aug 18.
Article in English | MEDLINE | ID: mdl-32811554

ABSTRACT

BACKGROUND: Despite the latest advances in metabolic engineering for genome editing and characterization of host performance, developing robust cell factories for industrial bioprocesses and accurately predicting the behavior of microbial systems remain challenging, especially when shifting from laboratory-scale to industrial conditions. To increase the probability of success of a scale-up process, data from thorough studies that mirror cellular responses to typical large-scale stimuli may be used to better understand how large-scale cultivation affects strain performance. This study assesses the feasibility of employing a barcoded yeast deletion library to measure genome-wide strain fitness across a simulated industrial fermentation regime, and aims to understand the genetic basis of changes in strain physiology during industrial fermentation and the roles these genes play in strain performance.
RESULTS: We find that mutant population diversity is maintained through multiple seed trains, enabling large-scale fermentation selective pressures to act upon the community. We identify specific deletion mutants that were enriched in all processes tested in this study, independent of the cultivation conditions, including MCK1, RIM11, MRK1, and YGK3, all of which encode homologues of mammalian glycogen synthase kinase 3 (GSK-3). Ecological analysis of beta diversity between all samples revealed significant population divergence over time and showed feed-specific consequences for population structure. Further, we show that significant changes in population diversity during fed-batch cultivations reflect the presence of significant stresses. Our observations indicate that, for this yeast deletion collection, the choice of feeding scheme, which affects the accumulation of the fermentative by-product ethanol, impacts the diversity of the mutant pool more strongly than the pH of the culture broth does. The mutants lost during the period of most extreme population selection suggest that specific biological processes may be required to cope with these stresses.
CONCLUSIONS: Our results demonstrate the feasibility of using Bar-seq (barcode sequencing) to assess fermentation-associated stresses in yeast populations under industrial conditions and to understand the critical stages of a scale-up process where variability emerges and selection pressure is imposed. Overall, our work highlights a promising avenue for identifying genetic loci and biological stress responses required for fitness under industrial conditions.
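Beta diversity compares community composition between samples. The abstract does not say which metric was used; as one common choice, this sketch computes the Bray-Curtis dissimilarity between two hypothetical barcode-count profiles, where 0 means identical composition and 1 means no shared mutants. The mutant names and counts are invented for illustration.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Both dicts map a mutant (barcode) name to its read count; a mutant
    missing from one sample is treated as having count 0 there.
    """
    mutants = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(m, 0), counts_b.get(m, 0)) for m in mutants)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total

# Hypothetical counts: an early seed-train sample and a late fed-batch sample.
seed_train = {"mutant_A": 50, "mutant_B": 30, "mutant_C": 20}
late_fedbatch = {"mutant_A": 10, "mutant_B": 30, "mutant_D": 60}
print(round(bray_curtis(seed_train, late_fedbatch), 2))  # 0.6
```

Computing this for every pair of samples yields the distance matrix that ordination or divergence-over-time analyses typically start from.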


Subject(s)
Bioreactors/microbiology , Biotechnology/methods , Fermentation , Saccharomyces cerevisiae/physiology , Biodiversity , Gene Deletion , Genes, Fungal , Glycogen Synthase Kinase 3/genetics , Glycogen Synthase Kinase 3/metabolism , Industrial Microbiology , Metabolic Engineering , Stress, Physiological/genetics
6.
ACS Synth Biol ; 9(1): 53-62, 2020 01 17.
Article in English | MEDLINE | ID: mdl-31841635

ABSTRACT

Caprolactam is an important precursor to the polymer nylon; it is traditionally derived from petroleum and produced on a scale of 5 million tons per year. Current biological pathways for the production of caprolactam are inefficient, with titers not exceeding 2 mg/L, necessitating novel pathways for its production. Because the development of novel metabolic routes often requires thousands of designs and results in low product titers, a highly sensitive biosensor for the final product has the potential to greatly shorten development times. Here we report a highly sensitive biosensor for valerolactam and caprolactam from Pseudomonas putida KT2440 that is >1000× more sensitive to an exogenous ligand than previously reported sensors. Manipulating the expression of the sensor oplR (PP_3516) substantially altered the sensing parameters, with various vectors showing Kd values ranging from 700 nM (79.1 µg/L) to 1.2 mM (135.6 mg/L). Our most sensitive construct was able to detect in vivo production of caprolactam above background at ∼6 µg/L. The high sensitivity and range of OplR make it a powerful tool for developing novel routes to the biological synthesis of caprolactam.
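The paired units for the Kd values follow from caprolactam's molar mass (C6H11NO, about 113.16 g/mol): mass concentration equals molar concentration times molar mass. This sketch reproduces the conversions and prints 79.2 µg/L, 135.8 mg/L and 53 nM; the abstract's 79.1 µg/L and 135.6 mg/L match a molar mass rounded to about 113 g/mol, which is an inference rather than something the source states.

```python
CAPROLACTAM_G_PER_MOL = 113.16  # molar mass of C6H11NO

def molar_to_mass(conc_mol_per_l, molar_mass=CAPROLACTAM_G_PER_MOL):
    """Molar concentration (mol/L) -> mass concentration (g/L)."""
    return conc_mol_per_l * molar_mass

def mass_to_molar(conc_g_per_l, molar_mass=CAPROLACTAM_G_PER_MOL):
    """Mass concentration (g/L) -> molar concentration (mol/L)."""
    return conc_g_per_l / molar_mass

kd_low_ug_per_l = molar_to_mass(700e-9) * 1e6    # 700 nM expressed in µg/L
kd_high_mg_per_l = molar_to_mass(1.2e-3) * 1e3   # 1.2 mM expressed in mg/L
limit_nm = mass_to_molar(6e-6) * 1e9             # ~6 µg/L expressed in nM
print(f"{kd_low_ug_per_l:.1f} µg/L, {kd_high_mg_per_l:.1f} mg/L, {limit_nm:.0f} nM")
```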


Subject(s)
Biosensing Techniques/methods , Caprolactam/metabolism , Lactams/metabolism , Metabolic Engineering/methods , Pseudomonas putida/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Ligands , Plasmids/genetics
7.
ACS Synth Biol ; 8(10): 2385-2396, 2019 10 18.
Article in English | MEDLINE | ID: mdl-31518500

ABSTRACT

A significant bottleneck in synthetic biology involves screening large genetically encoded libraries for desirable phenotypes such as chemical production. However, transcription factor-based biosensors can be leveraged to screen thousands of genetic designs for optimal chemical production in engineered microbes. In this study we characterize two glutarate-sensing transcription factors (CsiR and GcdR) from Pseudomonas putida. The genomic contexts of csiR homologues were analyzed, and their DNA binding sites were bioinformatically predicted. Both CsiR and GcdR were purified and shown to bind upstream of their coding sequences in vitro. CsiR was shown to dissociate from DNA in vitro when exogenous glutarate was added, confirming that it acts as a genetic repressor. Both transcription factors and their cognate promoters were then cloned into broad-host-range vectors to create two glutarate biosensors. Their respective sensing performance features were characterized, and more sensitive derivatives of the GcdR biosensor were created by manipulating the expression of the transcription factor. Sensor vectors were then reintroduced into P. putida and evaluated for their ability to respond to glutarate and various lysine metabolites. Additionally, we developed a novel mathematical approach to describe the usable range of detection for genetically encoded biosensors, which may be broadly useful in future efforts to better characterize biosensor performance.
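The abstract does not give the paper's own mathematical approach to the usable detection range. As a generic stand-in, this sketch models a biosensor's dose response with a Hill function and reports the ligand concentrations at which the output reaches 10% and 90% of its dynamic range; the Hill form, the 10% and 90% thresholds, and the parameter values (K = 10 µM, n = 2) are assumptions.

```python
def hill_response(x, y_min, y_max, k, n):
    """Sensor output at ligand concentration x (same units as k)."""
    return y_min + (y_max - y_min) * x**n / (k**n + x**n)

def detection_range(k, n, low_frac=0.1, high_frac=0.9):
    """Concentrations at which the Hill curve reaches low_frac and high_frac
    of its dynamic range. Solving x**n / (k**n + x**n) = frac gives
    x = k * (frac / (1 - frac)) ** (1 / n), independent of y_min and y_max."""
    def inverse(frac):
        return k * (frac / (1.0 - frac)) ** (1.0 / n)
    return inverse(low_frac), inverse(high_frac)

lo, hi = detection_range(k=10.0, n=2.0)  # hypothetical K = 10 µM, n = 2
print(f"usable range: {lo:.2f} to {hi:.2f} µM")
```

Under this model a lower K shifts the whole window toward lower concentrations, and a steeper Hill coefficient narrows it, since the window spans a factor of 81^(1/n).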


Subject(s)
Glutarates/metabolism , Lysine/metabolism , Pseudomonas putida/metabolism , Transcription Factors/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Metabolic Engineering/methods , Promoter Regions, Genetic/genetics , Pseudomonas putida/genetics , Synthetic Biology/methods
8.
J Proteome Res ; 18(10): 3752-3761, 2019 10 04.
Article in English | MEDLINE | ID: mdl-31436101

ABSTRACT

Mass spectrometry-based quantitative proteomic analysis has proven valuable for clinical and biotechnology-related research and development. Improvements in sensitivity, resolution, and robustness of mass analyzers have also added value. However, manual sample preparation protocols are often a bottleneck for sample throughput and can lead to poor reproducibility, especially for applications where thousands of samples per month must be analyzed. To alleviate these issues, we developed a "cells-to-peptides" automated workflow for Gram-negative bacteria and fungi that includes cell lysis, protein precipitation, resuspension, quantification, normalization, and tryptic digestion. The workflow takes 2 h to process 96 samples from cell pellets to the initiation of the tryptic digestion step and can process 384 samples in parallel. We measured the efficiency of protein extraction from various amounts of cell biomass and optimized the process for standard liquid chromatography-mass spectrometry systems. The automated workflow was tested by preparing 96 Escherichia coli samples and quantifying over 600 peptides, yielding a median coefficient of variation (CV) of 15.8%. Similar technical variance was observed for three other organisms as measured by highly multiplexed LC-MRM-MS acquisition methods. These results show that this automated sample preparation workflow provides robust, reproducible proteomic samples for high-throughput applications.
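The reported figure is a median of per-peptide coefficients of variation (standard deviation divided by mean across replicate samples, in percent). A minimal sketch of that summary statistic follows; the peptide names and quantities are hypothetical, and the source does not state whether the sample or population standard deviation was used, so this sketch assumes the sample version.

```python
import statistics

def median_cv_percent(peptide_areas):
    """Median coefficient of variation (%) across peptides.

    peptide_areas maps each peptide to its quantities across replicate
    samples. Peptides with fewer than two values or a non-positive mean
    are skipped; statistics.median raises if none remain.
    """
    cvs = [
        100.0 * statistics.stdev(values) / statistics.fmean(values)
        for values in peptide_areas.values()
        if len(values) >= 2 and statistics.fmean(values) > 0
    ]
    return statistics.median(cvs)

# Hypothetical peak areas for three peptides across three replicates.
areas = {
    "PEPTIDE_1": [100.0, 110.0, 90.0],   # CV = 10%
    "PEPTIDE_2": [50.0, 50.0, 50.0],     # CV = 0%
    "PEPTIDE_3": [200.0, 260.0, 140.0],  # CV = 30%
}
print(median_cv_percent(areas))  # 10.0
```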


Subject(s)
Cells/chemistry , Microbiological Techniques/methods , Peptides/isolation & purification , Proteomics/methods , Specimen Handling/methods , Workflow , Automation , Bacterial Proteins/analysis , Bacterial Proteins/isolation & purification , Escherichia coli/chemistry , Fungal Proteins/analysis , Fungal Proteins/isolation & purification , Fungi/chemistry , Gram-Negative Bacteria/chemistry , Humans , Peptides/analysis , Specimen Handling/standards
9.
ACS Synth Biol ; 8(6): 1337-1351, 2019 06 21.
Article in English | MEDLINE | ID: mdl-31072100

ABSTRACT

The Design-Build-Test-Learn (DBTL) cycle, facilitated by exponentially improving capabilities in synthetic biology, is an increasingly adopted metabolic engineering framework that represents a more systematic and efficient approach to strain development than historical efforts in biofuels and biobased products. Here, we report on the implementation of two DBTL cycles to optimize 1-dodecanol production from glucose using 60 engineered Escherichia coli MG1655 strains. The first DBTL cycle employed a simple strategy to learn efficiently from a relatively small number of strains (36), wherein only the choice of ribosome-binding sites and an acyl-ACP/acyl-CoA reductase were modulated in a single pathway operon including genes encoding a thioesterase (UcFatB1), an acyl-ACP/acyl-CoA reductase (Maqu_2507, Maqu_2220, or Acr1), and an acyl-CoA synthetase (FadD). Measured variables included concentrations of dodecanol and all proteins in the engineered pathway. We used the data produced in the first DBTL cycle to train several machine-learning algorithms and to suggest protein profiles for the second DBTL cycle that would increase production. These strategies resulted in a 21% increase in dodecanol titer in Cycle 2 (up to 0.83 g/L, which is more than 6-fold greater than previously reported batch values for minimal medium). Beyond specific lessons learned about optimizing dodecanol titer in E. coli, this study had findings of broader relevance across synthetic biology applications, such as the importance of sequencing checks on plasmids in production strains as well as in cloning strains, and the critical need for more accurate protein expression predictive tools.


Subject(s)
Dodecanol/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Machine Learning , Metabolic Engineering/methods , Algorithms , Metabolic Networks and Pathways/genetics , Synthetic Biology
10.
ACS Synth Biol ; 7(11): 2566-2576, 2018 11 16.
Article in English | MEDLINE | ID: mdl-30351913

ABSTRACT

Robust fermentation of biomass-derived sugars into bioproducts demands the reliable microbial expression of metabolic pathways. Plasmid-based expression systems may suffer from instability and result in highly variable titers, rates, and yields. An established mitigation approach, chemically induced chromosomal expansion (CIChE), expands a singly integrated pathway to plasmid-like copy numbers while maintaining stability in the absence of antibiotic selection pressure. Here, we report parallel integration and chromosomal expansion (PIACE), extensions to CIChE that enable independent expansions of pathway components across multiple loci, use suicide vectors to achieve high-efficiency site-specific integration of sequence-validated multigene components, and introduce a heat-curable plasmid to obviate recA deletion after pathway expansion. We applied PIACE to stabilize an isopentenol pathway across three loci in E. coli DH1 and then generated libraries of pathway component copy number variants to screen for improved titers. Polynomial regressor statistical modeling of the production screening data suggests that increasing the copy numbers of all isopentenol pathway components would further improve titers.
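The abstract does not specify the polynomial regressor. One plausible form is a degree-2 polynomial in the copy numbers of the three pathway components, fit by least squares; this sketch builds those features with NumPy and recovers known coefficients from noise-free synthetic data. The copy-number grid (1 to 4 copies per component), the coefficients and the polynomial degree are all hypothetical.

```python
import itertools

import numpy as np

def quadratic_features(copies):
    """Degree-2 polynomial features for rows of component copy numbers:
    intercept, linear terms, squares, and pairwise products."""
    copies = np.asarray(copies, dtype=float)
    n_rows, n_comp = copies.shape
    cols = [np.ones(n_rows)]
    cols += [copies[:, i] for i in range(n_comp)]
    cols += [copies[:, i] * copies[:, j]
             for i, j in itertools.combinations_with_replacement(range(n_comp), 2)]
    return np.column_stack(cols)

def fit_polynomial(copies, titers):
    """Least-squares coefficients for the quadratic feature matrix."""
    X = quadratic_features(copies)
    coef, *_ = np.linalg.lstsq(X, np.asarray(titers, dtype=float), rcond=None)
    return coef

# Synthetic screen: every combination of 1-4 copies of three components.
grid = np.array(list(itertools.product(range(1, 5), repeat=3)), dtype=float)
true_coef = np.zeros(10)
true_coef[1:4] = [0.5, 0.3, 0.2]  # hypothetical positive linear effects
titers = quadratic_features(grid) @ true_coef
coef = fit_polynomial(grid, titers)
print(np.round(coef, 3))
```

With real screening data the fitted surface would be inspected for directions of increasing predicted titer; positive effects along every copy-number axis are the kind of pattern the abstract's conclusion describes.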


Subject(s)
Chromosomes, Bacterial/genetics , Metabolic Engineering/methods , Metabolic Networks and Pathways/physiology , Chromosomes, Bacterial/metabolism , DNA Copy Number Variations , DNA-Binding Proteins/genetics , Escherichia coli/genetics , Genetic Loci , Machine Learning , Pentanols/metabolism , Plasmids/genetics , Plasmids/metabolism
11.
NPJ Syst Biol Appl ; 4: 19, 2018.
Article in English | MEDLINE | ID: mdl-29872542

ABSTRACT

New synthetic biology capabilities hold the promise of dramatically improving our ability to engineer biological systems. However, a fundamental hurdle in realizing this potential is our inability to accurately predict biological behavior after modifying the corresponding genotype. Kinetic models have traditionally been used to predict pathway dynamics in bioengineered systems, but they take significant time to develop, and rely heavily on domain expertise. Here, we show that the combination of machine learning and abundant multiomics data (proteomics and metabolomics) can be used to effectively predict pathway dynamics in an automated fashion. The new method outperforms a classical kinetic model, and produces qualitative and quantitative predictions that can be used to productively guide bioengineering efforts. This method systematically leverages arbitrary amounts of new data to improve predictions, and does not assume any particular interactions, but rather implicitly chooses the most predictive ones.
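One way to frame "predicting pathway dynamics" as supervised learning is to estimate metabolite time derivatives from the time series and learn them as a function of the measured protein and metabolite levels. The abstract does not describe its exact formulation; this sketch does the above for a single hypothetical metabolite whose production rate is proportional to one enzyme level, using finite differences and ordinary least squares in place of the machine learning models the abstract refers to. The kinetics and data are synthetic.

```python
import numpy as np

# Synthetic time series (hypothetical): enzyme level p(t) rises linearly and
# metabolite m obeys dm/dt = 0.8 * p - 0.3 * m, integrated with small Euler steps.
t = np.linspace(0.0, 10.0, 1001)
dt = t[1] - t[0]
p = 1.0 + 0.2 * t
m = np.zeros_like(t)
for i in range(1, len(t)):
    m[i] = m[i - 1] + dt * (0.8 * p[i - 1] - 0.3 * m[i - 1])

# Step 1: estimate dm/dt from the observed series (central differences).
dm_dt = np.gradient(m, t)

# Step 2: learn dm/dt as a function of (protein, metabolite) levels.
X = np.column_stack([p, m])
coef, *_ = np.linalg.lstsq(X, dm_dt, rcond=None)
print(np.round(coef, 3))  # approximately [0.8, -0.3]

# Step 3: the learned model can be integrated forward to predict dynamics.
def learned_rate(protein, metabolite):
    return coef[0] * protein + coef[1] * metabolite
```

Because the model is trained on derivatives rather than on a fixed kinetic form, adding more time series simply adds rows to `X`, which mirrors the abstract's point that new data can be leveraged without assuming particular interactions.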

12.
Nat Commun ; 9(1): 965, 2018 03 20.
Article in English | MEDLINE | ID: mdl-29559655

ABSTRACT

Flowers of the hop plant provide both bitterness and "hoppy" flavor to beer. Hops are, however, a water- and energy-intensive crop and vary considerably in essential oil content, making it challenging to achieve a consistent hoppy taste in beer. Here, we report that brewer's yeast can be engineered to biosynthesize aromatic monoterpene molecules that impart hoppy flavor to beer by incorporating recombinant DNA derived from yeast, mint, and basil. Whereas metabolic engineering of biosynthetic pathways is commonly enlisted to maximize product titers, tuning expression of pathway enzymes to reach target production levels of multiple commercially important metabolites without major collateral metabolic changes represents a unique challenge. By applying state-of-the-art engineering techniques and a framework to guide iterative improvement, strains are generated with target performance characteristics. Beers produced using these strains are perceived as hoppier than traditionally hopped beers by a sensory panel in a double-blind tasting.


Subject(s)
Beer , Genes, Fungal , Saccharomyces cerevisiae/genetics , Fermentation , Genetic Engineering , Hydro-Lyases/genetics , Hydro-Lyases/metabolism , Monoterpenes/metabolism , Pilot Projects , Plant Proteins/genetics , Plant Proteins/metabolism , Saccharomyces cerevisiae/metabolism
13.
ACS Synth Biol ; 6(12): 2248-2259, 2017 12 15.
Article in English | MEDLINE | ID: mdl-28826210

ABSTRACT

Although recent advances in synthetic biology allow us to produce biological designs more efficiently than ever, our ability to predict the end result of these designs is still nascent. Predictive models require large amounts of high-quality data to be parametrized and tested, which are not generally available. Here, we present the Experiment Data Depot (EDD), an online tool designed as a repository of experimental data and metadata. EDD provides a convenient way to upload a variety of data types, visualize these data, and export them in a standardized fashion for use with predictive algorithms. In this paper, we describe EDD and showcase its utility for three different use cases: storage of characterized synthetic biology parts, leveraging proteomics data to improve biofuel yield, and the use of extracellular metabolite concentrations to predict intracellular metabolic fluxes.


Subject(s)
Information Storage and Retrieval , Metadata , Models, Biological , User-Computer Interface