1.
bioRxiv ; 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38559182

ABSTRACT

Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.

2.
Nat Chem Eng ; 1(1): 97-107, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38468718

ABSTRACT

Protein engineering has nearly limitless applications across chemistry, energy and medicine, but creating new proteins with improved or novel functions remains slow, labor-intensive and inefficient. Here we present the Self-driving Autonomous Machines for Protein Landscape Exploration (SAMPLE) platform for fully autonomous protein engineering. SAMPLE is driven by an intelligent agent that learns protein sequence-function relationships, designs new proteins and sends designs to a fully automated robotic system that experimentally tests the designed proteins and provides feedback to improve the agent's understanding of the system. We deploy four SAMPLE agents with the goal of engineering glycoside hydrolase enzymes with enhanced thermal tolerance. Despite showing individual differences in their search behavior, all four agents quickly converge on thermostable enzymes. Self-driving laboratories automate and accelerate the scientific discovery process and hold great potential for the fields of protein engineering and synthetic biology.

3.
bioRxiv ; 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37987009

ABSTRACT

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape.

4.
PLoS Comput Biol ; 19(3): e1010956, 2023 03.
Article in English | MEDLINE | ID: mdl-36857380

ABSTRACT

Directed laboratory evolution applies iterative rounds of mutation and selection to explore the protein fitness landscape and provides rich information regarding the underlying relationships between protein sequence, structure, and function. Laboratory evolution data consist of protein sequences sampled from evolving populations over multiple generations and this data type does not fit into established supervised and unsupervised machine learning approaches. We develop a statistical learning framework that models the evolutionary process and can infer the protein fitness landscape from multiple snapshots along an evolutionary trajectory. We apply our modeling approach to dihydrofolate reductase (DHFR) laboratory evolution data and the resulting landscape parameters capture important aspects of DHFR structure and function. We use the resulting model to understand the structure of the fitness landscape and find numerous examples of epistasis but an overall global peak that is evolutionarily accessible from most starting sequences. Finally, we use the model to perform an in silico extrapolation of the DHFR laboratory evolution trajectory and computationally design proteins from future evolutionary rounds.


Subjects
Genetic Fitness , Proteins , Genetic Fitness/genetics , Proteins/genetics , Proteins/metabolism , Mutation/genetics , Tetrahydrofolate Dehydrogenase/genetics , Tetrahydrofolate Dehydrogenase/metabolism , Amino Acid Sequence , Evolution, Molecular , Models, Genetic , Epistasis, Genetic
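The abstract above reports numerous examples of epistasis in the inferred landscape. As a reading aid, one common textbook way to quantify pairwise epistasis is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the paper, whose own landscape parameterization may differ: $f_0$ is the (log-scale) fitness of the starting sequence, $f_A$ and $f_B$ are the fitnesses of the two single mutants, and $f_{AB}$ is the fitness of the double mutant.

```latex
\[
  \varepsilon_{AB} \;=\; f_{AB} - f_{A} - f_{B} + f_{0}
\]
```

Under this definition, $\varepsilon_{AB} = 0$ means the two mutations combine additively, while a nonzero value indicates that the effect of one mutation depends on the presence of the other.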
5.
Protein Sci ; 32(4): e4597, 2023 04.
Article in English | MEDLINE | ID: mdl-36794431

ABSTRACT

Angiotensin-converting enzyme 2 (ACE2) has been investigated for its ability to beneficially modulate the angiotensin receptor (ATR) therapeutic axis to treat multiple human diseases. Its broad substrate scope and diverse physiological roles, however, limit its potential as a therapeutic agent. In this work, we address this limitation by establishing a yeast display-based liquid chromatography screen that enabled use of directed evolution to discover ACE2 variants that possess both wild-type or greater Ang-II hydrolytic activity and improved specificity toward Ang-II relative to the off-target peptide substrate Apelin-13. To obtain these results, we screened ACE2 active site libraries to reveal three substitution-tolerant positions (M360, T371, and Y510) that can be mutated to enhance ACE2's activity profile and followed up on these hits with focused double mutant libraries to further improve the enzyme. Relative to wild-type ACE2, our top variant (T371L/Y510Ile) displayed a sevenfold increase in Ang-II turnover number (kcat ), a sixfold diminished catalytic efficiency (kcat /Km ) on Apelin-13, and an overall decreased activity on other ACE2 substrates that were not directly assayed in the directed evolution screen. At physiologically relevant substrate concentrations, T371L/Y510Ile hydrolyzes as much or more Ang-II than wild-type ACE2 with concomitant Ang-II:Apelin-13 specificity improvements reaching 30-fold. Our efforts have delivered ATR axis-acting therapeutic candidates with relevance to both established and unexplored ACE2 therapeutic applications and provide a foundation for further ACE2 engineering efforts.


Subjects
Angiotensin-Converting Enzyme 2 , Peptidyl-Dipeptidase A , Humans , Peptidyl-Dipeptidase A/genetics , Peptide Fragments , Angiotensin I , Peptides
6.
Cell Rep Methods ; 2(7): 100242, 2022 07 18.
Article in English | MEDLINE | ID: mdl-35880021

ABSTRACT

In this work, we developed a simple and robust assay to rapidly detect SNPs in nucleic acid samples. Our approach combines loop-mediated isothermal amplification (LAMP)-based target amplification with fluorescent probes to detect SNPs with high specificity. A competitive "sink" strand preferentially binds to non-SNP amplicons and shifts the free energy landscape to favor specific activation by SNP products. We demonstrated the broad utility and reliability of our SNP-LAMP method by detecting three distinct SNPs across the human genome. We also designed an assay to rapidly detect highly transmissible severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants from crude biological samples. This work demonstrates that competitive SNP-LAMP is a powerful and universal method that could be applied in point-of-care settings to detect any target SNP with high specificity and sensitivity. We additionally developed a publicly available web application for researchers to design SNP-LAMP probes for any target sequence of interest.


Subjects
COVID-19 , Polymorphism, Single Nucleotide , Humans , Polymorphism, Single Nucleotide/genetics , COVID-19/genetics , SARS-CoV-2/genetics , Reproducibility of Results , Point-of-Care Systems
7.
Curr Opin Biotechnol ; 75: 102713, 2022 06.
Article in English | MEDLINE | ID: mdl-35413604

ABSTRACT

Machine learning (ML) is revolutionizing our ability to understand and predict the complex relationships between protein sequence, structure, and function. Predictive sequence-function models are enabling protein engineers to efficiently search the sequence space for useful proteins with broad applications in biotechnology. In this review, we highlight the recent advances in applying ML to protein engineering. We discuss supervised learning methods that infer the sequence-function mapping from experimental data and new sequence representation strategies for data-efficient modeling. We then describe the various ways in which ML can be incorporated into protein engineering workflows, including purely in silico searches, ML-assisted directed evolution, and generative models that can learn the underlying distribution of the protein function in a sequence space. ML-driven protein engineering will become increasingly powerful with continued advances in high-throughput data generation, data science, and deep learning.


Subjects
Machine Learning , Protein Engineering , Amino Acid Sequence , Biotechnology , Protein Engineering/methods , Proteins/chemistry
8.
Protein Eng Des Sel ; 35, 2022 02 17.
Article in English | MEDLINE | ID: mdl-35174856

ABSTRACT

Understanding how severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) interacts with different mammalian angiotensin-converting enzyme II (ACE2) cell entry receptors elucidates determinants of virus transmission and facilitates development of vaccines for humans and animals. Yeast display-based directed evolution identified conserved ACE2 mutations that increase spike binding across multiple species. Gln42Leu increased ACE2-spike binding for human and four of four other mammalian ACE2s; Leu79Ile had an effect for human and three of three mammalian ACE2s. These residues are highly represented, 83% for Gln42 and 56% for Leu79, among mammalian ACE2s. The above findings can be important in protecting humans and animals from existing and future SARS-CoV-2 variants.


Subjects
COVID-19 , SARS-CoV-2 , Angiotensin-Converting Enzyme 2 , Animals , Humans , Mutation , Protein Binding , Saccharomyces cerevisiae/metabolism , Spike Glycoprotein, Coronavirus/genetics
10.
Cell Death Discov ; 8(1): 7, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-35013287

ABSTRACT

The human caspase family comprises 12 cysteine proteases that are centrally involved in cell death and inflammation responses. The members of this family have conserved sequences and structures, highly similar enzymatic activities and substrate preferences, and overlapping physiological roles. In this paper, we present a deep mutational scan of the executioner caspases CASP3 and CASP7 to dissect differences in their structure, function, and regulation. Our approach leverages high-throughput microfluidic screening to analyze hundreds of thousands of caspase variants in tightly controlled in vitro reactions. The resulting data provide a large-scale and unbiased view of the impact of amino acid substitutions on the proteolytic activity of CASP3 and CASP7. We use these data to pinpoint key functional differences between CASP3 and CASP7, including a secondary internal cleavage site, CASP7 Q196, that is not present in CASP3. Our results will open avenues for inquiry in caspase function and regulation that could potentially inform the development of future caspase-specific therapeutics.

11.
bioRxiv ; 2022 Jan 04.
Article in English | MEDLINE | ID: mdl-33758860

ABSTRACT

Understanding how SARS-CoV-2 interacts with different mammalian angiotensin-converting enzyme II (ACE2) cell entry receptors elucidates determinants of virus transmission and facilitates development of vaccines for humans and animals. Yeast display-based directed evolution identified conserved ACE2 mutations that increase spike binding across multiple species. Gln42Leu increased ACE2-spike binding for human and four of four other mammalian ACE2s; Leu79Ile had an effect for human and three of three mammalian ACE2s. These residues are highly represented, 83% for Gln42 and 56% for Leu79, among mammalian ACE2s. The above findings can be important in protecting humans and animals from existing and future SARS-CoV-2 variants.

12.
Proc Natl Acad Sci U S A ; 118(48), 2021 11 30.
Article in English | MEDLINE | ID: mdl-34815338

ABSTRACT

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.


Subjects
Amino Acid Sequence/genetics , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence/physiology , Biochemical Phenomena , Deep Learning , Machine Learning , Mutation , Neural Networks, Computer , Proteins/metabolism , Structure-Activity Relationship
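Supervised sequence-function models of the kind described in the abstract above need each variant as a numeric input. A minimal sketch of one common input representation, one-hot encoding, follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the `A2G` mutation notation, and the toy four-residue sequence are all hypothetical.

```python
# Hypothetical helper code: one-hot encode protein variants for a
# supervised sequence-function model. Not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def apply_mutations(wild_type, mutations):
    """Apply substitutions written like 'A2G' (1-indexed position)."""
    seq = list(wild_type)
    for mutation in mutations:
        ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
        if seq[pos - 1] != ref:
            raise ValueError(f"{mutation}: expected {ref} at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def one_hot_encode(sequence):
    """Return a flat list of 0/1 features, 20 per sequence position."""
    features = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1  # KeyError on non-standard residues
        features.extend(row)
    return features


# Toy example: a made-up four-residue "protein" with one substitution.
variant = apply_mutations("MAGW", ["A2G"])
encoded = one_hot_encode(variant)
```

Each position contributes exactly one nonzero feature, so the encoded vector has length 20 times the sequence length. Richer representations (for example, structure-aware graphs as mentioned in the abstract) would replace this step.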
13.
Nat Commun ; 12(1): 5825, 2021 10 05.
Article in English | MEDLINE | ID: mdl-34611172

ABSTRACT

Alcohol-forming fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for microbial production of fatty alcohols. Many metabolic engineering strategies utilize FARs to produce fatty alcohols from intracellular acyl-CoA and acyl-ACP pools; however, enzyme activity, especially on acyl-ACPs, remains a significant bottleneck to high-flux production. Here, we engineer FARs with enhanced activity on acyl-ACP substrates by implementing a machine learning (ML)-driven approach to iteratively search the protein fitness landscape. Over the course of ten design-test-learn rounds, we engineer enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We characterize the top sequence and show that it has an enhanced catalytic rate on palmitoyl-ACP. Finally, we analyze the sequence-function data to identify features, like the net charge near the substrate-binding site, that correlate with in vivo activity. This work demonstrates the power of ML to navigate the fitness landscape of traditionally difficult-to-engineer proteins.


Subjects
Aldehyde Oxidoreductases/metabolism , Fatty Alcohols/metabolism , Machine Learning , Aldehyde Oxidoreductases/genetics , Metabolic Engineering/methods
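The iterative design-test-learn cycle mentioned in the abstract above can be illustrated with a toy loop. Everything here is an assumption made for illustration: `toy_assay` is a hypothetical stand-in for a wet-lab measurement, the two-letter alphabet is artificial, and the per-site averaging model is a deliberately simple substitute rather than the model used in the paper.

```python
import itertools

ALPHABET = "AL"  # toy two-letter alphabet (assumption)
LENGTH = 4       # toy sequence length (assumption)


def toy_assay(seq):
    """Hypothetical stand-in for an experiment: counts L residues."""
    return seq.count("L")


def fit_additive(measured):
    """Learn: mean measured score for each (position, residue) pair."""
    totals, counts = {}, {}
    for seq, score in measured.items():
        for key in enumerate(seq):
            totals[key] = totals.get(key, 0.0) + score
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


def predict(model, seq, default):
    """Score a sequence; unseen (position, residue) pairs get `default`."""
    return sum(model.get(key, default) for key in enumerate(seq))


def design_test_learn(start, rounds, batch):
    candidates = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]
    measured = {seq: toy_assay(seq) for seq in start}
    for _ in range(rounds):
        model = fit_additive(measured)                    # learn
        default = sum(measured.values()) / len(measured)
        untested = [s for s in candidates if s not in measured]
        if not untested:
            break
        # design: rank untested candidates by predicted score
        untested.sort(key=lambda s: predict(model, s, default), reverse=True)
        for seq in untested[:batch]:                      # test
            measured[seq] = toy_assay(seq)
    return measured


results = design_test_learn(start=["AAAA", "LAAA"], rounds=3, batch=2)
best = max(results, key=results.get)
```

The point of the sketch is the loop structure: each round refits the model on all data so far, proposes a small batch, and feeds the new measurements back in. A real campaign would swap in an actual assay and a model with uncertainty estimates.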
14.
Metab Eng ; 67: 216-226, 2021 09.
Article in English | MEDLINE | ID: mdl-34229079

ABSTRACT

In order to make renewable fuels and chemicals from microbes, new methods are required to engineer microbes more intelligently. Computational approaches to engineer strains for enhanced chemical production typically rely on detailed mechanistic models (e.g., kinetic/stoichiometric models of metabolism), which require many experimental datasets for their parameterization, while experimental methods may require screening large mutant libraries to explore the design space for the few mutants with desired behaviors. To address these limitations, we developed an active and machine learning approach (ActiveOpt) to intelligently guide experiments to arrive at an optimal phenotype with minimal measured datasets. ActiveOpt was applied to two separate case studies to evaluate its potential to increase valine yields and neurosporene productivity in Escherichia coli. In both cases, ActiveOpt identified the best performing strain in fewer experiments than the case studies used. This work demonstrates that machine and active learning approaches have the potential to greatly facilitate metabolic engineering efforts to rapidly achieve their objectives.


Subjects
Machine Learning , Metabolic Engineering , Escherichia coli/genetics , Phenotype
15.
Nucleic Acids Res ; 49(18): e103, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34233007

ABSTRACT

Experimental methods that capture the individual properties of single cells are revealing the key role of cell-to-cell variability in countless biological processes. These single-cell methods are becoming increasingly important across the life sciences in fields such as immunology, regenerative medicine and cancer biology. In addition to high-dimensional transcriptomic techniques such as single-cell RNA sequencing, there is a need for fast, simple and high-throughput assays to enumerate cell samples based on RNA biomarkers. In this work, we present single-cell nucleic acid profiling in droplets (SNAPD) to analyze sets of transcriptional markers in tens of thousands of single mammalian cells. Individual cells are encapsulated in aqueous droplets on a microfluidic chip and the RNA markers in each cell are amplified. Molecular logic circuits then integrate these amplicons to categorize cells based on the transcriptional markers and produce a detectable fluorescence output. SNAPD is capable of analyzing over 100,000 cells per hour and can be used to quantify distinct cell types within heterogeneous populations, detect rare cells at frequencies down to 0.1% and enrich specific cell types using microfluidic sorting. SNAPD provides a simple, rapid, low cost and scalable approach to study complex phenotypes in heterogeneous cell populations.


Subjects
High-Throughput Screening Assays/methods , Microfluidic Analytical Techniques/methods , Microfluidics/methods , Nucleic Acids/analysis , Single-Cell Analysis/methods , Cell Line , Humans , Lab-On-A-Chip Devices , Transcriptome
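The abstract above describes logic circuits that integrate several transcriptional markers into a single readout per cell. A software analogy of that decision rule is sketched below, assuming a simple AND over required markers combined with NOT over excluded markers. The marker names and the four example cells are hypothetical, and the actual molecular circuits in the paper are implemented in nucleic acid chemistry rather than code.

```python
# Software analogy only: classify cells by presence/absence of markers.


def classify_cell(markers, required, excluded=()):
    """True if every required marker is present and no excluded marker is."""
    has_required = all(markers.get(name, False) for name in required)
    has_excluded = any(markers.get(name, False) for name in excluded)
    return has_required and not has_excluded


# Hypothetical per-cell marker calls (True = marker detected).
cells = [
    {"marker_A": True, "marker_B": True, "marker_C": False},
    {"marker_A": True, "marker_B": False, "marker_C": False},
    {"marker_A": True, "marker_B": True, "marker_C": True},
    {"marker_A": False, "marker_B": False, "marker_C": False},
]

positive = [
    cell for cell in cells
    if classify_cell(cell, required=("marker_A", "marker_B"), excluded=("marker_C",))
]
frequency = len(positive) / len(cells)  # fraction of cells of the target type
```

The same counting step is what turns per-droplet fluorescence calls into a cell-type frequency for a heterogeneous population.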
16.
PLoS One ; 16(5): e0251585, 2021.
Article in English | MEDLINE | ID: mdl-33979391

ABSTRACT

Understanding how human ACE2 genetic variants differ in their recognition by SARS-CoV-2 can facilitate the leveraging of ACE2 as an axis for treating and preventing COVID-19. In this work, we experimentally interrogate thousands of ACE2 mutants to identify over one hundred human single-nucleotide variants (SNVs) that are likely to have altered recognition by the virus, and make the complementary discovery that ACE2 residues distant from the spike interface influence the ACE2-spike interaction. These findings illuminate new links between ACE2 sequence and spike recognition, and could find substantial utility in further fundamental research that augments epidemiological analyses and clinical trial design in the contexts of both existing strains of SARS-CoV-2 and novel variants that may arise in the future.


Subjects
Angiotensin-Converting Enzyme 2/genetics , COVID-19/metabolism , Spike Glycoprotein, Coronavirus/genetics , Angiotensin-Converting Enzyme 2/metabolism , Binding Sites/genetics , COVID-19/genetics , Genetic Variation/genetics , Humans , Models, Molecular , Peptidyl-Dipeptidase A/metabolism , Polymorphism, Single Nucleotide/genetics , Protein Binding/genetics , Receptors, Virus/genetics , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/pathogenicity , Spike Glycoprotein, Coronavirus/metabolism , Virus Replication/genetics
17.
Cell Syst ; 12(1): 92-101.e8, 2021 01 20.
Article in English | MEDLINE | ID: mdl-33212013

ABSTRACT

Machine learning can infer how protein sequence maps to function without requiring a detailed understanding of the underlying physical or biological mechanisms. It is challenging to apply existing supervised learning frameworks to large-scale experimental data generated by deep mutational scanning (DMS) and related methods. DMS data often contain high-dimensional and correlated sequence variables, experimental sampling error and bias, and the presence of missing data. Notably, most DMS data do not contain examples of negative sequences, making it challenging to directly estimate how sequence affects function. Here, we develop a positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data. Our PU learning method displays excellent predictive performance across ten large-scale sequence-function datasets, representing proteins of different folds, functions, and library types. The estimated parameters pinpoint key residues that dictate protein structure and function. Finally, we apply our statistical sequence-function model to design highly stabilized enzymes.


Subjects
Machine Learning , Proteins , Amino Acid Sequence
18.
bioRxiv ; 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32995796

ABSTRACT

Understanding how human ACE2 genetic variants differ in their recognition by SARS-CoV-2 can have a major impact in leveraging ACE2 as an axis for treating and preventing COVID-19. In this work, we experimentally interrogate thousands of ACE2 mutants to identify over one hundred human single-nucleotide variants (SNVs) that are likely to have altered recognition by the virus, and make the complementary discovery that ACE2 residues distant from the spike interface can have a strong influence upon the ACE2-spike interaction. These findings illuminate new links between ACE2 sequence and spike recognition, and will find wide-ranging utility in SARS-CoV-2 fundamental research, epidemiological analyses, and clinical trial design.

19.
Nat Commun ; 11(1): 2418, 2020 05 15.
Article in English | MEDLINE | ID: mdl-32415107

ABSTRACT

The spatial organization of microbial communities arises from a complex interplay of biotic and abiotic interactions, and is a major determinant of ecosystem functions. Here we design a microfluidic platform to investigate how the spatial arrangement of microbes impacts gene expression and growth. We elucidate key biochemical parameters that dictate the mapping between spatial positioning and gene expression patterns. We show that distance can establish a low-pass filter to periodic inputs and can enhance the fidelity of information processing. Positive and negative feedback can play disparate roles in the synchronization and robustness of a genetic oscillator distributed between two strains to spatial separation. Quantification of growth and metabolite release in an amino-acid auxotroph community demonstrates that the interaction network and stability of the community are highly sensitive to temporal perturbations and spatial arrangements. In sum, our microfluidic platform can quantify spatiotemporal parameters influencing diffusion-mediated interactions in microbial consortia.


Subjects
Lab-On-A-Chip Devices , Microbial Consortia , Signal Transduction , Ecology , Ecosystem , Equipment Design , Escherichia coli/physiology , Gastrointestinal Microbiome , Gene Expression Regulation, Bacterial , Microfluidics/instrumentation , Models, Genetic , Oscillometry , Quorum Sensing
20.
Cell Syst ; 9(3): 229-242.e4, 2019 09 25.
Article in English | MEDLINE | ID: mdl-31494089

ABSTRACT

Microbial interactions are major drivers of microbial community dynamics and functions but remain challenging to identify because of limitations in parallel culturing and absolute abundance quantification of community members across environments and replicates. To this end, we developed Microbial Interaction Network Inference in microdroplets (MINI-Drop). Fluorescence microscopy coupled to computer vision techniques was used to rapidly determine the absolute abundance of each strain in hundreds to thousands of droplets per condition. We showed that MINI-Drop could accurately infer pairwise and higher-order interactions in synthetic consortia. We developed a stochastic model of community assembly to provide insight into the heterogeneity in community states across droplets. Finally, we elucidated the complex web of interactions linking antibiotics and different species in a synthetic consortium. In sum, we demonstrated a robust and generalizable method to infer microbial interaction networks by random encapsulation of sub-communities into microfluidic droplets.


Subjects
Lipid Droplets/microbiology , Microbial Consortia/physiology , Microbial Interactions/physiology , Microfluidics/methods , Animals , Anti-Bacterial Agents/metabolism , Biodiversity , Host-Pathogen Interactions , Humans , Microscopy, Fluorescence