Neural network extrapolation to distant regions of the protein fitness landscape.

Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A.

bioRxiv ; 2023 Nov 09.

Article in English | MEDLINE | ID: mdl-37987009

ABSTRACT

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape.

Machine learning to navigate fitness landscapes for protein engineering.

Freschlin, Chase R; Fahlberg, Sarah A; Romero, Philip A.

Curr Opin Biotechnol ; 75: 102713, 2022 06.

Article in English | MEDLINE | ID: mdl-35413604

ABSTRACT

Machine learning (ML) is revolutionizing our ability to understand and predict the complex relationships between protein sequence, structure, and function. Predictive sequence-function models are enabling protein engineers to efficiently search the sequence space for useful proteins with broad applications in biotechnology. In this review, we highlight the recent advances in applying ML to protein engineering. We discuss supervised learning methods that infer the sequence-function mapping from experimental data and new sequence representation strategies for data-efficient modeling. We then describe the various ways in which ML can be incorporated into protein engineering workflows, including purely in silico searches, ML-assisted directed evolution, and generative models that can learn the underlying distribution of the protein function in a sequence space. ML-driven protein engineering will become increasingly powerful with continued advances in high-throughput data generation, data science, and deep learning.

Subject(s)

Machine Learning , Protein Engineering , Amino Acid Sequence , Biotechnology , Protein Engineering/methods , Proteins/chemistry

Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Gelman, Sam; Fahlberg, Sarah A; Heinzelman, Pete; Romero, Philip A; Gitter, Anthony.

Proc Natl Acad Sci U S A ; 118(48), 2021 11 30.

Article in English | MEDLINE | ID: mdl-34815338

ABSTRACT

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.

Subject(s)

Amino Acid Sequence/genetics , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence/physiology , Biochemical Phenomena , Deep Learning , Machine Learning , Mutation , Neural Networks, Computer , Proteins/metabolism , Structure-Activity Relationship

Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production.

Greenhalgh, Jonathan C; Fahlberg, Sarah A; Pfleger, Brian F; Romero, Philip A.

Nat Commun ; 12(1): 5825, 2021 10 05.

Article in English | MEDLINE | ID: mdl-34611172

ABSTRACT

Alcohol-forming fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for microbial production of fatty alcohols. Many metabolic engineering strategies utilize FARs to produce fatty alcohols from intracellular acyl-CoA and acyl-ACP pools; however, enzyme activity, especially on acyl-ACPs, remains a significant bottleneck to high-flux production. Here, we engineer FARs with enhanced activity on acyl-ACP substrates by implementing a machine learning (ML)-driven approach to iteratively search the protein fitness landscape. Over the course of ten design-test-learn rounds, we engineer enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We characterize the top sequence and show that it has an enhanced catalytic rate on palmitoyl-ACP. Finally, we analyze the sequence-function data to identify features, like the net charge near the substrate-binding site, that correlate with in vivo activity. This work demonstrates the power of ML to navigate the fitness landscape of traditionally difficult-to-engineer proteins.

Subject(s)

Aldehyde Oxidoreductases/metabolism , Fatty Alcohols/metabolism , Machine Learning , Aldehyde Oxidoreductases/genetics , Metabolic Engineering/methods
