Search | VHL Regional Portal

1.

Contextual protein and antibody encodings from equivariant graph transformers.

Mahajan, Sai Pooja; Ruffolo, Jeffrey A; Gray, Jeffrey J.

bioRxiv ; 2023 Jul 29.

Article in English | MEDLINE | ID: mdl-37503113

ABSTRACT

The optimal residue identity at each position in a protein is determined by its structural, evolutionary, and functional context. We seek to learn the representation space of the optimal amino-acid residue in different structural contexts in proteins. Inspired by masked language modeling (MLM), our training aims to transduce learning of amino-acid labels from non-masked residues to masked residues in their structural environments and from general (e.g., a residue in a protein) to specific contexts (e.g., a residue at the interface of a protein or antibody complex). Our results on native sequence recovery and forward folding with AlphaFold2 suggest that the amino acid label for a protein residue may be determined from its structural context alone (i.e., without knowledge of the sequence labels of surrounding residues). We further find that the sequence space sampled from our masked models recapitulate the evolutionary sequence neighborhood of the wildtype sequence. Remarkably, the sequences conditioned on highly plastic structures recapitulate the conformational flexibility encoded in the structures. Furthermore, maximum-likelihood interfaces designed with masked models recapitulate wildtype binding energies for a wide range of protein interfaces and binding strengths. We also propose and compare fine-tuning strategies to train models for designing CDR loops of antibodies in the structural context of the antibody-antigen interface by leveraging structural databases for proteins, antibodies (synthetic and experimental) and protein-protein complexes. We show that pretraining on more general contexts improves native sequence recovery for antibody CDR loops, especially for the hypervariable CDR H3, while fine-tuning helps to preserve patterns observed in special contexts.
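
The native sequence recovery metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes recovery is the fraction of scored positions at which the designed residue matches the native one. The function name `sequence_recovery` and the optional `positions` argument are hypothetical and do not come from the paper.

```python
def sequence_recovery(native: str, designed: str, positions=None) -> float:
    """Fraction of scored positions where the designed residue equals the native one.

    `positions` (hypothetical argument) restricts scoring to a subset of
    0-based indices, for example the residues of one CDR loop.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    idx = range(len(native)) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to score")
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)


# Toy sequences, not data from the paper.
print(sequence_recovery("ACDE", "ACDF"))
```

Restricting `positions` to a loop of interest gives a per-region recovery rather than a whole-sequence average.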

2.

Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies.

Ruffolo, Jeffrey A; Chu, Lee-Shin; Mahajan, Sai Pooja; Gray, Jeffrey J.

Nat Commun ; 14(1): 2389, 2023 04 25.

Article in English | MEDLINE | ID: mdl-37185622

ABSTRACT

Antibodies have the capacity to bind a diverse set of antigens, and they have become critical therapeutics and diagnostic molecules. The binding of antibodies is facilitated by a set of six hypervariable loops that are diversified through genetic recombination and mutation. Even with recent advances, accurate structural prediction of these loops remains a challenge. Here, we present IgFold, a fast deep learning method for antibody structure prediction. IgFold consists of a pre-trained language model trained on 558 million natural antibody sequences followed by graph networks that directly predict backbone atom coordinates. IgFold predicts structures of similar or better quality than alternative methods (including AlphaFold) in significantly less time (under 25 s). Accurate structure prediction on this timescale makes possible avenues of investigation that were previously infeasible. As a demonstration of IgFold's capabilities, we predicted structures for 1.4 million paired antibody sequences, providing structural insights to 500-fold more antibodies than have experimentally determined structures.

Subject(s)

Deep Learning , Protein Conformation , Antibodies/chemistry , Complementarity Determining Regions/chemistry , Antigens

3.

Hallucinating structure-conditioned antibody libraries for target-specific binders.

Mahajan, Sai Pooja; Ruffolo, Jeffrey A; Frick, Rahel; Gray, Jeffrey J.

Front Immunol ; 13: 999034, 2022.

Article in English | MEDLINE | ID: mdl-36341416

ABSTRACT

Antibodies are widely developed and used as therapeutics to treat cancer, infectious disease, and inflammation. During development, initial leads routinely undergo additional engineering to increase their target affinity. Experimental methods for affinity maturation are expensive, laborious, and time-consuming and rarely allow the efficient exploration of the relevant design space. Deep learning (DL) models are transforming the field of protein engineering and design. While several DL-based protein design methods have shown promise, the antibody design problem is distinct, and specialized models for antibody design are desirable. Inspired by hallucination frameworks that leverage accurate structure prediction DL models, we propose the FvHallucinator for designing antibody sequences, especially the CDR loops, conditioned on an antibody structure. Such a strategy generates targeted CDR libraries that retain the conformation of the binder and thereby the mode of binding to the epitope on the antigen. On a benchmark set of 60 antibodies, FvHallucinator generates sequences resembling natural CDRs and recapitulates perplexity of canonical CDR clusters. Furthermore, the FvHallucinator designs amino acid substitutions at the VH-VL interface that are enriched in human antibody repertoires and therapeutic antibodies. We propose a pipeline that screens FvHallucinator designs to obtain a library enriched in binders for an antigen of interest. We apply this pipeline to the CDR H3 of the Trastuzumab-HER2 complex to generate in silico designs predicted to improve upon the binding affinity and interfacial properties of the original antibody. Thus, the FvHallucinator pipeline enables generation of inexpensive, diverse, and targeted antibody libraries enriched in binders for antibody affinity maturation.

Subject(s)

Antibodies , Complementarity Determining Regions , Humans , Complementarity Determining Regions/chemistry , Amino Acid Sequence , Antibody Affinity , Antigens , Hallucinations

4.

Induced fit with replica exchange improves protein complex structure prediction.

Harmalkar, Ameya; Mahajan, Sai Pooja; Gray, Jeffrey J.

PLoS Comput Biol ; 18(6): e1010124, 2022 06.

Article in English | MEDLINE | ID: mdl-35658008

ABSTRACT

Despite the progress in prediction of protein complexes over the last decade, recent blind protein complex structure prediction challenges revealed limited success rates (less than 20% models with DockQ score > 0.4) on targets that exhibit significant conformational change upon binding. To overcome limitations in capturing backbone motions, we developed a new, aggressive sampling method that incorporates temperature replica exchange Monte Carlo (T-REMC) and conformational sampling techniques within docking protocols in Rosetta. Our method, ReplicaDock 2.0, mimics induced-fit mechanism of protein binding to sample backbone motions across putative interface residues on-the-fly, thereby recapitulating binding-partner induced conformational changes. Furthermore, ReplicaDock 2.0 clocks in at 150-500 CPU hours per target (protein-size dependent); a runtime that is significantly faster than Molecular Dynamics based approaches. For a benchmark set of 88 proteins with moderate to high flexibility (unbound-to-bound iRMSD over 1.2 Å), ReplicaDock 2.0 successfully docks 61% of moderately flexible complexes and 35% of highly flexible complexes. Additionally, we demonstrate that by biasing backbone sampling particularly towards residues comprising flexible loops or hinge domains, highly flexible targets can be predicted to under 2 Å accuracy. This indicates that additional gains are possible when mobile protein segments are known.
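
The abstract does not state the swap rule used in ReplicaDock 2.0. As a hedged illustration of temperature replica exchange in general, the sketch below applies the standard Metropolis exchange criterion from parallel tempering. The function names and the use of `kT` in energy units are assumptions, and this is not the Rosetta implementation.

```python
import math
import random


def swap_probability(e_i: float, e_j: float, kt_i: float, kt_j: float) -> float:
    """Standard Metropolis acceptance probability for exchanging two replicas.

    e_i, e_j   : current energies of replicas i and j
    kt_i, kt_j : thermal energies (k_B * T) of replicas i and j
    Accept with probability min(1, exp((1/kt_i - 1/kt_j) * (e_i - e_j))).
    """
    delta = (1.0 / kt_i - 1.0 / kt_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(e_i, e_j, kt_i, kt_j, rng=random) -> bool:
    """Draw a uniform random number and decide whether the swap is accepted."""
    return rng.random() < swap_probability(e_i, e_j, kt_i, kt_j)


# A cold replica holding the higher energy always exchanges with a hotter,
# lower-energy replica; the reverse exchange is accepted only sometimes.
print(swap_probability(-5.0, -10.0, 1.0, 2.0))
print(swap_probability(-10.0, -5.0, 1.0, 2.0))
```

Exchanges of this kind let conformations found at high temperature migrate to low-temperature replicas, which is the general motivation for using replica exchange to sample backbone motions.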

Subject(s)

Benchmarking , Proteins , Monte Carlo Method , Protein Binding , Protein Conformation , Proteins/chemistry

5.

Simultaneous prediction of antibody backbone and side-chain conformations with deep learning.

Akpinaroglu, Deniz; Ruffolo, Jeffrey A; Mahajan, Sai Pooja; Gray, Jeffrey J.

PLoS One ; 17(6): e0258173, 2022.

Article in English | MEDLINE | ID: mdl-35704640

ABSTRACT

Antibody engineering is becoming increasingly popular in medicine for the development of diagnostics and immunotherapies. Antibody function relies largely on the recognition and binding of antigenic epitopes via the loops in the complementarity determining regions. Hence, accurate high-resolution modeling of these loops is essential for effective antibody engineering and design. Deep learning methods have previously been shown to effectively predict antibody backbone structures described as a set of inter-residue distances and orientations. However, antigen binding is also dependent on the specific conformations of surface side-chains. To address this shortcoming, we created DeepSCAb: a deep learning method that predicts inter-residue geometries as well as side-chain dihedrals of the antibody variable fragment. The network requires only sequence as input, rendering it particularly useful for antibodies without any known backbone conformations. Rotamer predictions use an interpretable self-attention layer, which learns to identify structurally conserved anchor positions across several species. We evaluate the performance of the model for discriminating near-native structures from sets of decoys and find that DeepSCAb outperforms similar methods lacking side-chain context. When compared to alternative rotamer repacking methods, which require an input backbone structure, DeepSCAb predicts side-chain conformations competitively. Our findings suggest that DeepSCAb improves antibody structure prediction with accurate side-chain modeling and is adaptable to applications in docking of antibody-antigen complexes and design of new therapeutic antibody sequences.

Subject(s)

Deep Learning , Antigen-Antibody Complex , Protein Conformation , Structural Homology, Protein

6.

Structural basis for peptide substrate specificities of glycosyltransferase GalNAc-T2.

Mahajan, Sai Pooja; Srinivasan, Yashes; Labonte, Jason W; DeLisa, Matthew P; Gray, Jeffrey J.

ACS Catal ; 11(5): 2977-2991, 2021 Mar 05.

Article in English | MEDLINE | ID: mdl-34322281

ABSTRACT

The polypeptide N-acetylgalactosaminyl transferase (GalNAc-T) enzyme family initiates O-linked mucin-type glycosylation. The family constitutes 20 isoenzymes in humans. GalNAc-Ts exhibit both redundancy and finely tuned specificity for a wide range of peptide substrates. In this work, we deciphered the sequence and structural motifs that determine the peptide substrate preferences for the GalNAc-T2 isoform. Our approach involved sampling and characterization of peptide-enzyme conformations obtained from Rosetta Monte Carlo-minimization-based flexible docking. We computationally scanned 19 amino acid residues at positions -1 and +1 of an eight-residue peptide substrate, which comprised a dataset of 361 (19x19) peptides with previously characterized experimental GalNAc-T2 glycosylation efficiencies. The calculations recapitulated experimental specificity data, successfully discriminating between glycosylatable and non-glycosylatable peptides with a probability of 96.5% (ROC-AUC score), a balanced accuracy of 85.5% and a false positive rate of 7.3%. The glycosylatable peptide substrates viz. peptides with proline, serine, threonine, and alanine at the -1 position of the peptide preferentially exhibited cognate sequon-like conformations. The preference for specific residues at the -1 position of the peptide was regulated by enzyme residues R362, K363, Q364, H365 and W331, which modulate the pocket size and specific enzyme-peptide interactions. For the +1 position of the peptide, enzyme residues K281 and K363 formed gating interactions with aromatics and glutamines at the +1 position of the peptide, leading to modes of peptide-binding sub-optimal for catalysis. Overall, our work revealed enzyme features that lead to the finely tuned specificity observed for a broad range of peptide substrates for the GalNAc-T2 enzyme. We anticipate that the key sequence and structural motifs can be extended to analyze specificities of other isoforms of the GalNAc-T family and can be used to guide design of variants with tailored specificity.
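
For readers unfamiliar with the three metrics quoted in this abstract (ROC-AUC, balanced accuracy, false positive rate), the sketch below shows how each is defined. The function names, the confusion counts and the scores are invented for illustration and are not the paper's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Balanced accuracy and false positive rate from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity: glycosylatable peptides correctly flagged
    tnr = tn / (tn + fp)  # specificity: non-glycosylatable peptides correctly rejected
    return {
        "balanced_accuracy": (tpr + tnr) / 2,
        "false_positive_rate": fp / (fp + tn),
    }


def roc_auc(pos_scores, neg_scores) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the usual rank-statistic definition).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up numbers for illustration only.
print(classification_metrics(tp=8, fp=1, tn=9, fn=2))
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

The pairwise reading of ROC-AUC is what allows the abstract to describe the 96.5% score as a probability of discriminating the two peptide classes.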

7.

Deep Learning in Protein Structural Modeling and Design.

Gao, Wenhao; Mahajan, Sai Pooja; Sulam, Jeremias; Gray, Jeffrey J.

Patterns (N Y) ; 1(9): 100142, 2020 Dec 11.

Article in English | MEDLINE | ID: mdl-33336200

ABSTRACT

Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.

8.

Geometric potentials from deep learning improve prediction of CDR H3 loop structures.

Ruffolo, Jeffrey A; Guerra, Carlos; Mahajan, Sai Pooja; Sulam, Jeremias; Gray, Jeffrey J.

Bioinformatics ; 36(Suppl_1): i268-i275, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657412

ABSTRACT

MOTIVATION: Antibody structure is largely conserved, except for a complementarity-determining region featuring six variable loops. Five of these loops adopt canonical folds which can typically be predicted with existing methods, while the remaining loop (CDR H3) remains a challenge due to its highly diverse set of observed conformations. In recent years, deep neural networks have proven to be effective at capturing the complex patterns of protein structure. This work proposes DeepH3, a deep residual neural network that learns to predict inter-residue distances and orientations from antibody heavy and light chain sequence. The output of DeepH3 is a set of probability distributions over distances and orientation angles between pairs of residues. These distributions are converted to geometric potentials and used to discriminate between decoy structures produced by RosettaAntibody and predict new CDR H3 loop structures de novo. RESULTS: When evaluated on the Rosetta antibody benchmark dataset of 49 targets, DeepH3-predicted potentials identified better, same and worse structures [measured by root-mean-squared distance (RMSD) from the experimental CDR H3 loop structure] than the standard Rosetta energy function for 33, 6 and 10 targets, respectively, and improved the average RMSD of predictions by 32.1% (1.4 Å). Analysis of individual geometric potentials revealed that inter-residue orientations were more effective than inter-residue distances for discriminating near-native CDR H3 loops. When applied to de novo prediction of CDR H3 loop structures, DeepH3 achieves an average RMSD of 2.2 ± 1.1 Å on the Rosetta antibody benchmark. AVAILABILITY AND IMPLEMENTATION: DeepH3 source code and pre-trained model parameters are freely available at https://github.com/Graylab/deepH3-distances-orientations. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
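
The abstract says the predicted distributions are converted to geometric potentials but does not give the conversion. A common choice, assumed here, is the negative log of the binned probability. The function names, the bin layout and the `eps` floor are hypothetical; the sketch does not reproduce the DeepH3 code in the linked repository.

```python
import math
from bisect import bisect_right


def distribution_to_potential(probs, eps=1e-8):
    """Turn a binned probability distribution into a per-bin potential.

    Assumes potential = -log(p); `eps` floors very small probabilities so
    the potential stays finite.
    """
    total = sum(probs)
    return [-math.log(max(p / total, eps)) for p in probs]


def score_distance(distance, bin_edges, potential):
    """Look up the potential of the bin containing `distance`.

    `bin_edges` has len(potential) + 1 ascending values; bin k covers
    [bin_edges[k], bin_edges[k + 1]). Out-of-range values use the end bins.
    """
    k = bisect_right(bin_edges, distance) - 1
    k = min(max(k, 0), len(potential) - 1)
    return potential[k]


# Toy three-bin distribution over an inter-residue distance (in Å).
potential = distribution_to_potential([0.1, 0.7, 0.2])
print(score_distance(5.0, [0.0, 4.0, 8.0, 12.0], potential))
```

Under this assumption, summing such per-pair scores over a decoy gives a total that is lowest for structures whose geometry falls in the most probable bins, which is how a potential of this kind can rank decoys.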

Subject(s)

Deep Learning , Antibodies , Complementarity Determining Regions , Models, Molecular , Protein Conformation

9.

The structure of the colorectal cancer-associated enzyme GalNAc-T12 reveals how nonconserved residues dictate its function.

Fernandez, Amy J; Daniel, Earnest James Paul; Mahajan, Sai Pooja; Gray, Jeffrey J; Gerken, Thomas A; Tabak, Lawrence A; Samara, Nadine L.

Proc Natl Acad Sci U S A ; 116(41): 20404-20410, 2019 10 08.

Article in English | MEDLINE | ID: mdl-31548401

ABSTRACT

Polypeptide N-acetylgalactosaminyl transferases (GalNAc-Ts) initiate mucin type O-glycosylation by catalyzing the transfer of N-acetylgalactosamine (GalNAc) to Ser or Thr on a protein substrate. Inactive and partially active variants of the isoenzyme GalNAc-T12 are present in subsets of patients with colorectal cancer, and several of these variants alter nonconserved residues with unknown functions. While previous biochemical studies have demonstrated that GalNAc-T12 selects for peptide and glycopeptide substrates through unique interactions with its catalytic and lectin domains, the molecular basis for this distinct substrate selectivity remains elusive. Here we examine the molecular basis of the activity and substrate selectivity of GalNAc-T12. The X-ray crystal structure of GalNAc-T12 in complex with a di-glycosylated peptide substrate reveals how a nonconserved GalNAc binding pocket in the GalNAc-T12 catalytic domain dictates its unique substrate selectivity. In addition, the structure provides insight into how colorectal cancer mutations disrupt the activity of GalNAc-T12 and illustrates how the rules dictating GalNAc-T12 function are distinct from those for other GalNAc-Ts.

Subject(s)

Colorectal Neoplasms/metabolism , N-Acetylgalactosaminyltransferases/chemistry , N-Acetylgalactosaminyltransferases/metabolism , Neoplasm Proteins/chemistry , Amino Acid Sequence , Humans , Models, Molecular , Protein Conformation

10.

Computational affinity maturation of camelid single-domain intrabodies against the nonamyloid component of alpha-synuclein.

Mahajan, Sai Pooja; Meksiriporn, Bunyarit; Waraho-Zhmayev, Dujduan; Weyant, Kevin B; Kocer, Ilkay; Butler, David C; Messer, Anne; Escobedo, Fernando A; DeLisa, Matthew P.

Sci Rep ; 8(1): 17611, 2018 12 04.

Article in English | MEDLINE | ID: mdl-30514850

ABSTRACT

Improving the affinity of protein-protein interactions is a challenging problem that is particularly important in the development of antibodies for diagnostic and clinical use. Here, we used structure-based computational methods to optimize the binding affinity of VHNAC1, a single-domain intracellular antibody (intrabody) from the camelid family that was selected for its specific binding to the nonamyloid component (NAC) of human α-synuclein (α-syn), a natively disordered protein, implicated in the pathogenesis of Parkinson's disease (PD) and related neurological disorders. Specifically, we performed ab initio modeling that revealed several possible modes of VHNAC1 binding to the NAC region of α-syn as well as mutations that potentially enhance the affinity between these interacting proteins. While our initial design strategy did not lead to improved affinity, it ultimately guided us towards a model that aligned more closely with experimental observations, revealing a key residue on the paratope and the participation of H4 loop residues in binding, as well as confirming the importance of electrostatic interactions. The binding activity of the best intrabody mutant, which involved just a single amino acid mutation compared to parental VHNAC1, was significantly enhanced primarily through a large increase in association rate. Our results indicate that structure-based computational design can be used to successfully improve the affinity of antibodies against natively disordered and weakly immunogenic antigens such as α-syn, even in cases such as ours where crystal structures are unavailable.

Subject(s)

Antibodies/immunology , Antibody Affinity , Molecular Docking Simulation , Single-Chain Antibodies/immunology , alpha-Synuclein/immunology , Animals , Antibodies/chemistry , Antibodies/genetics , Camelidae , Humans , Protein Binding , Single-Chain Antibodies/genetics

11.

Tilting the balance between canonical and noncanonical conformations for the H1 hypervariable loop of a llama VHH through point mutations.

Mahajan, Sai Pooja; Velez-Vega, Camilo; Escobedo, Fernando A.

J Phys Chem B ; 117(1): 13-24, 2013 Jan 10.

Article in English | MEDLINE | ID: mdl-23231492

ABSTRACT

Nanobodies are single-domain antibodies found in camelids. These are the smallest naturally occurring binding domains and derive functionality via three hypervariable loops (H1-H3) that form the binding surface. They are excellent candidates for antibody engineering because of their favorable characteristics like small size, high solubility, and stability. To rationally engineer antibodies with affinity for a specific target, the hypervariable loops can be tailored to obtain the desired binding surface. As a first step toward such a goal, we consider the design of loops with a desired conformation. In this study, we focus on the H1 loop of the anti-hCG llama nanobody that exhibits a noncanonical conformation. We aim to "tilt" the stability of the H1 loop structure from a noncanonical conformation to a (humanized) type 1 canonical conformation by studying the effect of selected mutations to the amino acid sequence of the H1, H2, and proximal residues. We use all-atomistic, explicit-solvent, biased molecular dynamic simulations to simulate the wild-type and mutant loops in a prefolded framework. We thus find mutants with increasing propensity to form a stable type 1 canonical conformation of the H1 loop. Free energy landscapes reveal the existence of conformational isomers of the canonical conformation that may play a role in binding different antigenic surfaces. We also elucidate the approximate mechanism and kinetics of transitions between such conformational isomers by using a Markovian model. We find that a particular three-point mutant has the strongest thermodynamic propensity to form the H1 type 1 canonical structure but also to exhibit transitions between conformational isomers, while a different, more rigid three-point mutant has the strongest propensity to be kinetically trapped in such a canonical structure.
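
The Markovian model of transitions between conformational isomers is not specified in the abstract. As a generic illustration, the sketch below takes a discrete-time transition matrix between conformational states and computes its stationary populations and the mean dwell time in one state. The two-state matrix is invented and the function names are hypothetical.

```python
import math


def stationary_distribution(transition, iterations=500):
    """Stationary populations of a row-stochastic matrix by power iteration.

    Converges for ergodic, aperiodic chains; `transition[i][j]` is the
    probability of moving from state i to state j in one step.
    """
    n = len(transition)
    for row in transition:
        if len(row) != n or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("transition must be a square row-stochastic matrix")
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return pi


def mean_residence_steps(transition, state):
    """Expected number of consecutive steps spent in `state` (geometric dwell time)."""
    stay = transition[state][state]
    return math.inf if stay >= 1.0 else 1.0 / (1.0 - stay)


# Invented two-state example: state 0 is more stable than state 1.
P = [[0.9, 0.1],
     [0.2, 0.8]]
print(stationary_distribution(P))
print(mean_residence_steps(P, 0))
```

In this picture the stationary populations relate to thermodynamic propensity, while the dwell times relate to kinetic trapping, the two properties the abstract contrasts for the three-point mutants.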

Subject(s)

Immunoglobulin Heavy Chains/chemistry , Point Mutation , Immunoglobulin Heavy Chains/genetics , Kinetics , Markov Chains , Models, Molecular , Protein Conformation , Thermodynamics
