1.
Science ; 385(6706): 276-282, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39024436

ABSTRACT

We describe an approach for designing high-affinity small molecule-binding proteins poised for downstream sensing. We use deep learning-generated pseudocycles with repeating structural units surrounding central binding pockets with widely varying shapes that depend on the geometry and number of the repeat units. We dock small molecules of interest into the most shape complementary of these pseudocycles, design the interaction surfaces for high binding affinity, and experimentally screen to identify designs with the highest affinity. We obtain binders to four diverse molecules, including the polar and flexible methotrexate and thyroxine. Taking advantage of the modular repeat structure and central binding pockets, we construct chemically induced dimerization systems and low-noise nanopore sensors by splitting designs into domains that reassemble upon ligand addition.


Subject(s)
Deep Learning , Protein Binding , Proteins , Small Molecule Libraries , Binding Sites , Ligands , Methotrexate/chemistry , Molecular Docking Simulation , Nanopores , Protein Multimerization , Proteins/chemistry , Small Molecule Libraries/chemistry , Thyroxine/chemistry
2.
Protein Sci ; 33(7): e5086, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38923241

ABSTRACT

Variation in mutation rates at sites in proteins can largely be understood by the constraint that proteins must fold into stable structures. Models that calculate site-specific rates based on protein structure and a thermodynamic stability model have shown a significant but modest ability to predict empirical site-specific rates calculated from sequence. Models that use detailed atomistic models of protein energetics do not outperform simpler approaches using packing density. We demonstrate that a fundamental reason for this is that empirical site-specific rates are the result of the average effect of many different microenvironments in a phylogeny. By analyzing the results of evolutionary dynamics simulations, we show how averaging site-specific rates across many extant protein structures can lead to correct recovery of site-rate prediction. This result is also demonstrated in natural protein sequences and experimental structures. Using predicted structures, we demonstrate that atomistic models can improve upon contact density metrics in predicting site-specific rates from a structure. The results give fundamental insights into the factors governing the distribution of site-specific rates in protein families.


Subject(s)
Proteins , Proteins/chemistry , Proteins/genetics , Protein Conformation , Thermodynamics , Evolution, Molecular , Mutation , Models, Molecular , Molecular Dynamics Simulation
3.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961294

ABSTRACT

Despite transformative advances in protein design with deep learning, the design of small-molecule-binding proteins and sensors for arbitrary ligands remains a grand challenge. Here we combine deep learning and physics-based methods to generate a family of proteins with diverse and designable pocket geometries, which we employ to computationally design binders for six chemically and structurally distinct small-molecule targets. Biophysical characterization of the designed binders revealed nanomolar to low micromolar binding affinities and atomic-level design accuracy. The bound ligands are exposed at one edge of the binding pocket, enabling the de novo design of chemically induced dimerization (CID) systems; we take advantage of this to create a biosensor with nanomolar sensitivity for cortisol. Our approach provides a general method to design proteins that bind and sense small molecules for a wide range of analytical, environmental, and biomedical applications.

4.
bioRxiv ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37790440

ABSTRACT

Sequence-specific DNA-binding proteins (DBPs) play critical roles in biology and biotechnology, and there has been considerable interest in the engineering of DBPs with new or altered specificities for genome editing and other applications. While there has been some success in reprogramming naturally occurring DBPs using selection methods, the computational design of new DBPs that recognize arbitrary target sites remains an outstanding challenge. We describe a computational method for the design of small DBPs that recognize specific target sequences through interactions with bases in the major groove, and employ this method in conjunction with experimental screening to generate binders for 5 distinct DNA targets. These binders exhibit specificity closely matching the computational models for the target DNA sequences at as many as 6 base positions and affinities as low as 30-100 nM. The crystal structure of a designed DBP-target site complex is in close agreement with the design model, highlighting the accuracy of the design method. The designed DBPs function in both Escherichia coli and mammalian cells to repress and activate transcription of neighboring genes. Our method is a substantial step towards a general route to small and hence readily deliverable sequence-specific DBPs for gene regulation and editing.

5.
Science ; 380(6642): 266-273, 2023 04 21.
Article in English | MEDLINE | ID: mdl-37079676

ABSTRACT

As a result of evolutionary selection, the subunits of naturally occurring protein assemblies often fit together with substantial shape complementarity to generate architectures optimal for function in a manner not achievable by current design approaches. We describe a "top-down" reinforcement learning-based design approach that solves this problem using Monte Carlo tree search to sample protein conformers in the context of an overall architecture and specified functional constraints. Cryo-electron microscopy structures of the designed disk-shaped nanopores and ultracompact icosahedra are very close to the computational models. The icosahedra enable very-high-density display of immunogens and signaling molecules, which potentiates vaccine response and angiogenesis induction. Our approach enables the top-down design of complex protein nanomaterials with desired system properties and demonstrates the power of reinforcement learning in protein design.


Subject(s)
Machine Learning , Nanostructures , Protein Engineering , Proteins , Cryoelectron Microscopy , Proteins/chemistry
6.
PLoS Comput Biol ; 19(3): e1010262, 2023 03.
Article in English | MEDLINE | ID: mdl-36961827

ABSTRACT

Thermodynamic stability is a crucial fitness constraint in protein evolution and is a central factor in shaping the sequence landscapes of proteins. The correlation between stability and molecular fitness depends on the mechanism that relates the biophysical property with biological function. In the simplest case, stability and fitness are related by the amount of folded protein. However, when proteins are toxic in the unfolded state, the fitness function shifts, resulting in higher stability under mutation-selection balance. Likewise, a higher population size results in a similar change in protein stability, as it magnifies the effect of the selection pressure in evolutionary dynamics. This study investigates how such factors affect the evolution of protein stability, site-specific mutation rates, and residue-residue covariation. To simulate evolutionary trajectories with realistic modeling of protein energetics, we develop an all-atom simulator of protein evolution, RosettaEvolve. By evolving proteins under different fitness functions, we can study how the fitness function affects the distribution of proposed and accepted mutations, site-specific rates, and the prevalence of correlated amino acid substitutions. We demonstrate that fitness pressure affects the proposal distribution of mutational effects, that changes in stability can largely explain variations in site-specific substitution rates in evolutionary trajectories, and that increased fitness pressure results in a stronger covariation signal. Our results give mechanistic insight into the evolutionary consequences of variation in protein stability and provide a basis to rationalize the strong covariation signal observed in natural sequence alignments.


Subject(s)
Evolution, Molecular , Proteins , Proteins/chemistry , Mutation , Computer Simulation , Mutation Rate
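
The abstract above says that, in the simplest case, stability and fitness are related by the amount of folded protein. The abstract does not give the functional form; a common choice, assumed here rather than taken from the paper, is the two-state folding expression, where \Delta G is the folding free energy (negative for a stable protein), R the gas constant, and T the temperature:

```latex
% Assumed two-state model: fitness f proportional to the fraction of folded protein.
% \Delta G = G_{\text{folded}} - G_{\text{unfolded}} (negative values mean a stable fold).
f(\Delta G) \;\propto\; P_{\text{folded}} \;=\; \frac{1}{1 + e^{\Delta G / (R T)}}
```

Under this assumption, fitness saturates near 1 for very stable proteins and falls toward 0 as \Delta G becomes positive, so selection acts most strongly on marginally stable proteins.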
7.
Nature ; 614(7949): 774-780, 2023 02.
Article in English | MEDLINE | ID: mdl-36813896

ABSTRACT

De novo enzyme design has sought to introduce active sites and substrate-binding pockets that are predicted to catalyse a reaction of interest into geometrically compatible native scaffolds [1,2], but has been limited by a lack of suitable protein structures and the complexity of native protein sequence-structure relationships. Here we describe a deep-learning-based 'family-wide hallucination' approach that generates large numbers of idealized protein structures containing diverse pocket shapes and designed sequences that encode them. We use these scaffolds to design artificial luciferases that selectively catalyse the oxidative chemiluminescence of the synthetic luciferin substrates diphenylterazine [3] and 2-deoxycoelenterazine. The designed active sites position an arginine guanidinium group adjacent to an anion that develops during the reaction in a binding pocket with high shape complementarity. For both luciferin substrates, we obtain designed luciferases with high selectivity; the most active of these is a small (13.9 kDa) and thermostable (with a melting temperature higher than 95 °C) enzyme that has a catalytic efficiency on diphenylterazine (kcat/Km = 10⁶ M⁻¹ s⁻¹) comparable to that of native luciferases, but a much higher substrate specificity. The creation of highly active and specific biocatalysts from scratch with broad applications in biomedicine is a key milestone for computational enzyme design, and our approach should enable generation of a wide range of luciferases and other enzymes.


Subject(s)
Deep Learning , Luciferases , Biocatalysis , Catalytic Domain , Enzyme Stability , Hot Temperature , Luciferases/chemistry , Luciferases/metabolism , Luciferins/metabolism , Luminescence , Oxidation-Reduction , Substrate Specificity
8.
bioRxiv ; 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38187589

ABSTRACT

A general method for designing proteins to bind and sense any small molecule of interest would be widely useful. Due to the small number of atoms to interact with, binding to small molecules with high affinity requires highly shape complementary pockets, and transducing binding events into signals is challenging. Here we describe an integrated deep learning and energy based approach for designing high shape complementarity binders to small molecules that are poised for downstream sensing applications. We employ deep learning generated pseudocycles with repeating structural units surrounding central pockets; depending on the geometry of the structural unit and repeat number, these pockets span wide ranges of sizes and shapes. For a small molecule target of interest, we extensively sample high shape complementarity pseudocycles to generate large numbers of customized potential binding pockets; the ligand binding poses and the interacting interfaces are then optimized for high affinity binding. We computationally design binders to four diverse molecules, including for the first time polar flexible molecules such as methotrexate and thyroxine, which are expressed at high levels and have nanomolar affinities straight out of the computer. Co-crystal structures are nearly identical to the design models. Taking advantage of the modular repeating structure of pseudocycles and central location of the binding pockets, we constructed low noise nanopore sensors and chemically induced dimerization systems by splitting the binders into domains which assemble into the original pseudocycle pocket upon target molecule addition.

9.
Nature ; 600(7889): 547-552, 2021 12.
Article in English | MEDLINE | ID: mdl-34853475

ABSTRACT

There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences [1-3]. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.


Subject(s)
Neural Networks, Computer , Proteins , Amino Acid Sequence , Crystallography, X-Ray , Hallucinations , Humans , Protein Conformation , Proteins/chemistry , Proteins/genetics
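
The sampling loop described in the abstract above (Monte Carlo moves in sequence space that increase the Kullback-Leibler divergence between predicted and background distributions) might be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_predictor` is a hypothetical stand-in for the trRosetta network, the background is assumed uniform, and the Metropolis acceptance rule and all parameter values are choices made here for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def toy_predictor(seq, n_bins=4):
    """HYPOTHETICAL stand-in for a structure-prediction network.

    Maps a sequence to a probability distribution over n_bins "distance
    bins" using residue counts plus pseudocounts. A real network would
    predict a full inter-residue distance map instead.
    """
    counts = [1.0] * n_bins
    for aa in seq:
        counts[AMINO_ACIDS.index(aa) % n_bins] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def hallucinate(length=30, steps=500, temperature=0.05, seed=0):
    """Metropolis Monte Carlo in sequence space, maximizing KL divergence."""
    rng = random.Random(seed)
    background = [0.25] * 4  # assumed uniform background distribution
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = kl_divergence(toy_predictor(seq), background)
    start_score = score
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single-site mutation
        new_score = kl_divergence(toy_predictor(seq), background)
        accept = new_score >= score or rng.random() < math.exp(
            (new_score - score) / temperature
        )
        if accept:
            score = new_score
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(seq), start_score, score


if __name__ == "__main__":
    final_seq, start, final = hallucinate()
    print(final_seq, round(start, 4), round(final, 4))
```

Starting from a random sequence, whose predicted distribution is close to the background, the loop drifts toward sequences with a more sharply peaked prediction, which is the sense in which the abstract speaks of optimizing contrast.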
10.
Protein Sci ; 30(10): 2057-2068, 2021 10.
Article in English | MEDLINE | ID: mdl-34218472

ABSTRACT

Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi-nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.


Subject(s)
Amino Acid Substitution , Evolution, Molecular , Models, Genetic , Protein Conformation , Proteins , Proteins/chemistry , Proteins/genetics , Thermodynamics
11.
Proc Natl Acad Sci U S A ; 118(11)2021 03 16.
Article in English | MEDLINE | ID: mdl-33712545

ABSTRACT

The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen's thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the desired structure is the lowest energy state. As this calculation involves not only all possible amino acid sequences but also, all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest-energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest-energy conformation for the designed sequence, and typically discarding a large fraction of designed sequences for which this is not the case. Here, we show that by backpropagating gradients through the transform-restrained Rosetta (trRosetta) structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures in a single calculation. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by conformational landscape optimization with the standard energy-based sequence design methodology in Rosetta and show that the former can result in energy landscapes with fewer alternative energy minima. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure.


Subject(s)
Neural Networks, Computer , Proteins/chemistry , Models, Molecular , Protein Conformation , Protein Folding , Thermodynamics
12.
Proc Natl Acad Sci U S A ; 117(36): 22135-22145, 2020 09 08.
Article in English | MEDLINE | ID: mdl-32839327

ABSTRACT

To create new enzymes and biosensors from scratch, precise control over the structure of small-molecule binding sites is of paramount importance, but systematically designing arbitrary protein pocket shapes and sizes remains an outstanding challenge. Using the NTF2-like structural superfamily as a model system, we developed an enumerative algorithm for creating a virtually unlimited number of de novo proteins supporting diverse pocket structures. The enumerative algorithm was tested and refined through feedback from two rounds of large-scale experimental testing, involving in total the assembly of synthetic genes encoding 7,896 designs and assessment of their stability on yeast cell surface, detailed biophysical characterization of 64 designs, and crystal structures of 5 designs. The refined algorithm generates proteins that remain folded at high temperatures and exhibit more pocket diversity than naturally occurring NTF2-like proteins. We expect this approach to transform the design of small-molecule sensors and enzymes by enabling the creation of binding and active site geometries much more optimal for specific design challenges than is accessible by repurposing the limited number of naturally occurring NTF2-like proteins.


Subject(s)
Nucleocytoplasmic Transport Proteins/chemistry , Algorithms , Binding Sites , Computer Simulation , High-Throughput Screening Assays , Models, Molecular , Protein Conformation , Protein Engineering , Protein Stability
13.
Sci Rep ; 8(1): 10786, 2018 Jul 17.
Article in English | MEDLINE | ID: mdl-30018351

ABSTRACT

Anti-carbohydrate monoclonal antibodies (mAbs) hold great promise as cancer therapeutics and diagnostics. However, their specificity can be mixed, and detailed characterization is problematic, because antibody-glycan complexes are challenging to crystallize. Here, we developed a generalizable approach employing high-throughput techniques for characterizing the structure and specificity of such mAbs, and applied it to the mAb TKH2 developed against the tumor-associated carbohydrate antigen sialyl-Tn (STn). The mAb specificity was defined by apparent KD values determined by quantitative glycan microarray screening. Key residues in the antibody combining site were identified by site-directed mutagenesis, and the glycan-antigen contact surface was defined using saturation transfer difference NMR (STD-NMR). These features were then employed as metrics for selecting the optimal 3D-model of the antibody-glycan complex, out of thousands of plausible options generated by automated docking and molecular dynamics simulation. STn-specificity was further validated by computational screening of the selected antibody 3D-model against the human sialyl-Tn-glycome. This computational-experimental approach would allow rational design of potent antibodies targeting carbohydrates.


Subject(s)
Antibodies, Monoclonal/chemistry , Antigens, Tumor-Associated, Carbohydrate/immunology , Models, Molecular , Animals , Antibody Specificity , Antigens, Tumor-Associated, Carbohydrate/chemistry , Cells, Cultured , Computer Simulation , HEK293 Cells , Humans , Mice , Molecular Dynamics Simulation
14.
Proc Natl Acad Sci U S A ; 114(41): 10900-10905, 2017 10 10.
Article in English | MEDLINE | ID: mdl-28973872

ABSTRACT

Natural proteins must both fold into a stable conformation and exert their molecular function. To date, computational design has successfully produced stable and atomically accurate proteins by using so-called "ideal" folds rich in regular secondary structures and almost devoid of loops and destabilizing elements, such as cavities. Molecular function, such as binding and catalysis, however, often demands nonideal features, including large and irregular loops and buried polar interaction networks, which have remained challenging for fold design. Through five design/experiment cycles, we learned principles for designing stable and functional antibody variable fragments (Fvs). Specifically, we (i) used sequence-design constraints derived from antibody multiple-sequence alignments, and (ii) during backbone design, maintained stabilizing interactions observed in natural antibodies between the framework and loops of complementarity-determining regions (CDRs) 1 and 2. Designed Fvs bound their ligands with midnanomolar affinities and were as stable as natural antibodies, despite having >30 mutations from mammalian antibody germlines. Furthermore, crystallographic analysis demonstrated atomic accuracy throughout the framework and in four of six CDRs in one design and atomic accuracy in the entire Fv in another. The principles we learned are general, and can be implemented to design other nonideal folds, generating stable, specific, and precise antibodies and enzymes.


Subject(s)
Acyl-Carrier Protein S-Acetyltransferase/metabolism , Antibodies/chemistry , Antibodies/metabolism , Immunoglobulin Fragments/metabolism , Insulin/metabolism , Acyl-Carrier Protein S-Acetyltransferase/immunology , Antibodies/immunology , Binding Sites, Antibody , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/immunology , Complementarity Determining Regions/metabolism , Crystallography, X-Ray , Humans , Immunoglobulin Fragments/chemistry , Immunoglobulin Fragments/immunology , Insulin/immunology , Ligands , Models, Molecular , Mycobacterium tuberculosis/enzymology , Protein Conformation
15.
Proteins ; 85(1): 30-38, 2017 01.
Article in English | MEDLINE | ID: mdl-27717001

ABSTRACT

Current methods for antibody structure prediction rely on sequence homology to known structures. Although this strategy often yields accurate predictions, models can be stereo-chemically strained. Here, we present a fully automated algorithm, called AbPredict, that disregards sequence homology, and instead uses a Monte Carlo search for low-energy conformations built from backbone segments and rigid-body orientations that appear in antibody molecular structures. We find cases where AbPredict selects accurate loop templates with sequence identity as low as 10%, whereas the template of highest sequence identity diverges substantially from the query's conformation. Accordingly, in several cases reported in the recent Antibody Modeling Assessment benchmark, AbPredict models were more accurate than those from any participant, and the models' stereo-chemical quality was consistently high. Furthermore, in two blind cases provided to us by crystallographers prior to structure determination, the method achieved <1.5 Ångstrom overall backbone accuracy. Accurate modeling of unstrained antibody structures will enable design and engineering of improved binders for biomedical research directly from sequence. Proteins 2016; 85:30-38. © 2016 Wiley Periodicals, Inc.


Subject(s)
Algorithms , Antibodies/chemistry , Computational Biology/methods , Models, Statistical , Software , Amino Acid Sequence , Computer Simulation , Databases, Protein , Humans , Models, Molecular , Monte Carlo Method , Protein Conformation , Thermodynamics
16.
J Proteomics ; 142: 138-48, 2016 06 16.
Article in English | MEDLINE | ID: mdl-27195812

ABSTRACT

Calreticulin is a highly conserved multifunctional protein implicated in many different biological systems and has therefore been the subject of intensive research. It is primarily present in the endoplasmic reticulum where its main functions are to regulate Ca²⁺ homeostasis, act as a chaperone and stabilize the MHC class I peptide-loading complex. Although several high-resolution structures of calreticulin exist, these only cover three-quarters of the entire protein leaving the extended structures unsolved. Additionally, the structure of calreticulin is influenced by the presence of Ca²⁺. The conformational changes induced by Ca²⁺ have not been determined yet as they are hard to study with traditional approaches. Here, we investigated the Ca²⁺-induced conformational changes with a combination of chemical cross-linking, mass spectrometry, bioinformatics analysis and modelling in Rosetta. Using a bifunctional linker, we found a large Ca²⁺-induced change to the cross-linking pattern in calreticulin. Our results are consistent with a high flexibility in the P-loop, a stabilization of the acidic C-terminal and a relatively close interaction of the P-loop and the acidic C-terminal. BIOLOGICAL SIGNIFICANCE: The function of calreticulin, an endoplasmic reticulum chaperone, is affected by fluctuations in Ca²⁺ concentration, but the structural mechanism is unknown. The present work suggests that Ca²⁺-dependent regulation is caused by different conformations of a long proline-rich loop that changes the accessibility to the peptide/lectin-binding site. Our results indicate that the binding of Ca²⁺ to calreticulin may thus not just be a question of Ca²⁺ storage but is likely to have an impact on the chaperone activity.


Subject(s)
Calcium/pharmacology , Calreticulin/chemistry , Calcium-Binding Proteins/chemistry , Calreticulin/isolation & purification , Computational Biology , Female , Humans , Mass Spectrometry , Molecular Chaperones/metabolism , Placenta/chemistry , Pregnancy , Protein Binding/drug effects , Protein Conformation/drug effects
17.
Curr Opin Struct Biol ; 39: 39-45, 2016 08.
Article in English | MEDLINE | ID: mdl-27127996

ABSTRACT

Protein self-assembly is extensively used in nature to build functional biomolecules and provides a general approach to design molecular complexes with many intriguing applications. Although computational design of protein-protein interfaces remains difficult, much progress has recently been made in de novo design of protein assemblies with cyclic, helical, cubic, internal and lattice symmetries. Here, we discuss some of the underlying biophysical principles of self-assembly that influence the design problem and highlight methodological advances that have made self-assembly design a fruitful area of protein design.


Subject(s)
Computational Biology/methods , Protein Multimerization , Proteins/chemistry , Protein Structure, Quaternary
18.
Structure ; 23(12): 2377-2386, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26526849

ABSTRACT

Recent benchmark studies have demonstrated the difficulties in obtaining accurate predictions of ligand binding conformations to comparative models of G-protein-coupled receptors. We have developed a data-driven optimization protocol, which integrates mutational data and structural information from multiple X-ray receptor structures in combination with a fully flexible ligand docking protocol to determine the binding conformation of AR231453, a small-molecule agonist, in the GPR119 receptor. Resulting models converge to one conformation that explains the majority of data from mutation studies and is consistent with the structure-activity relationship for a large number of AR231453 analogs. Another key property of the refined models is their success in separating active ligands from decoys in a large-scale virtual screening. These results demonstrate that mutation-guided receptor modeling can provide predictions of practical value for describing receptor-ligand interactions and drug discovery.


Subject(s)
Algorithms , High-Throughput Screening Assays/methods , Mutation , Receptors, G-Protein-Coupled/agonists , Amino Acid Sequence , Drug Discovery/methods , Humans , Molecular Docking Simulation , Molecular Sequence Data , Oxadiazoles/pharmacology , Protein Binding , Pyrimidines/pharmacology , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/genetics
19.
Proteins ; 83(8): 1385-406, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25670500

ABSTRACT

Computational design of protein function has made substantial progress, generating new enzymes, binders, inhibitors, and nanomaterials not previously seen in nature. However, the ability to design new protein backbones for function--essential to exert control over all polypeptide degrees of freedom--remains a critical challenge. Most previous attempts to design new backbones computed the mainchain from scratch. Here, instead, we describe a combinatorial backbone and sequence optimization algorithm called AbDesign, which leverages the large number of sequences and experimentally determined molecular structures of antibodies to construct new antibody models, dock them against target surfaces and optimize their sequence and backbone conformation for high stability and binding affinity. We used the algorithm to produce antibody designs that target the same molecular surfaces as nine natural, high-affinity antibodies; in five cases interface sequence identity is above 30%, and in four of those the backbone conformation at the core of the antibody binding surface is within 1 Å root-mean square deviation from the natural antibodies. Designs recapitulate polar interaction networks observed in natural complexes, and amino acid sidechain rigidity at the designed binding surface, which is likely important for affinity and specificity, is high compared to previous design studies. In designed anti-lysozyme antibodies, complementarity-determining regions (CDRs) at the periphery of the interface, such as L1 and H2, show greater backbone conformation diversity than the CDRs at the core of the interface, and increase the binding surface area compared to the natural antibody, potentially enhancing affinity and specificity.


Subject(s)
Complementarity Determining Regions/chemistry , Computational Biology/methods , Protein Conformation , Protein Engineering/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Fuzzy Logic , Humans , Molecular Sequence Data
20.
PLoS One ; 8(7): e67302, 2013.
Article in English | MEDLINE | ID: mdl-23844000

ABSTRACT

The rapidly increasing number of high-resolution X-ray structures of G-protein coupled receptors (GPCRs) creates a unique opportunity to employ comparative modeling and docking to provide valuable insight into the function and ligand binding determinants of novel receptors, to assist in virtual screening and to design and optimize drug candidates. However, low sequence identity between receptors, conformational flexibility, and chemical diversity of ligands present an enormous challenge to molecular modeling approaches. It is our hypothesis that rapid Monte-Carlo sampling of protein backbone and side-chain conformational space with Rosetta can be leveraged to meet this challenge. This study performs unbiased comparative modeling and docking methodologies using 14 distinct high-resolution GPCRs and proposes knowledge-based filtering methods for improvement of sampling performance and identification of correct ligand-receptor interactions. On average, top ranked receptor models built on template structures over 50% sequence identity are within 2.9 Å of the experimental structure, with an average root mean square deviation (RMSD) of 2.2 Å for the transmembrane region and 5 Å for the second extracellular loop. Furthermore, these models are consistently correlated with low Rosetta energy score. To predict their binding modes, ligand conformers of the 14 ligands co-crystallized with the GPCRs were docked against the top ranked comparative models. In contrast to the comparative models themselves, however, it remains difficult to unambiguously identify correct binding modes by score alone. On average, sampling performance was improved by 10³-fold over random using knowledge-based and energy-based filters. In assessing the applicability of experimental constraints, we found that sampling performance is increased by one order of magnitude for every 10 residues known to contact the ligand. Additionally, in the case of DOR, knowledge of a single specific ligand-protein contact improved sampling efficiency 7-fold. These findings offer specific guidelines which may lead to increased success in determining receptor-ligand complexes.


Subject(s)
Molecular Docking Simulation , Receptors, G-Protein-Coupled/chemistry , Software , Amino Acid Sequence , Binding Sites , Databases, Protein , Humans , Ligands , Molecular Sequence Data , Monte Carlo Method , Protein Binding , Protein Structure, Secondary , Structural Homology, Protein , Thermodynamics