1.
Sci Rep ; 13(1): 11612, 2023 07 18.
Article in English | MEDLINE | ID: mdl-37463925

ABSTRACT

Antibodies with similar amino acid sequences, especially across their complementarity-determining regions, often share properties. Finding that an antibody of interest has a similar sequence to naturally expressed antibodies in healthy or diseased repertoires is a powerful approach for the prediction of antibody properties, such as immunogenicity or antigen specificity. However, as the number of available antibody sequences is now in the billions and continuing to grow, repertoire mining for similar sequences has become increasingly computationally expensive. Existing approaches are limited by either being low-throughput, non-exhaustive, not antibody specific, or only searching against entire chain sequences. Therefore, there is a need for a specialized tool, optimized for a rapid and exhaustive search of any antibody region against all known antibodies, to better utilize the full breadth of available repertoire sequences. We introduce Known Antibody Search (KA-Search), a tool that allows for the rapid search of billions of antibody variable domains by amino acid sequence identity across either the variable domain, the complementarity-determining regions, or a user-defined antibody region. We show KA-Search in operation on the ~2.4 billion antibody sequences available in the OAS database. KA-Search can be used to find the most similar sequences from OAS within 30 minutes and a representative subset of 10 million sequences in less than 9 seconds. We give examples of how KA-Search can be used to obtain new insights about an antibody of interest. KA-Search is freely available at https://github.com/oxpig/kasearch.
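The idea of ranking known antibodies by sequence identity over a chosen region can be sketched in a few lines. This is a minimal illustration, not KA-Search's implementation or API: it assumes the region sequences have already been aligned to a common length (with `-` for gaps), and the function names and the toy database are hypothetical.

```python
def region_identity(query, target):
    """Fraction of aligned positions with identical residues.

    Positions that are gaps ('-') in both sequences are ignored.
    Assumes both sequences were pre-aligned to the same length.
    """
    if len(query) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, t in zip(query, target):
        if q == "-" and t == "-":
            continue  # shared gap carries no identity information
        compared += 1
        if q == t:
            matches += 1
    return matches / compared if compared else 0.0


def top_hits(query, database, n=3):
    """Exhaustively score every entry and return the n most identical."""
    scored = [(region_identity(query, seq), name) for name, seq in database.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, ties by name
    return [(name, score) for score, name in scored[:n]]


# Hypothetical aligned CDR-like fragments.
toy_db = {"a": "ARDYFDY", "b": "ARDYFDV", "c": "GGGGGGG"}
hits = top_hits("ARDYFDY", toy_db, n=2)
```

An exhaustive scan like this is linear in the number of stored sequences, which is why searching billions of sequences calls for the kind of optimization the abstract describes.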


Subject(s)
Antibodies , Complementarity Determining Regions , Complementarity Determining Regions/chemistry , Amino Acid Sequence
2.
Bioinform Adv ; 2(1): vbac046, 2022.
Article in English | MEDLINE | ID: mdl-36699403

ABSTRACT

Motivation: General protein language models have been shown to summarize the semantics of protein sequences into representations that are useful for state-of-the-art predictive methods. However, for antibody-specific problems, such as restoring residues lost due to sequencing errors, a model trained solely on antibodies may be more powerful. Antibodies are one of the few protein types where the volume of sequence data needed for such language models is available, e.g. in the Observed Antibody Space (OAS) database. Results: Here, we introduce AbLang, a language model trained on the antibody sequences in the OAS database. We demonstrate the power of AbLang by using it to restore missing residues in antibody sequence data, a key issue with B-cell receptor repertoire sequencing, e.g. over 40% of OAS sequences are missing the first 15 amino acids. AbLang restores the missing residues of antibody sequences better than using IMGT germlines or the general protein language model ESM-1b. Further, AbLang does not require knowledge of the germline of the antibody and is seven times faster than ESM-1b. Availability and implementation: AbLang is a Python package available at https://github.com/oxpig/AbLang. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

3.
Methods Mol Biol ; 2165: 199-216, 2020.
Article in English | MEDLINE | ID: mdl-32621226

ABSTRACT

Many of the biological functions of the cell are driven by protein-protein interactions. However, determining which proteins interact, and exactly how they do so to enable their functions, remains a major research question. Functional interactions are dependent on a number of complicated factors; therefore, modeling the three-dimensional structure of protein-protein complexes is still considered a complex endeavor. Nevertheless, the rewards for modeling protein interactions to atomic level detail are substantial, and there are numerous examples of how models can provide useful information for drug design, protein engineering, systems biology, and understanding of the immune system. Here, we provide practical guidelines for docking proteins using the SwarmDock web server, a flexible protein-protein docking method. Moreover, we provide an overview of the factors that need to be considered when deciding whether docking is likely to be successful.


Subject(s)
Molecular Docking Simulation/methods , Protein Conformation , Software , Binding Sites , Protein Binding
4.
Mol Biol Evol ; 36(9): 2086-2103, 2019 09 01.
Article in English | MEDLINE | ID: mdl-31114882

ABSTRACT

Few models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and increasing availability. We present a structurally aware empirical substitution model for amino acid sequence evolution in which proteins are expressed using an expanded alphabet that relays both amino acid identity and structural information. Each character specifies an amino acid as well as information about the rotamer configuration of its side-chain: the discrete geometric pattern of permitted side-chain atomic positions, as defined by the dihedral angles between covalently linked atoms. By assigning rotamer states in 251,194 protein structures and identifying 4,508,390 substitutions between closely related sequences, we generate a 55-state "Dayhoff-like" model that shows that the evolutionary properties of amino acids depend strongly upon side-chain geometry. The model performs as well as or better than traditional 20-state models for divergence time estimation, tree inference, and ancestral state reconstruction. We conclude that not only is rotamer configuration a valuable source of information for phylogenetic studies, but that modeling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.


Subject(s)
Evolution, Molecular , Models, Biological , Protein Conformation , Amino Acid Substitution , Markov Chains
5.
Bioinformatics ; 35(3): 462-469, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30020414

ABSTRACT

Motivation: Understanding the relationship between the sequence, structure, binding energy, binding kinetics and binding thermodynamics of protein-protein interactions is crucial to understanding cellular signaling, the assembly and regulation of molecular complexes, the mechanisms through which mutations lead to disease, and protein engineering. Results: We present SKEMPI 2.0, a major update to our database of binding free energy changes upon mutation for structurally resolved protein-protein interactions. This version now contains manually curated binding data for 7085 mutations, an increase of 133%, including changes in kinetics for 1844 mutations, enthalpy and entropy changes for 443 mutations, and 440 mutations that abolish detectable binding. Availability and implementation: The database is available as supplementary data and at https://life.bsc.es/pid/skempi2/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Databases, Protein , Mutation , Protein Binding , Kinetics , Thermodynamics
6.
Methods Mol Biol ; 1764: 413-428, 2018.
Article in English | MEDLINE | ID: mdl-29605931

ABSTRACT

The atomic structures of protein complexes can provide useful information for drug design, protein engineering, systems biology, and understanding pathology. Obtaining this information experimentally can be challenging. However, if the structures of the subunits are known, then it is often possible to model the complex computationally. This chapter provides practical guidelines for docking proteins using the SwarmDock flexible protein-protein docking method, providing an overview of the factors that need to be considered when deciding whether docking is likely to be successful, the preparation of structural input, generation of docked poses, analysis and ranking of docked poses, and the validation of models using external data.


Subject(s)
Adaptor Proteins, Signal Transducing/metabolism , Filamins/metabolism , Molecular Docking Simulation , Phosphoproteins/metabolism , Protein Interaction Domains and Motifs , Software , Adaptor Proteins, Signal Transducing/chemistry , Algorithms , Filamins/chemistry , Humans , Models, Molecular , Phosphoproteins/chemistry , Protein Binding , Protein Conformation
7.
Proteins ; 85(7): 1287-1297, 2017 07.
Article in English | MEDLINE | ID: mdl-28342242

ABSTRACT

Protein-protein interactions play fundamental roles in biological processes including signaling, metabolism, and trafficking. While the structure of a protein complex reveals crucial details about the interaction, it is often difficult to acquire this information experimentally. As the number of interactions discovered increases faster than they can be characterized, protein-protein docking calculations may be able to reduce this disparity by providing models of the interacting proteins. Rigid-body docking is a widely used docking approach, and is often capable of generating a pool of models within which a near-native structure can be found. These models need to be scored in order to select the acceptable ones from the set of poses. Recently, more than 100 scoring functions from the CCharPPI server were evaluated for this task using decoy structures generated with SwarmDock. Here, we extend this analysis to identify the predictive success rates of the scoring functions on decoys from three rigid-body docking programs, ZDOCK, FTDock, and SDOCK, allowing us to assess the transferability of the functions. We also apply set-theoretic measures to test whether the scoring functions are capable of identifying near-native poses within different subsets of the benchmark. This information can guide the choice of the most efficient scoring function for each docking method, as well as inform future scoring function development efforts. Proteins 2017; 85:1287-1297. © 2017 Wiley Periodicals, Inc.


Subject(s)
Models, Statistical , Molecular Docking Simulation/statistics & numerical data , Proteins/chemistry , Research Design , Benchmarking , Internet , Protein Interaction Mapping , Software
8.
J Chem Theory Comput ; 13(3): 1401-1410, 2017 Mar 14.
Article in English | MEDLINE | ID: mdl-28230364

ABSTRACT

Many proteins can adopt multiple distinct conformational states which often play different functional roles. Previous studies have shown that the underlying global dynamics through which these states are accessed are, at least in part, encoded by the protein's topology. In this work we present a method for generating transition pathways between states by perturbing the protein toward a target conformational state along thermally accessible collective motions calculated from the starting conformation. Specifically, the least absolute shrinkage and selection operator (LASSO) is used to identify the most parsimonious route along soft modes calculated using the anisotropic network model. In a survey of 436 conformational changes following protein-protein interaction, we show that such a path exists for most cases and that selected paths are low in energy. We discuss the implications for the atomic modeling of protein recognition and provide soft energy and parameter bounds which can be employed to efficiently constrain the sampling of such pathways.

9.
Bioinformatics ; 33(12): 1806-1813, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28200016

ABSTRACT

MOTIVATION: In order to function, proteins frequently bind to one another and form 3D assemblies. Knowledge of the atomic details of these structures helps our understanding of how proteins work together, how mutations can lead to disease, and facilitates the designing of drugs which prevent or mimic the interaction. RESULTS: Atomic modeling of protein-protein interactions requires the selection of near-native structures from a set of docked poses based on their calculable properties. By considering this as an information retrieval problem, we have adapted methods developed for Internet search ranking and electoral voting into IRaPPA, a pipeline integrating biophysical properties. The approach enhances the identification of near-native structures when applied to four docking methods, resulting in a near-native appearing in the top 10 solutions for up to 50% of complexes benchmarked, and up to 70% in the top 100. AVAILABILITY AND IMPLEMENTATION: IRaPPA has been implemented in the SwarmDock server (http://bmm.crick.ac.uk/~SwarmDock/), pyDock server (http://life.bsc.es/pid/pydockrescoring/) and ZDOCK server (http://zdock.umassmed.edu/), with code available on request. CONTACT: moal@ebi.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
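To make the "electoral voting" idea concrete, the sketch below combines several rankings of the same docked poses with a Borda count. This is only an illustrative stand-in: the abstract does not say which voting scheme IRaPPA uses, and the function name and pose labels here are hypothetical.

```python
def borda_consensus(rankings):
    """Combine several best-first rankings of the same poses by Borda count.

    Each ranking awards a pose (n - 1 - position) points, where n is the
    number of poses; poses are returned by descending total score.
    """
    scores = {pose: 0 for pose in rankings[0]}
    for ranking in rankings:
        if set(ranking) != set(scores):
            raise ValueError("every ranking must contain the same poses")
        n = len(ranking)
        for position, pose in enumerate(ranking):
            scores[pose] += n - 1 - position
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda pose: (-scores[pose], pose))


# Three hypothetical scoring functions, each ranking three poses best-first.
votes = [
    ["p1", "p2", "p3"],
    ["p2", "p1", "p3"],
    ["p2", "p3", "p1"],
]
consensus = borda_consensus(votes)
```

Here `p2` collects 5 points against 3 for `p1` and 1 for `p3`, so it tops the consensus even though one ranking placed it second.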


Subject(s)
Information Storage and Retrieval/methods , Molecular Docking Simulation , Protein Conformation , Protein Interaction Mapping/methods , Software , Internet
10.
Proteins ; 85(3): 528-543, 2017 03.
Article in English | MEDLINE | ID: mdl-27935158

ABSTRACT

Reliable identification of near-native poses of docked protein-protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein-protein interactions is challenging for traditional biophysical or knowledge-based potentials, and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach to cluster ranking based not only on one molecular descriptor (e.g., an energy function) but on a large number of descriptors integrated in a machine learning model, in which an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol first locally enriches clusters with additional poses. The clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near-native from incorrect clusters. The results show that our approach is able to identify clusters containing near-native protein-protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near-native from incorrect clusters and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528-543. © 2016 Wiley Periodicals, Inc.


Subject(s)
Computational Biology/methods , Machine Learning , Molecular Docking Simulation/methods , Proteins/chemistry , Software , Benchmarking , Binding Sites , Cluster Analysis , Protein Binding , Protein Conformation , Protein Interaction Mapping , Research Design , Structural Homology, Protein , Thermodynamics
11.
Proteins ; 85(3): 487-496, 2017 03.
Article in English | MEDLINE | ID: mdl-27701776

ABSTRACT

The sixth CAPRI edition included new modeling challenges, such as the prediction of protein-peptide complexes, and the modeling of homo-oligomers and domain-domain interactions as part of the first joint CASP-CAPRI experiment. Other non-standard targets included the prediction of interfacial water positions and the modeling of the interactions between proteins and nucleic acids. We have participated in all proposed targets of this CAPRI edition both as predictors and as scorers, with new protocols to efficiently use our docking and scoring scheme pyDock in a large variety of scenarios. In addition, we have participated for the first time in the servers section, with our recently developed webserver, pyDockWeb. Excluding the CASP-CAPRI cases, we submitted acceptable models (or better) for 7 out of the 18 evaluated targets as predictors, 4 out of the 11 targets as scorers, and 6 out of the 18 targets as servers. The overall success rates were below those in past CAPRI editions. This shows the challenging nature of this last edition, with many difficult targets for which no participant submitted a single acceptable model. Interestingly, we submitted acceptable models for 83% of the evaluated protein-peptide targets. As for the 25 cases of the CASP-CAPRI experiment, in which we used a larger variety of modeling techniques (template-based, symmetry restraints, literature information, etc.), we submitted acceptable models for 56% of the targets. In summary, this CAPRI edition showed that the pyDock scheme can be efficiently adapted to the increasing variety of problems that the protein interactions field is currently facing. Proteins 2017; 85:487-496. © 2016 Wiley Periodicals, Inc.


Subject(s)
Algorithms , Computational Biology/methods , Molecular Docking Simulation/methods , Peptides/chemistry , Proteins/chemistry , Software , Amino Acid Sequence , Benchmarking , Binding Sites , Crystallography, X-Ray , Protein Binding , Protein Conformation , Protein Interaction Mapping , Protein Multimerization , Research Design , Structural Homology, Protein , Thermodynamics , Water/chemistry
12.
J Mol Biol ; 427(19): 3031-41, 2015 Sep 25.
Article in English | MEDLINE | ID: mdl-26231283

ABSTRACT

We present an updated and integrated version of our widely used protein-protein docking and binding affinity benchmarks. The benchmarks consist of non-redundant, high-quality structures of protein-protein complexes along with the unbound structures of their components. Fifty-five new complexes were added to the docking benchmark, 35 of which have experimentally measured binding affinities. These updated docking and affinity benchmarks now contain 230 and 179 entries, respectively. In particular, the number of antibody-antigen complexes has increased significantly, by 67% and 74% in the docking and affinity benchmarks, respectively. We tested previously developed docking and affinity prediction algorithms on the new cases. Considering only the top 10 docking predictions per benchmark case, a prediction accuracy of 38% is achieved on all 55 cases and up to 50% for the 32 rigid-body cases only. Predicted affinity scores are found to correlate with experimental binding energies up to r=0.52 overall and r=0.72 for the rigid complexes.


Subject(s)
Molecular Docking Simulation , Protein Interaction Mapping/methods , Proteins/metabolism , Algorithms , Animals , Humans , Polynucleotide Adenylyltransferase/chemistry , Polynucleotide Adenylyltransferase/metabolism , Protein Binding , Protein Conformation , Proteins/chemistry , Software , Thermodynamics , Vaccinia virus/chemistry , Vaccinia virus/metabolism , Viral Proteins/chemistry , Viral Proteins/metabolism
13.
Proteins ; 83(4): 640-50, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25586563

ABSTRACT

Mutations at protein-protein recognition sites alter binding strength by altering the chemical nature of the interacting surfaces. We present a simple surface energy model, parameterized with empirical ΔΔG values, yielding mean energies of -48 cal mol⁻¹ Å⁻² for interactions between hydrophobic surfaces, -51 to -80 cal mol⁻¹ Å⁻² for surfaces of complementary charge, and 66-83 cal mol⁻¹ Å⁻² for electrostatically repelling surfaces, relative to the aqueous phase. This places the mean energy of hydrophobic surface burial at -24 cal mol⁻¹ Å⁻². Despite neglecting configurational entropy and intramolecular changes, the model correlates with empirical binding free energies of a functionally diverse set of rigid-body interactions (r = 0.66). When used to rerank docking poses, it can place near-native solutions in the top 10 for 37% of the complexes evaluated, and 82% in the top 100. The method shows that hydrophobic burial is the driving force for protein association, accounting for 50-95% of the cohesive energy. The model is available open-source from http://life.bsc.es/pid/web/surface_energy/ and via the CCharPPI web server http://life.bsc.es/pid/ccharppi/.
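As a rough worked example of how a per-area surface energy translates into a binding contribution, the sketch below multiplies a buried area by the mean hydrophobic burial energy quoted in the abstract. The buried area of 500 Å² is a hypothetical input chosen for illustration, and the function is not part of the published model.

```python
# Mean energy of hydrophobic surface burial quoted in the abstract,
# in cal mol^-1 Å^-2.
HYDROPHOBIC_BURIAL_CAL = -24.0


def burial_energy_kcal(buried_area_a2, per_area_cal=HYDROPHOBIC_BURIAL_CAL):
    """Energy in kcal/mol for burying the given surface area (in Å^2)."""
    if buried_area_a2 < 0:
        raise ValueError("buried area cannot be negative")
    return buried_area_a2 * per_area_cal / 1000.0  # cal -> kcal


# Hypothetical interface burying 500 Å^2 of hydrophobic surface.
energy = burial_energy_kcal(500.0)
```

With these numbers the estimate is 500 × (-24) / 1000 = -12 kcal/mol, which ignores the entropy and intramolecular terms the abstract says the model neglects.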


Subject(s)
Mutation/physiology , Protein Binding , Proteins/chemistry , Proteins/metabolism , Hydrophobic and Hydrophilic Interactions , Molecular Docking Simulation , Static Electricity , Thermodynamics
14.
Bioinformatics ; 31(1): 123-5, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25183488

ABSTRACT

SUMMARY: The atomic structures of protein-protein interactions are central to understanding their role in biological systems, and a wide variety of biophysical functions and potentials have been developed for their characterization and the construction of predictive models. These tools are scattered across a multitude of stand-alone programs, and are often available only as model parameters requiring reimplementation. This acts as a significant barrier to their widespread adoption. CCharPPI integrates many of these tools into a single web server. It calculates up to 108 parameters, including models of electrostatics, desolvation and hydrogen bonding, as well as interface packing and complementarity scores, empirical potentials at various resolutions, docking potentials and composite scoring functions. AVAILABILITY AND IMPLEMENTATION: The server does not require registration by the user and is freely available for non-commercial academic use at http://life.bsc.es/pid/ccharppi.


Subject(s)
Internet , Molecular Docking Simulation/methods , Multiprotein Complexes/chemistry , Protein Interaction Mapping , Software , Humans , Hydrogen Bonding , Static Electricity
15.
Mol Microbiol ; 95(1): 17-30, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25354037

ABSTRACT

σ⁵⁴-dependent transcription controls a wide range of stress-related genes in bacteria and is tightly regulated. In contrast to σ⁷⁰, the σ⁵⁴-RNA polymerase holoenzyme forms a stable closed complex at the promoter site that rarely isomerises into transcriptionally competent open complexes. The conversion into open complexes requires the ATPase activity of activator proteins that bind remotely upstream of the transcriptional start site. These activators belong to the large AAA protein family and the majority of them consist of an N-terminal regulatory domain, a central AAA domain and a C-terminal DNA binding domain. Here we use a functional variant of the NorR activator, a dedicated NO sensor, to provide the first structural and functional characterisation of a full length AAA activator in complex with its enhancer DNA. Our data suggest an inter-dependent and synergistic relationship of all three functional domains and provide an explanation for the dependence of NorR on enhancer DNA. Our results show that NorR readily assembles into higher order oligomers upon enhancer binding, independent of activating signals. Upon inducing signals, the N-terminal regulatory domain relocates to the periphery of the AAA ring. Together our data provide an assembly and activation mechanism for NorR.


Subject(s)
Bacteria/metabolism , RNA Polymerase Sigma 54/genetics , Trans-Activators/chemistry , Trans-Activators/genetics , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , DNA, Bacterial/metabolism , Models, Molecular , Molecular Docking Simulation , Nitric Oxide/metabolism , RNA Polymerase Sigma 54/metabolism , Regulatory Sequences, Nucleic Acid , Trans-Activators/metabolism
17.
Ticks Tick Borne Dis ; 5(6): 947-50, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25108785

ABSTRACT

In the next generation sequencing era we are encountering hundreds of thousands of sequences from specific organisms. Such massive data must be accurately classified both functionally and structurally. Determining appropriate sequences with a specific function from next generation sequencing, however, is a daunting experimental task. A recent salivary gland transcriptome from the hard tick Ixodes ricinus, a European disease vector, has been made publicly available. Among the protein families sequenced by the salivary gland transcriptome of I. ricinus, the Kunitz-domain is one of the highly represented protein families. Thus far, recent tick transcriptomes solely classify (computationally) Kunitz sequences as putative serine protease inhibitors. We present here a novel method using a machine-learning algorithm to "fish" for candidate ion-channel effectors and loss of serine protease inhibitor function within the Kunitz-domain protein family of the I. ricinus salivary gland transcriptome. The models, data and scripts used in this work are available online from http://life.bsc.es/pid/web/imoal/kunitz-classification.html.


Subject(s)
Ion Channels/genetics , Ixodes/genetics , Transcriptome , Algorithms , Amino Acid Sequence , Animals , Arthropod Proteins/genetics , Cluster Analysis , High-Throughput Nucleotide Sequencing , Protease Inhibitors , Protein Domains , Salivary Proteins and Peptides/genetics , Sequence Analysis, DNA
18.
Proteins ; 82(4): 620-32, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24155158

ABSTRACT

We report the first assessment of blind predictions of water positions at protein-protein interfaces, performed as part of the critical assessment of predicted interactions (CAPRI) community-wide experiment. Groups submitting docking predictions for the complex of the DNase domain of colicin E2 and Im2 immunity protein (CAPRI Target 47) were invited to predict the positions of interfacial water molecules using the method of their choice. The predictions (20 groups submitted a total of 195 models) were assessed by measuring the recall fraction of water-mediated protein contacts. Of the 176 high- or medium-quality docking models (a very good docking performance per se), only 44% had a recall fraction above 0.3, and a mere 6% above 0.5. The actual water positions were in general predicted to an accuracy level no better than 1.5 Å, and even in good models about half of the contacts represented false positives. This notwithstanding, three hotspot interface water positions were quite well predicted, and so was one of the water positions that is believed to stabilize the loop that confers specificity in these complexes. Overall the best interface water predictions were achieved by groups that also produced high-quality docking models, indicating that accurate modelling of the protein portion is a determinant factor. The use of established molecular mechanics force fields, coupled to sampling and optimization procedures also seemed to confer an advantage. Insights gained from this analysis should help improve the prediction of protein-water interactions and their role in stabilizing protein complexes.
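The recall fraction used in this assessment can be illustrated with simple set arithmetic. This sketch assumes each water-mediated contact is represented as a pair of residue labels; the labels below are invented for illustration, and the exact contact definition used by the assessors is not given in the abstract.

```python
def recall_fraction(predicted, observed):
    """Share of observed water-mediated contacts that the model reproduces."""
    if not observed:
        raise ValueError("no observed contacts to recall")
    return len(predicted & observed) / len(observed)


def false_positive_fraction(predicted, observed):
    """Share of predicted contacts that are absent from the observed set."""
    if not predicted:
        return 0.0
    return len(predicted - observed) / len(predicted)


# Hypothetical residue pairs bridged by a water molecule.
observed = {("A33", "B5"), ("A41", "B97"), ("A34", "B84"), ("A54", "B77")}
predicted = {("A33", "B5"), ("A41", "B97"), ("A60", "B12")}

recall = recall_fraction(predicted, observed)                  # 2 of 4 observed contacts
false_positives = false_positive_fraction(predicted, observed)  # 1 of 3 predicted contacts
```

A model like this one would clear both the 0.3 threshold and, just barely, miss exceeding the 0.5 threshold cited above, since its recall is exactly 0.5.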


Subject(s)
Colicins/chemistry , Protein Interaction Mapping , Water/chemistry , Algorithms , Computational Biology , Models, Molecular , Molecular Docking Simulation , Protein Conformation
19.
BMC Bioinformatics ; 14: 286, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-24079540

ABSTRACT

BACKGROUND: Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling. RESULTS: We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top 10 success rates of up to 58%. Hierarchical clustering is performed, so as to group together functions which identify near-natives in similar subsets of complexes. Three set theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically. CONCLUSIONS: All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
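The top 10 success rate, and the notion that two scoring functions may succeed on different complexes, can be sketched as follows. The complex identifiers and ranks are hypothetical, and the union and set difference used here are simple illustrations rather than the three set-theoretic measures of the paper, which the abstract does not specify.

```python
def successes(best_rank, n=10):
    """Complexes whose best near-native pose is ranked within the top n.

    best_rank maps a complex id to the rank (1 = best) that a scoring
    function gives to the best near-native pose for that complex.
    """
    return {cid for cid, rank in best_rank.items() if rank <= n}


def top_n_success_rate(best_rank, n=10):
    """Fraction of benchmark complexes with a near-native pose in the top n."""
    if not best_rank:
        raise ValueError("no complexes given")
    return len(successes(best_rank, n)) / len(best_rank)


# Hypothetical ranks from two scoring functions on four complexes.
func_a = {"c1": 3, "c2": 40, "c3": 1, "c4": 250}
func_b = {"c1": 15, "c2": 7, "c3": 2, "c4": 90}

rate_a = top_n_success_rate(func_a)
rate_b = top_n_success_rate(func_b)
covered_by_either = successes(func_a) | successes(func_b)
gained_from_b = successes(func_b) - successes(func_a)
```

Both functions succeed on half of the toy benchmark, yet together they cover three of the four complexes, which is the kind of complementarity that motivates combining functions from different clusters.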


Subject(s)
Computational Biology/methods , Molecular Docking Simulation/methods , Protein Binding , Proteins , Cluster Analysis , Ligands , Proteins/chemistry , Proteins/metabolism
20.
PLoS Comput Biol ; 9(9): e1003216, 2013.
Article in English | MEDLINE | ID: mdl-24039569

ABSTRACT

Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations may be used for the estimation of off-rate mutations to any residue type and also multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson correlation coefficient of 0.79 with experimental off-rates and a Matthews correlation coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models we identify descriptors that are highly specific and, conversely, broadly important to predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces is a function of complex size more strongly than interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize. The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors which model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects.
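The two evaluation metrics named in this abstract follow standard definitions and can be computed as below. The input values are made up for illustration and are not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def matthews(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative values: perfectly linear predictions, and a small confusion matrix.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
mcc = matthews(tp=3, tn=5, fp=1, fn=1)
```

The Matthews coefficient is often preferred over plain accuracy when one class is rare, as with the stabilizing mutations mentioned above, because it accounts for all four cells of the confusion matrix.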


Subject(s)
Mutation , Proteins/chemistry , Alanine/chemistry , Artificial Intelligence , Kinetics , Proteins/genetics