Search | VHL Regional Portal (Portal Regional da BVS)

1.

MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations.

Jung, Yong; Geng, Cunliang; Bonvin, Alexandre M J J; Xue, Li C; Honavar, Vasant G.

Biomolecules ; 13(1), 2023 01 06.

Article in English | MEDLINE | ID: mdl-36671507

ABSTRACT

Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking, the so-called scoring problem, still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.
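
The averaging step described in this abstract can be illustrated with a minimal Python sketch. The abstract only says that the RF score is averaged with a traditional SF score; the min-max rescaling, the "lower traditional score is better" convention, and the names `min_max`, `meta_scores`, and `top_k` are assumptions made here for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant list maps to all 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def meta_scores(rf_probs: Sequence[float],
                sf_scores: Sequence[float],
                lower_is_better: bool = True) -> List[float]:
    """Average a classifier probability with a rescaled traditional score.

    rf_probs: per-conformation probability of being near-native (assumed in [0, 1]).
    sf_scores: raw scores from one traditional scoring function.
    lower_is_better: assumed sign convention (energy-like scores).
    """
    if len(rf_probs) != len(sf_scores):
        raise ValueError("rf_probs and sf_scores must have the same length")
    scaled = min_max(sf_scores)
    if lower_is_better:
        # Flip so that a higher rescaled value always means "better".
        scaled = [1.0 - s for s in scaled]
    return [(p + s) / 2.0 for p, s in zip(rf_probs, scaled)]


def top_k(scores: Sequence[float], k: int = 10) -> List[int]:
    """Indices of the k highest-scoring conformations, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

Calling `top_k(meta_scores(rf_probs, sf_scores))` would give the ten conformations over which a success rate or hit rate could then be evaluated.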

Subjects

Machine Learning , Proteins , Proteins/chemistry , Protein Binding , Ligands , Protein Conformation

2.

DeepRank: a deep learning framework for data mining 3D protein-protein interfaces.

Renaud, Nicolas; Geng, Cunliang; Georgievska, Sonja; Ambrosetti, Francesco; Ridder, Lars; Marzella, Dario F; Réau, Manon F; Bonvin, Alexandre M J J; Xue, Li C.

Nat Commun ; 12(1): 7068, 2021 12 03.

Article in English | MEDLINE | ID: mdl-34862392

ABSTRACT

Three-dimensional (3D) structures of protein complexes provide fundamental information to decipher biological processes at the molecular scale. The vast amount of experimentally and computationally resolved protein-protein interfaces (PPIs) offers the possibility of training deep learning models to aid the prediction of their biological relevance. We present here DeepRank, a general, configurable deep learning framework for data mining PPIs using 3D convolutional neural networks (CNNs). DeepRank maps features of PPIs onto 3D grids and trains a user-specified CNN on these 3D grids. DeepRank allows for efficient training of 3D CNNs with data sets containing millions of PPIs and supports both classification and regression. We demonstrate the performance of DeepRank on two distinct challenges: the classification of biological versus crystallographic PPIs, and the ranking of docking models. For both problems DeepRank is competitive with, or outperforms, state-of-the-art methods, demonstrating the versatility of the framework for research in structural biology.

Subjects

Data Mining/methods , Deep Learning , Protein Interaction Mapping/methods , Crystallography , Datasets as Topic , Molecular Docking Simulation , Protein Interaction Domains and Motifs , Protein Interaction Maps

3.

iScore: An MPI supported software for ranking protein-protein docking models based on a random walk graph kernel and support vector machines.

Renaud, Nicolas; Jung, Yong; Honavar, Vasant; Geng, Cunliang; Bonvin, Alexandre M J J; Xue, Li C.

SoftwareX ; 11, 2020.

Article in English | MEDLINE | ID: mdl-35419466

ABSTRACT

Computational docking is a promising tool to model three-dimensional (3D) structures of protein-protein complexes, which provide fundamental insights into protein functions in cellular life. Singling out near-native models from the huge pool of generated docking models (referred to as the scoring problem) remains a major challenge in computational docking. We recently published iScore, a novel graph-kernel-based scoring function. iScore ranks docking models based on their interface graph similarities to the training interface graph set. iScore uses a support vector machine approach with random-walk graph kernels to classify and rank protein-protein interfaces. Here, we present the software for iScore. The software provides executable scripts that fully automate the computational workflow. In addition, the creation and analysis of the interface graph can be distributed across different processes using the Message Passing Interface (MPI) and can be offloaded to GPUs thanks to dedicated CUDA kernels.

4.

iScore: a novel graph kernel-based function for scoring protein-protein docking models.

Geng, Cunliang; Jung, Yong; Renaud, Nicolas; Honavar, Vasant; Bonvin, Alexandre M J J; Xue, Li C.

Bioinformatics ; 36(1): 112-121, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31199455

ABSTRACT

MOTIVATION: Protein complexes play critical roles in many aspects of biological functions. Three-dimensional (3D) structures of protein complexes are critical for gaining insights into structural bases of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determinations of 3D protein complex structures, computational docking has evolved as a valuable tool to predict 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. RESULTS: Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein-protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of state-of-the-art scoring functions on two independent datasets: (i) docking software-specific models and (ii) the CAPRI score set generated by a wide variety of docking approaches (i.e. docking software-non-specific). iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and energetic information for scoring docked conformations. This work represents the first successful demonstration of graph kernels to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. AVAILABILITY AND IMPLEMENTATION: The iScore code is freely available from GitHub: https://github.com/DeepRank/iScore (DOI: 10.5281/zenodo.2630567). The docking models used are available from SBGrid: https://data.sbgrid.org/dataset/684.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Computational Biology , Molecular Docking Simulation , Proteins , Computational Biology/methods , Molecular Docking Simulation/methods , Protein Binding , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Software

5.

An overview of data-driven HADDOCK strategies in CAPRI rounds 38-45.

Koukos, Panagiotis I; Roel-Touris, Jorge; Ambrosetti, Francesco; Geng, Cunliang; Schaarschmidt, Jörg; Trellet, Mikael E; Melquiond, Adrien S J; Xue, Li C; Honorato, Rodrigo V; Moreira, Irina; Kurkcuoglu, Zeynep; Vangone, Anna; Bonvin, Alexandre M J J.

Proteins ; 88(8): 1029-1036, 2020 08.

Article in English | MEDLINE | ID: mdl-31886559

ABSTRACT

Our information-driven docking approach HADDOCK has demonstrated a sustained performance since the start of its participation in CAPRI. This is due, in part, to its ability to integrate data into the modeling process, and to the robustness of its scoring function. We participated in CAPRI both as server and manual predictors. In CAPRI rounds 38-45, we have used various strategies depending on the available information. These ranged from imposing restraints to a few residues identified from literature as being important for the interaction, to binding pockets identified from homologous complexes or template-based refinement/CA-CA restraint-guided docking from identified templates. When relevant, symmetry restraints were used to limit the conformational sampling. For a large decamer target, we also tested a new implementation of the MARTINI coarse-grained force field in HADDOCK. Overall, we obtained acceptable or better predictions for 13 and 11 server and manual submissions, respectively, out of the 22 interfaces. Our server performance (acceptable or higher-quality models when considering the top 10) was better (59%) than the manual (50%) one, in which we typically experiment with various combinations of protocols and data sources. Again, our simple scoring function based on a linear combination of intermolecular van der Waals and electrostatic energies and an empirical desolvation term demonstrated a good performance in the scoring experiment with a 63% success rate across all 22 interfaces. An analysis of model quality indicates that, while we are consistently performing well in generating acceptable models, there is room for improvement for generating/identifying higher quality models.

Subjects

Molecular Docking Simulation , Peptides/chemistry , Proteins/chemistry , Software , Amino Acid Sequence , Binding Sites , Humans , Ligands , Peptides/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Protein Multimerization , Proteins/metabolism , Research Design , Structural Homology, Protein , Thermodynamics

6.

Large-scale prediction of binding affinity in protein-small ligand complexes: the PRODIGY-LIG web server.

Vangone, Anna; Schaarschmidt, Joerg; Koukos, Panagiotis; Geng, Cunliang; Citro, Nevia; Trellet, Mikael E; Xue, Li C; Bonvin, Alexandre M J J.

Bioinformatics ; 35(9): 1585-1587, 2019 05 01.

Article in English | MEDLINE | ID: mdl-31051038

ABSTRACT

SUMMARY: Recently we published PROtein binDIng enerGY (PRODIGY), a web-server for the prediction of binding affinity in protein-protein complexes. By using a combination of simple structural properties, such as the residue-contacts made at the interface, PRODIGY has demonstrated a top performance compared with other state-of-the-art predictors in the literature. Here we present an extension of it, named PRODIGY-LIG, aimed at the prediction of affinity in protein-small ligand complexes. The predictive method, properly readapted for small ligand by making use of atomic instead of residue contacts, has been successfully applied for the blind prediction of 102 protein-ligand complexes during the D3R Grand Challenge 2. PRODIGY-LIG has the advantage of being simple, generic and applicable to any kind of protein-ligand complex. It provides an automatic, fast and user-friendly tool ensuring broad accessibility. AVAILABILITY AND IMPLEMENTATION: PRODIGY-LIG is freely available without registration requirements at http://milou.science.uu.nl/services/PRODIGY-LIG.

Subjects

Computers , Software , Binding Sites , Internet , Ligands , Protein Binding , Protein Conformation

7.

iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations.

Geng, Cunliang; Vangone, Anna; Folkers, Gert E; Xue, Li C; Bonvin, Alexandre M J J.

Proteins ; 87(2): 110-119, 2019 02.

Article in English | MEDLINE | ID: mdl-30417935

ABSTRACT

Quantitative evaluation of binding affinity changes upon mutations is crucial for protein engineering and drug design. Machine learning-based methods are gaining increasing momentum in this field. Because experimental data are limited, using a small number of sensitive predictive features is vital to the generalization and robustness of such machine learning methods. Here we introduce a fast and reliable predictor of binding affinity changes upon single point mutation, based on a random forest approach. Our method, iSEE, uses a limited number of interface Structure, Evolution, and Energy-based features for the prediction. iSEE achieves, using only 31 features, a high prediction performance with a Pearson correlation coefficient (PCC) of 0.80 and a root mean square error of 1.41 kcal/mol on a diverse training dataset consisting of 1102 mutations in 57 protein-protein complexes. It competes with existing state-of-the-art methods on two blind test datasets. Predictions for a new dataset of 487 mutations in 56 protein complexes from the recently published SKEMPI 2.0 database reveal that none of the current methods perform well (PCC < 0.42), although their combination does improve the predictions. Feature analysis for iSEE underlines the significance of evolutionary conservation for quantitative prediction of mutation effects. As an application example, we perform a full mutation scanning of the interface residues in the MDM2-p53 complex.
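
The two evaluation metrics quoted in this abstract, the Pearson correlation coefficient (PCC) and the root mean square error (RMSE), follow their standard textbook definitions. The sketch below shows how they could be computed for predicted versus experimental ΔΔG values; the function names and the sample usage are illustrative assumptions and do not reproduce the paper's data or code.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error, in the same unit as the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty series of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))
```

A predictor would be evaluated with `pearson(predicted_ddg, experimental_ddg)` and `rmse(predicted_ddg, experimental_ddg)`, where both lists hold one value per mutation.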

Subjects

Computational Biology/methods , Machine Learning , Mutation , Proteins/genetics , Binding, Competitive , Evolution, Molecular , Models, Molecular , Protein Binding , Protein Domains , Proteins/chemistry , Proteins/metabolism , Proto-Oncogene Proteins c-mdm2/chemistry , Proto-Oncogene Proteins c-mdm2/genetics , Proto-Oncogene Proteins c-mdm2/metabolism , Thermodynamics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism

8.

Performance of HADDOCK and a simple contact-based protein-ligand binding affinity predictor in the D3R Grand Challenge 2.

Kurkcuoglu, Zeynep; Koukos, Panagiotis I; Citro, Nevia; Trellet, Mikael E; Rodrigues, J P G L M; Moreira, Irina S; Roel-Touris, Jorge; Melquiond, Adrien S J; Geng, Cunliang; Schaarschmidt, Jörg; Xue, Li C; Vangone, Anna; Bonvin, A M J J.

J Comput Aided Mol Des ; 32(1): 175-185, 2018 01.

Article in English | MEDLINE | ID: mdl-28831657

ABSTRACT

We present the performance of HADDOCK, our information-driven docking software, in the second edition of the D3R Grand Challenge. In this blind experiment, participants were requested to predict the structures and binding affinities of complexes between the Farnesoid X nuclear receptor and 102 different ligands. The models obtained in Stage1 with HADDOCK and ligand-specific protocol show an average ligand RMSD of 5.1 Å from the crystal structure. Only 6/35 targets were within 2.5 Å RMSD from the reference, which prompted us to investigate the limiting factors and revise our protocol for Stage2. The choice of the receptor conformation appeared to have the strongest influence on the results. Our Stage2 models were of higher quality (13 out of 35 were within 2.5 Å), with an average RMSD of 4.1 Å. The docking protocol was applied to all 102 ligands to generate poses for binding affinity prediction. We developed a modified version of our contact-based binding affinity predictor PRODIGY, using the number of interatomic contacts classified by their type and the intermolecular electrostatic energy. This simple structure-based binding affinity predictor shows a Kendall's Tau correlation of 0.37 in ranking the ligands (7th best out of 77 methods, 5th/25 groups). Those results were obtained from the average prediction over the top10 poses, irrespective of their similarity/correctness, underscoring the robustness of our simple predictor. This results in an enrichment factor of 2.5 compared to a random predictor for ranking ligands within the top 25%, making it a promising approach to identify lead compounds in virtual screening.
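
The enrichment factor mentioned at the end of this abstract can be illustrated with a short sketch. It assumes the common definition (fraction of actives among the top-ranked ligands divided by the fraction of actives in the whole set); the exact definition used in the D3R evaluation is not given in the abstract, and the function name and parameters below are hypothetical.

```python
from typing import Sequence


def enrichment_factor(predicted: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.25,
                      higher_is_better: bool = True) -> float:
    """Enrichment of actives in the top `fraction` of a predicted ranking.

    predicted: predicted affinity (or score) per ligand.
    is_active: True for ligands considered actives by the experimental reference.
    A value of 1.0 corresponds to what a random ranking would achieve on average.
    """
    n = len(predicted)
    if n == 0 or n != len(is_active):
        raise ValueError("inputs must be non-empty and of equal length")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("enrichment is undefined without any actives")
    n_top = max(1, int(n * fraction))
    # Rank ligands from best to worst according to the prediction.
    order = sorted(range(n), key=lambda i: predicted[i], reverse=higher_is_better)
    actives_in_top = sum(is_active[i] for i in order[:n_top])
    return (actives_in_top / n_top) / (total_actives / n)
```

For predicted binding free energies, where more negative means tighter binding, `higher_is_better=False` would be the appropriate setting.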

Subjects

Drug Discovery , Molecular Docking Simulation , Receptors, Cytoplasmic and Nuclear/metabolism , Software , Binding Sites , Computer-Aided Design , Crystallography, X-Ray , Drug Design , Humans , Ligands , Protein Binding , Protein Conformation , Receptors, Cytoplasmic and Nuclear/agonists , Receptors, Cytoplasmic and Nuclear/antagonists & inhibitors , Receptors, Cytoplasmic and Nuclear/chemistry , Thermodynamics

9.

Information-Driven, Ensemble Flexible Peptide Docking Using HADDOCK.

Geng, Cunliang; Narasimhan, Siddarth; Rodrigues, João P G L M; Bonvin, Alexandre M J J.

Methods Mol Biol ; 1561: 109-138, 2017.

Article in English | MEDLINE | ID: mdl-28236236

ABSTRACT

Modeling protein-peptide interactions remains a significant challenge for docking programs due to the inherent highly flexible nature of peptides, which often adopt different conformations whether in their free or bound forms. We present here a protocol consisting of a hybrid approach, combining the most frequently found peptide conformations in complexes with representative conformations taken from molecular dynamics simulations of the free peptide. This approach intends to broaden the range of conformations sampled during docking. The resulting ensemble of conformations is used as a starting point for information-driven flexible docking with HADDOCK. We demonstrate the performance of this protocol on six cases of increasing difficulty, taken from a protein-peptide benchmark set. In each case, we use knowledge of the binding site on the receptor to drive the docking process. In the majority of cases where MD conformations are added to the starting ensemble for docking, we observe an improvement in the quality of the resulting models.

Subjects

Databases, Protein , Peptide Fragments/chemistry , Peptide Fragments/metabolism , Proteins/metabolism , Software , Binding Sites , Humans , Models, Molecular , Molecular Docking Simulation , Protein Binding , Protein Conformation , Proteins/chemistry , Web Browser

10.

Exploring the interplay between experimental methods and the performance of predictors of binding affinity change upon mutations in protein complexes.

Geng, Cunliang; Vangone, Anna; Bonvin, Alexandre M J J.

Protein Eng Des Sel ; 29(8): 291-299, 2016 08.

Article in English | MEDLINE | ID: mdl-27284087

ABSTRACT

Reliable prediction of binding affinity changes (ΔΔG) upon mutations in protein complexes relies not only on the performance of computational methods but also on the availability and quality of experimental data. Binding affinity changes can be measured by various experimental methods with different accuracies and limitations. To understand the impact of these on the prediction of binding affinity change, we present the Database of binding Affinity Change Upon Mutation (DACUM), a database of 1872 binding affinity changes upon single-point mutations, a subset of the SKEMPI database (Moal,I.H. and Fernández-Recio,J. Bioinformatics, 2012;28:2600-2607) extended with information on the experimental methods used for ΔΔG measurements. The ΔΔG data were classified into different data sets based on the experimental method used and the position of the mutation (interface and non-interface). We tested the prediction performance of the original HADDOCK score, a newly trained version of it and mutation Cutoff Scanning Matrix (Pires,D.E.V., Ascher,D.B. and Blundell,T.L. Bioinformatics 2014;30:335-342), one of the best reported ΔΔG predictors so far, on these various data sets. Our results demonstrate a strong impact of the experimental methods on the performance of binding affinity change predictors for protein complexes. This underscores the importance of properly considering and carefully choosing experimental methods in the development of novel binding affinity change predictors. The DACUM database is available online at https://github.com/haddocking/DACUM.

Subjects

Computational Biology/methods , Mutation , Proteins/genetics , Proteins/metabolism , Protein Binding

11.

Purification, cloning, characterization and essential amino acid residues analysis of a new ι-carrageenase from Cellulophaga sp. QY3.

Ma, Su; Duan, Gaofei; Chai, Wengang; Geng, Cunliang; Tan, Yulong; Wang, Lushan; Le Sourd, Frédéric; Michel, Gurvan; Yu, Wengong; Han, Feng.

PLoS One ; 8(5): e64666, 2013.

Article in English | MEDLINE | ID: mdl-23741363

ABSTRACT

ι-Carrageenases belong to family 82 of glycoside hydrolases that degrade sulfated galactans in the red algae known as ι-carrageenans. The catalytic mechanism and some substrate-binding residues of family GH82 have been studied but the substrate recognition and binding mechanism of this family have not been fully elucidated. We report here the purification, cloning and characterization of a new ι-carrageenase CgiA_Ce from the marine bacterium Cellulophaga sp. QY3. CgiA_Ce was the most thermostable carrageenase described so far. It was most active at 50°C and pH 7.0 and retained more than 70% of the original activity after incubation at 50°C for 1 h at pH 7.0 or at pH 5.0-10.6 for 24 h. CgiA_Ce was an endo-type ι-carrageenase; it cleaved ι-carrageenan yielding neo-ι-carrabiose and neo-ι-carratetraose as the main end products, and neo-ι-carrahexaose was the minimum substrate. Sequence analysis and structure modeling showed that CgiA_Ce is indeed a new member of family GH82. Moreover, sequence analysis of ι-carrageenases revealed that the amino acid residues at subsites -1 and +1 were more conserved than those at other subsites. Site-directed mutagenesis followed by kinetic analysis identified three strictly conserved residues at subsites -1 and +1 of ι-carrageenases, G228, Y229 and R254 in CgiA_Ce, which played important roles for substrate binding. Furthermore, our results suggested that Y229 and R254 in CgiA_Ce interacted specifically with the sulfate groups of the sugar moieties located at subsites -1 and +1, shedding light on the mechanism of ι-carrageenan recognition in the family GH82.

Subjects

Algal Proteins/chemistry , Carrageenan/metabolism , Glycoside Hydrolases/chemistry , Rhodophyta/enzymology , Algal Proteins/genetics , Algal Proteins/metabolism , Amino Acid Sequence , Binding Sites , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Hot Temperature , Kinetics , Models, Molecular , Molecular Sequence Data , Mutagenesis, Site-Directed , Protein Binding , Rhodophyta/chemistry , Substrate Specificity

12.

N-glycoform diversity of cellobiohydrolase I from Penicillium decumbens and synergism of nonhydrolytic glycoform in cellulose degradation.

Gao, Le; Gao, Feng; Wang, Lushan; Geng, Cunliang; Chi, Lianli; Zhao, Jian; Qu, Yinbo.

J Biol Chem ; 287(19): 15906-15, 2012 May 04.

Article in English | MEDLINE | ID: mdl-22427663

ABSTRACT

Four cellobiohydrolase I (CBHI) glycoforms, namely, CBHI-A, CBHI-B, CBHI-C, and CBHI-D, were purified from the cultured broth of Penicillium decumbens JU-A10. All glycoforms had the same amino acid sequence but displayed different characteristics and biological functions. The effects of the N-glycans of the glycoforms on CBH activity were analyzed using mass spectrum data. Longer N-glycan chains at the Asn-137 of CBHI increased CBH activity. After the N-glycans were removed using site-directed mutagenesis and homologous expression in P. decumbens, the specific CBH activity of the recombinant CBHI without N-glycosylation increased by 65% compared with the wild-type CBHI with the highest specific activity. However, the activity was not stable. Only the N-glycosylation at Asn-137 can improve CBH activity by 40%. rCBHI with N-glycosylation only at Asn-470 exhibited no enzymatic activity. CBH activity was affected whether or not the protein was glycosylated, together with the N-glycosylation site and N-glycan structure. N-Glycosylation not only affects CBH activity but may also bring a new feature to a nonhydrolytic CBHI glycoform (CBHI-A). By supplementing CBHI-A to different commercial cellulase preparations, the glucose yield of lignocellulose hydrolysis increased by >20%. After treatment with a low dose (5 mg/g substrate) of CBHI-A at 50 °C for 7 days, the hydrogen-bond intensity and crystalline degree of cotton fibers decreased by 17 and 34%, respectively. These results may provide new guidelines for cellulase engineering.

Subjects

Cellulose 1,4-beta-Cellobiosidase/metabolism , Cellulose/metabolism , Fungal Proteins/metabolism , Penicillium/enzymology , Amino Acid Sequence , Asparagine/chemistry , Asparagine/genetics , Asparagine/metabolism , Binding Sites/genetics , Biocatalysis , Cellulose/chemistry , Cellulose 1,4-beta-Cellobiosidase/chemistry , Cellulose 1,4-beta-Cellobiosidase/genetics , Cotton Fiber , Electrophoresis, Polyacrylamide Gel , Fungal Proteins/chemistry , Fungal Proteins/genetics , Glycosylation , Hydrogen-Ion Concentration , Hydrolysis , Isoenzymes/genetics , Isoenzymes/metabolism , Kinetics , Models, Molecular , Molecular Sequence Data , Mutation , Penicillium/genetics , Polysaccharides/chemistry , Polysaccharides/metabolism , Protein Structure, Tertiary , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Spectroscopy, Fourier Transform Infrared , Temperature , X-Ray Diffraction
