Search | VHL Regional Portal

Improved Peptide Docking with Privileged Knowledge Distillation using Deep Learning.

Zhang, Zicong; Verburgt, Jacob; Kagaya, Yuki; Christoffer, Charles; Kihara, Daisuke.

bioRxiv ; 2023 Dec 04.

Article in English | MEDLINE | ID: mdl-38106114

ABSTRACT

Protein-peptide interactions play a key role in biological processes. Understanding the interactions that occur within a receptor-peptide complex can help in discovering and altering their biological functions. Various computational methods for modeling the structures of receptor-peptide complexes have been developed. Recently, accurate structure prediction enabled by deep learning methods has significantly advanced the field of structural biology. AlphaFold (AF) is among the top-performing structure prediction methods and has highly accurate structure modeling performance on single-chain targets. Shortly after the release of AlphaFold, AlphaFold-Multimer (AFM) was developed in a similar fashion to AF for prediction of protein complex structures. AFM has achieved competitive performance in modeling protein-peptide interactions compared to previous computational methods; however, further improvement is still needed. Here, we present DistPepFold, which improves protein-peptide complex docking using an AFM-based architecture through a privileged knowledge distillation approach. DistPepFold leverages a teacher model that uses native interaction information during training and transfers its knowledge to a student model through a teacher-student distillation process. We evaluated DistPepFold's docking performance on two protein-peptide complex datasets and showed that DistPepFold outperforms AFM. Furthermore, we demonstrate that the student model was able to learn from the teacher model to make structural improvements based on AFM predictions.
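As a rough illustration of the teacher-student idea described in this abstract, the sketch below combines a task loss with a term that pulls the student's output toward the teacher's. It is a generic knowledge distillation objective, not the DistPepFold loss: the function names, the use of mean squared error, and the weight `alpha` are all assumptions made for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Weighted sum of a task loss and a teacher-matching loss.

    student_out: prediction of the student, which sees only ordinary inputs
    teacher_out: prediction of the teacher, which saw privileged information
                 during training (hypothetical placeholder values here)
    target:      ground-truth values
    alpha:       assumed weight of the distillation term, between 0 and 1
    """
    task_term = mse(student_out, target)          # fit the ground truth
    distill_term = mse(student_out, teacher_out)  # imitate the teacher
    return (1 - alpha) * task_term + alpha * distill_term


if __name__ == "__main__":
    # Toy numbers only; they do not come from any of the listed papers.
    print(distillation_loss([0.0], [1.0], [2.0], alpha=0.5))
```

With `alpha=0` the student is trained on the ground truth alone; with `alpha=1` it only imitates the teacher.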

Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment.

Lensink, Marc F; Brysbaert, Guillaume; Raouraoua, Nessim; Bates, Paul A; Giulini, Marco; Honorato, Rodrigo V; van Noort, Charlotte; Teixeira, Joao M C; Bonvin, Alexandre M J J; Kong, Ren; Shi, Hang; Lu, Xufeng; Chang, Shan; Liu, Jian; Guo, Zhiye; Chen, Xiao; Morehead, Alex; Roy, Raj S; Wu, Tianqi; Giri, Nabin; Quadir, Farhan; Chen, Chen; Cheng, Jianlin; Del Carpio, Carlos A; Ichiishi, Eichiro; Rodriguez-Lumbreras, Luis A; Fernandez-Recio, Juan; Harmalkar, Ameya; Chu, Lee-Shin; Canner, Sam; Smanta, Rituparna; Gray, Jeffrey J; Li, Hao; Lin, Peicong; He, Jiahua; Tao, Huanyu; Huang, Sheng-You; Roel-Touris, Jorge; Jimenez-Garcia, Brian; Christoffer, Charles W; Jain, Anika J; Kagaya, Yuki; Kannan, Harini; Nakamura, Tsukasa; Terashi, Genki; Verburgt, Jacob C; Zhang, Yuanyuan; Zhang, Zicong; Fujuta, Hayato; Sekijima, Masakazu.

Proteins ; 91(12): 1658-1683, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37905971

ABSTRACT

We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homotrimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integration of advanced modeling tools, enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.

Subject(s)

Algorithms , Protein Interaction Mapping , Protein Interaction Mapping/methods , Protein Conformation , Protein Binding , Molecular Docking Simulation , Computational Biology/methods , Software

Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study.

Schweke, Hugo; Xu, Qifang; Tauriello, Gerardo; Pantolini, Lorenzo; Schwede, Torsten; Cazals, Frédéric; Lhéritier, Alix; Fernandez-Recio, Juan; Rodríguez-Lumbreras, Luis Angel; Schueler-Furman, Ora; Varga, Julia K; Jiménez-García, Brian; Réau, Manon F; Bonvin, Alexandre M J J; Savojardo, Castrense; Martelli, Pier-Luigi; Casadio, Rita; Tubiana, Jérôme; Wolfson, Haim J; Oliva, Romina; Barradas-Bautista, Didier; Ricciardelli, Tiziana; Cavallo, Luigi; Venclovas, Ceslovas; Olechnovic, Kliment; Guerois, Raphael; Andreani, Jessica; Martin, Juliette; Wang, Xiao; Terashi, Genki; Sarkar, Daipayan; Christoffer, Charles; Aderinwale, Tunde; Verburgt, Jacob; Kihara, Daisuke; Marchand, Anthony; Correia, Bruno E; Duan, Rui; Qiu, Liming; Xu, Xianjin; Zhang, Shuang; Zou, Xiaoqin; Dey, Sucharita; Dunbrack, Roland L; Levy, Emmanuel D; Wodak, Shoshana J.

Proteomics ; 23(17): e2200323, 2023 09.

Article in English | MEDLINE | ID: mdl-37365936

ABSTRACT

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.

Subject(s)

Proteins , Reproducibility of Results , Proteins/metabolism , Protein Binding
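The area under the ROC curve quoted in the abstract above can be read as the probability that a randomly chosen physiological interface scores higher than a randomly chosen non-physiological one. The sketch below computes it that way for a single score; the function name and the toy scores and labels are illustrative assumptions, not data from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: higher means "more likely physiological" (an assumed convention)
    labels: 1 for physiological, 0 for non-physiological
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: one positive (0.2) is outranked by one negative (0.8).
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based formula gives the same value more cheaply on large sets.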

Multi-level analysis of intrinsically disordered protein docking methods.

Verburgt, Jacob; Zhang, Zicong; Kihara, Daisuke.

Methods ; 204: 55-63, 2022 08.

Article in English | MEDLINE | ID: mdl-35609776

ABSTRACT

Intrinsically Disordered Proteins (IDPs) are a class of proteins in which at least some region of the protein does not possess any stable structure in solution under physiological conditions but may adopt an ordered structure upon binding to a globular receptor. These IDP-receptor complexes are thus subject to protein complex modeling in which computational techniques are applied to accurately reproduce the IDP ligand-receptor interactions. This often takes the form of protein docking, in which the 3D structures of both the subunits are known, but the position of the ligand relative to the receptor is not. Here, we evaluate the performance of three IDP-receptor modeling tools with metrics that characterize the IDP-receptor interface at various resolutions. We show that all three methods are able to properly identify the general binding site, as identified by lower resolution metrics, but begin to struggle with higher resolution metrics that capture biophysical interactions.

Subject(s)

Intrinsically Disordered Proteins , Binding Sites , Intrinsically Disordered Proteins/chemistry , Ligands , Protein Binding , Protein Conformation , Protein Domains

Benchmarking of structure refinement methods for protein complex models.

Verburgt, Jacob; Kihara, Daisuke.

Proteins ; 90(1): 83-95, 2022 01.

Article in English | MEDLINE | ID: mdl-34309909

ABSTRACT

Protein structure docking is the process in which the quaternary structure of a protein complex is predicted from individual tertiary structures of the protein subunits. Protein docking is typically performed in two main steps. The subunits are first docked while keeping them rigid to form the complex, which is then followed by structure refinement. Structure refinement is crucial for practical use of computational protein docking models, as it is aimed at correcting conformations of interacting residues and atoms at the interface. Here, we benchmarked the performance of eight existing protein structure refinement methods in refinement of protein complex models. We show that the fraction of native contacts between subunits is by far the most straightforward metric to improve. However, backbone-dependent metrics based on the Root Mean Square Deviation proved more difficult to improve via refinement.

Subject(s)

Computational Biology/methods , Molecular Docking Simulation/methods , Protein Conformation , Proteins/chemistry , Algorithms , Benchmarking , Databases, Protein , Proteins/genetics , Proteins/metabolism
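The fraction of native contacts mentioned in the refinement benchmark abstract above is commonly defined as the share of inter-subunit residue contacts in the native complex that also appear in the model. The sketch below assumes the contact pairs have already been extracted (for example with a distance cutoff, which is left outside the sketch); the function name and the residue labels are hypothetical.

```python
def fraction_native_contacts(native_contacts, model_contacts):
    """Share of native inter-subunit contacts that the model reproduces.

    Each contact is a (residue_in_subunit_1, residue_in_subunit_2) pair.
    Extra contacts in the model that are absent from the native complex
    do not affect the value.
    """
    native = set(native_contacts)
    if not native:
        raise ValueError("the native complex has no contacts")
    return len(native & set(model_contacts)) / len(native)


if __name__ == "__main__":
    # Hypothetical residue labels, for illustration only.
    native = {("A10", "B5"), ("A11", "B7"), ("A12", "B9"), ("A13", "B2")}
    model = {("A10", "B5"), ("A12", "B9"), ("A20", "B1")}
    print(fraction_native_contacts(native, model))
```

Because the metric counts only recovered contacts and ignores backbone placement, small side-chain adjustments at the interface can raise it without changing an RMSD-based measure much.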

Evolutionary Dynamics of Indels in SARS-CoV-2 Spike Glycoprotein.

Rao, R Shyama Prasad; Ahsan, Nagib; Xu, Chunhui; Su, Lingtao; Verburgt, Jacob; Fornelli, Luca; Kihara, Daisuke; Xu, Dong.

Evol Bioinform Online ; 17: 11769343211064616, 2021.

Article in English | MEDLINE | ID: mdl-34898980

ABSTRACT

SARS-CoV-2, responsible for the current COVID-19 pandemic that claimed over 5.0 million lives, belongs to a class of enveloped viruses that undergo quick evolutionary adjustments under selection pressure. Numerous variants have emerged in SARS-CoV-2, posing a serious challenge to the global vaccination effort and COVID-19 management. The evolutionary dynamics of this virus are only beginning to be explored. In this work, we have analysed 1.79 million spike glycoprotein sequences of SARS-CoV-2 and found that the virus is fine-tuning the spike with numerous amino acid insertions and deletions (indels). Indels seem to have a selective advantage as the proportions of sequences with indels steadily increased over time, currently at over 89%, with similar trends across countries/variants. There were as many as 420 unique indel positions and 447 unique combinations of indels. Despite their high frequency, indels resulted in only minimal alteration of N-glycosylation sites, including both gain and loss. As indels and point mutations are positively correlated and sequences with indels have significantly more point mutations, they have implications in the evolutionary dynamics of the SARS-CoV-2 spike glycoprotein.

Mass spectrometry-based proteomic platforms for better understanding of SARS-CoV-2 induced pathogenesis and potential diagnostic approaches.

Ahsan, Nagib; Rao, R Shyama Prasad; Wilson, Rashaun S; Punyamurtula, Ujwal; Salvato, Fernanda; Petersen, Max; Ahmed, Mohammad Kabir; Abid, M Ruhul; Verburgt, Jacob C; Kihara, Daisuke; Yang, Zhibo; Fornelli, Luca; Foster, Steven B; Ramratnam, Bharat.

Proteomics ; 21(10): e2000279, 2021 05.

Article in English | MEDLINE | ID: mdl-33860983

ABSTRACT

While protein-protein interaction is the first step of the SARS-CoV-2 infection, recent comparative proteomic profiling enabled the identification of over 11,000 protein dynamics, thus providing a comprehensive reflection of the molecular mechanisms underlying the cellular system in response to viral infection. Here we summarize and rationalize the results obtained by various mass spectrometry (MS)-based proteomic approaches applied to the functional characterization of proteins and pathways associated with SARS-CoV-2-mediated infections in humans. Comparative analysis of cell-lines versus tissue samples indicates that our knowledge of proteome profile alteration in response to SARS-CoV-2 infection is still incomplete and the tissue-specific response to SARS-CoV-2 infection can probably not be recapitulated efficiently by in vitro experiments. However, regardless of the viral infection period, sample types, and experimental strategies, a thorough cross-comparison of the recently published proteome, phosphoproteome, and interactome datasets led to the identification of a common set of proteins and kinases associated with PI3K-Akt, EGFR, MAPK, Rap1, and AMPK signaling pathways. Ephrin receptor A2 (EPHA2) was identified by 11 studies including all proteomic platforms, suggesting it as a potential future target for SARS-CoV-2 infection mechanisms and the development of new therapeutic strategies. We further discuss the potential of future proteomics strategies for identifying prognostic SARS-CoV-2 responsive age-, gender-dependent, tissue-specific protein targets.

Subject(s)

COVID-19/metabolism , Host-Pathogen Interactions , Mass Spectrometry/methods , Proteomics/methods , SARS-CoV-2/physiology , Animals , COVID-19/diagnosis , COVID-19/pathology , Humans , Protein Interaction Mapping/methods , Protein Interaction Maps , Protein Kinases/analysis , Protein Kinases/metabolism , Protein Processing, Post-Translational , Proteome/analysis , Proteome/metabolism , Receptor, EphA2/analysis , Receptor, EphA2/metabolism , Signal Transduction

Performance and enhancement of the LZerD protein assembly pipeline in CAPRI 38-46.

Christoffer, Charles; Terashi, Genki; Shin, Woong-Hee; Aderinwale, Tunde; Maddhuri Venkata Subramaniya, Sai Raghavendra; Peterson, Lenna; Verburgt, Jacob; Kihara, Daisuke.

Proteins ; 88(8): 948-961, 2020 08.

Article in English | MEDLINE | ID: mdl-31697428

ABSTRACT

We report the performance of the protein docking prediction pipeline of our group and the results for Critical Assessment of Prediction of Interactions (CAPRI) rounds 38-46. The pipeline integrates programs developed in our group as well as other existing scoring functions. The core of the pipeline is the LZerD protein-protein docking algorithm. If templates of the target complex are not found in PDB, the first step of our docking prediction pipeline is to run LZerD for a query protein pair. Meanwhile, in the case of human group prediction, we survey the literature to find information that can guide the modeling, such as protein-protein interface information. In addition to any literature information and binding residue prediction, generated docking decoys were selected by a rank aggregation of statistical scoring functions. The top 10 decoys were relaxed by a short molecular dynamics simulation before submission to remove atom clashes and improve side-chain conformations. In these CAPRI rounds, our group, particularly the LZerD server, showed robust performance. On the other hand, there are failed cases where some other groups were successful. To understand weaknesses of our pipeline, we analyzed sources of errors for failed targets. Since we noted that structure refinement is a step that needs improvement, we newly performed a comparative study of several refinement approaches. Finally, we show several examples that illustrate successful and unsuccessful cases by our group.

Subject(s)

Molecular Docking Simulation , Peptides/chemistry , Proteins/chemistry , Software , Algorithms , Amino Acid Sequence , Binding Sites , Humans , Ligands , Peptides/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Proteins/metabolism , Research Design , Structural Homology, Protein
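The abstract above mentions selecting docking decoys by a rank aggregation of scoring functions without specifying the scheme. One simple possibility, shown here purely as an assumed illustration, is to rank the decoys under each scoring function and sort them by the sum of their ranks. The function name, the lower-is-better convention, and the toy scores are all hypothetical and are not taken from the LZerD pipeline.

```python
def rank_sum_order(score_table):
    """Order decoys by the sum of their per-function ranks.

    score_table: dict mapping decoy name -> list of scores, one per
                 scoring function, where lower is assumed to be better.
    Returns decoy names, best first. Equal rank sums are broken by name.
    """
    names = sorted(score_table)
    n_functions = len(next(iter(score_table.values())))
    rank_sums = {name: 0 for name in names}
    for f in range(n_functions):
        # Rank 1 goes to the decoy with the lowest score under function f.
        ordered = sorted(names, key=lambda name: score_table[name][f])
        for rank, name in enumerate(ordered, start=1):
            rank_sums[name] += rank
    return sorted(names, key=lambda name: (rank_sums[name], name))


if __name__ == "__main__":
    # Toy scores for three decoys under two hypothetical scoring functions.
    scores = {"a": [1.0, 2.0], "b": [2.0, 5.0], "c": [3.0, 1.0]}
    print(rank_sum_order(scores))
```

Working with ranks rather than raw scores sidesteps the fact that different scoring functions are on different scales.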

Enzyme intermediates captured "on the fly" by mix-and-inject serial crystallography.

Olmos, Jose L; Pandey, Suraj; Martin-Garcia, Jose M; Calvey, George; Katz, Andrea; Knoska, Juraj; Kupitz, Christopher; Hunter, Mark S; Liang, Mengning; Oberthuer, Dominik; Yefanov, Oleksandr; Wiedorn, Max; Heyman, Michael; Holl, Mark; Pande, Kanupriya; Barty, Anton; Miller, Mitchell D; Stern, Stephan; Roy-Chowdhury, Shatabdi; Coe, Jesse; Nagaratnam, Nirupa; Zook, James; Verburgt, Jacob; Norwood, Tyler; Poudyal, Ishwor; Xu, David; Koglin, Jason; Seaberg, Matthew H; Zhao, Yun; Bajt, Sasa; Grant, Thomas; Mariani, Valerio; Nelson, Garrett; Subramanian, Ganesh; Bae, Euiyoung; Fromme, Raimund; Fung, Russell; Schwander, Peter; Frank, Matthias; White, Thomas A; Weierstall, Uwe; Zatsepin, Nadia; Spence, John; Fromme, Petra; Chapman, Henry N; Pollack, Lois; Tremblay, Lee; Ourmazd, Abbas; Phillips, George N; Schmidt, Marius.

BMC Biol ; 16(1): 59, 2018 05 31.

Article in English | MEDLINE | ID: mdl-29848358

ABSTRACT

BACKGROUND: Ever since the first atomic structure of an enzyme was solved, the discovery of the mechanism and dynamics of reactions catalyzed by biomolecules has been the key goal for the understanding of the molecular processes that drive life on earth. Despite a large number of successful methods for trapping reaction intermediates, the direct observation of an ongoing reaction has been possible only in rare and exceptional cases. RESULTS: Here, we demonstrate a general method for capturing enzyme catalysis "in action" by mix-and-inject serial crystallography (MISC). Specifically, we follow the catalytic reaction of the Mycobacterium tuberculosis β-lactamase with the third-generation antibiotic ceftriaxone by time-resolved serial femtosecond crystallography. The results reveal, in near atomic detail, antibiotic cleavage and inactivation from 30 ms to 2 s. CONCLUSIONS: MISC is a versatile and generally applicable method to investigate reactions of biological macromolecules, some of which are of immense biological significance and might be, in addition, important targets for structure-based drug design. With megahertz X-ray pulse rates expected at the Linac Coherent Light Source II and the European X-ray free-electron laser, multiple, finely spaced time delays can be collected rapidly, allowing a comprehensive description of biomolecular reactions in terms of structure and kinetics from the same set of X-ray data.

Subject(s)

Anti-Bacterial Agents/chemistry , Bacterial Proteins/chemistry , Ceftriaxone/chemistry , Crystallography, X-Ray/methods , Mycobacterium tuberculosis/enzymology , beta-Lactamases/chemistry , Bacterial Proteins/genetics , Biocatalysis , Cephalosporin Resistance/genetics , Kinetics , Lasers , Models, Molecular , Time Factors , beta-Lactamases/genetics
