Search | VHL Regional Portal

1.

The landscape of RNA 3D structure modeling with transformer networks.

Tarafder, Sumit; Roche, Rahmatullah; Bhattacharya, Debswapna.

Biol Methods Protoc ; 9(1): bpae047, 2024.

Article in English | MEDLINE | ID: mdl-39006460

ABSTRACT

Transformers are a powerful subclass of neural networks catalyzing the development of a growing number of computational methods for RNA structure modeling. Here, we conduct an objective and empirical study of the predictive modeling accuracy of the emerging transformer-based methods for RNA structure prediction. Our study reveals multi-faceted complementarity between the methods and underscores some key aspects that affect the prediction accuracy.

2.

EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks.

Roche, Rahmatullah; Moussad, Bernard; Shuvo, Md Hossain; Tarafder, Sumit; Bhattacharya, Debswapna.

Nucleic Acids Res ; 52(5): e27, 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38281252

ABSTRACT

Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here, we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.

Subject(s)

Neural Networks, Computer , Nucleic Acids , Proteins , Amino Acid Sequence , Binding Sites , Nucleic Acids/chemistry , Proteins/chemistry

3.

EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks.

Roche, Rahmatullah; Moussad, Bernard; Shuvo, Md Hossain; Tarafder, Sumit; Bhattacharya, Debswapna.

bioRxiv ; 2023 Sep 16.

Article in English | MEDLINE | ID: mdl-37745556

ABSTRACT

Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.

4.

E(3) equivariant graph neural networks for robust and accurate protein-protein interaction site prediction.

Roche, Rahmatullah; Moussad, Bernard; Shuvo, Md Hossain; Bhattacharya, Debswapna.

PLoS Comput Biol ; 19(8): e1011435, 2023 08.

Article in English | MEDLINE | ID: mdl-37651442

ABSTRACT

Artificial intelligence-powered protein structure prediction methods have led to a paradigm shift in computational structural biology, yet contemporary approaches for predicting the interfacial residues (i.e., sites) of protein-protein interaction (PPI) still rely on experimental structures. Recent studies have demonstrated benefits of employing graph convolution for PPI site prediction, but they ignore symmetries naturally occurring in 3-dimensional space and act only on experimental coordinates. Here we present EquiPPIS, an E(3) equivariant graph neural network approach for PPI site prediction. EquiPPIS employs symmetry-aware graph convolutions that transform equivariantly with translation, rotation, and reflection in 3D space, providing richer representations for molecular data compared to invariant convolutions. EquiPPIS substantially outperforms state-of-the-art approaches based on the same experimental input, and exhibits remarkable robustness by attaining better accuracy with predicted structural models from AlphaFold2 than what existing methods can achieve even with experimental structures. Freely available at https://github.com/Bhattacharya-Lab/EquiPPIS, EquiPPIS enables accurate PPI site prediction at scale.

Subject(s)

Artificial Intelligence , Neural Networks, Computer , Computational Biology , Rotation , Software
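
The symmetry property named in the EquiPPIS abstract can be illustrated with a small, hedged sketch. It is not the EquiPPIS architecture: it only checks numerically that pairwise distances, a typical E(3)-invariant quantity, are unchanged when a set of 3D points is reflected, rotated, and translated. The helper names (`euclid`, `pairwise_distances`, `transform`) and the toy coordinates are assumptions made for illustration.

```python
import math

def euclid(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pairwise_distances(coords):
    """Full matrix of Euclidean distances between all points."""
    return [[euclid(p, q) for q in coords] for p in coords]

def transform(coords, angle, shift, reflect=False):
    """Optionally mirror x, rotate about the z-axis by `angle`, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in coords:
        if reflect:
            x = -x  # reflection through the yz-plane
        rx, ry = c * x - s * y, s * x + c * y  # rotation about z
        out.append((rx + shift[0], ry + shift[1], z + shift[2]))
    return out

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

# Toy coordinates (hypothetical values, not taken from any structure).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.1, 3.5, 0.2), (8.0, 4.1, 2.9)]
moved = transform(coords, angle=1.1, shift=(10.0, -4.0, 2.5), reflect=True)

diff = max_abs_diff(pairwise_distances(coords), pairwise_distances(moved))
print(f"largest change in any pairwise distance: {diff:.2e}")
```

The coordinates themselves change under the transformation while the distance matrix does not (up to floating-point error); an equivariant layer is designed so that its coordinate-like outputs move along with the input in the same way.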

5.

The transformative power of transformers in protein structure prediction.

Moussad, Bernard; Roche, Rahmatullah; Bhattacharya, Debswapna.

Proc Natl Acad Sci U S A ; 120(32): e2303499120, 2023 08 08.

Article in English | MEDLINE | ID: mdl-37523536

ABSTRACT

Transformer neural networks have revolutionized structural biology with the ability to predict protein structures at unprecedented high accuracy. Here, we report the predictive modeling performance of the state-of-the-art protein structure prediction methods built on transformers for 69 protein targets from the recently concluded 15th Critical Assessment of Structure Prediction (CASP15) challenge. Our study shows the power of transformers in protein structure modeling and highlights future areas of improvement.

Subject(s)

Electric Power Supplies , Neural Networks, Computer

6.

PIQLE: protein-protein interface quality estimation by deep graph learning of multimeric interaction geometries.

Shuvo, Md Hossain; Karim, Mohimenul; Roche, Rahmatullah; Bhattacharya, Debswapna.

Bioinform Adv ; 3(1): vbad070, 2023.

Article in English | MEDLINE | ID: mdl-37351310

ABSTRACT

Motivation: Accurate modeling of protein-protein interaction interface is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information or inter-atomic multimeric geometries, including interaction distance and orientations. Results: Here, we present PIQLE, a deep graph learning method for protein-protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of individual interactions between the interfacial residues using a multi-head graph attention network and then probabilistically combines the estimated quality for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods including DProQA, TRScore, GNN-DOVE and DOVE on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study and comparison with the self-assessment module of AlphaFold-Multimer repurposed for protein complex scoring reveal that the performance gains are connected to the effectiveness of the multi-head graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. Availability and implementation: An open-source software implementation of PIQLE is freely available at https://github.com/Bhattacharya-Lab/PIQLE. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

7.

Contact-Assisted Threading in Low-Homology Protein Modeling.

Bhattacharya, Sutanu; Roche, Rahmatullah; Shuvo, Md Hossain; Moussad, Bernard; Bhattacharya, Debswapna.

Methods Mol Biol ; 2627: 41-59, 2023.

Article in English | MEDLINE | ID: mdl-36959441

ABSTRACT

The ability to successfully predict the three-dimensional structure of a protein from its amino acid sequence has improved considerably in the recent past. The progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.

Subject(s)

Algorithms , Sequence Analysis, Protein , Sequence Analysis, Protein/methods , Proteins/chemistry , Software , Amino Acid Sequence , Databases, Protein , Protein Conformation , Protein Folding

8.

PIQLE: protein-protein interface quality estimation by deep graph learning of multimeric interaction geometries.

Shuvo, Md Hossain; Karim, Mohimenul; Roche, Rahmatullah; Bhattacharya, Debswapna.

bioRxiv ; 2023 Feb 15.

Article in English | MEDLINE | ID: mdl-36824789

ABSTRACT

Accurate modeling of protein-protein interaction interface is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information or inter-atomic multimeric geometries, including interaction distance and orientations. Here we present PIQLE, a deep graph learning method for protein-protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of the individual interactions between the interfacial residues using a multihead graph attention network and then probabilistically combines the estimated quality of the interfacial residues for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study reveals that the performance gains are connected to the effectiveness of the multihead graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. An open-source software implementation of PIQLE, licensed under the GNU General Public License v3, is freely available at https://github.com/Bhattacharya-Lab/PIQLE.

9.

rrQNet: Protein contact map quality estimation by deep evolutionary reconciliation.

Roche, Rahmatullah; Bhattacharya, Sutanu; Shuvo, Md Hossain; Bhattacharya, Debswapna.

Proteins ; 90(12): 2023-2034, 2022 12.

Article in English | MEDLINE | ID: mdl-35751651

ABSTRACT

Protein contact maps have proven to be a valuable tool in the deep learning revolution of protein structure prediction, ushering in the recent breakthrough by AlphaFold2. However, self-assessment of the quality of predicted structures is typically performed at the granularity of three-dimensional coordinates as opposed to directly exploiting the rotation- and translation-invariant two-dimensional (2D) contact maps. Here, we present rrQNet, a deep learning method for self-assessment in 2D by contact map quality estimation. Our approach is based on the intuition that for a contact map to be of high quality, the residue pairs predicted to be in contact should be mutually consistent with the evolutionary context of the protein. The deep neural network architecture of rrQNet implements this intuition by cascading two deep modules, one encoding the evolutionary context and the other performing evolutionary reconciliation. The penultimate stage of rrQNet estimates the quality scores at the interacting residue-pair level, which are then aggregated for estimating the quality of a contact map. This design choice offers versatility at varied resolutions from individual residue pairs to full-fledged contact maps. Trained on multiple complementary sources of contact predictors, rrQNet facilitates generalizability across various contact maps. By rigorously testing using publicly available datasets and comparing against several in-house baseline approaches, we show that rrQNet accurately reproduces the true quality score of a predicted contact map and successfully distinguishes between accurate and inaccurate contact maps predicted by a wide variety of contact predictors. The open-source rrQNet software package is freely available at https://github.com/Bhattacharya-Lab/rrQNet.

Subject(s)

Computational Biology , Proteins , Computational Biology/methods , Proteins/chemistry , Neural Networks, Computer , Software , Biological Evolution

10.

DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins.

Bhattacharya, Sutanu; Roche, Rahmatullah; Moussad, Bernard; Bhattacharya, Debswapna.

Proteins ; 90(2): 579-588, 2022 02.

Article in English | MEDLINE | ID: mdl-34599831

ABSTRACT

Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.

Subject(s)

Algorithms , Proteins/chemistry , Amino Acid Sequence , Databases, Protein , Protein Conformation , Sequence Alignment

11.

Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading.

Bhattacharya, Sutanu; Roche, Rahmatullah; Shuvo, Md Hossain; Bhattacharya, Debswapna.

Front Mol Biosci ; 8: 643752, 2021.

Article in English | MEDLINE | ID: mdl-34046429

ABSTRACT

Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite this success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profiles. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.

12.

Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins.

Roche, Rahmatullah; Bhattacharya, Sutanu; Bhattacharya, Debswapna.

PLoS Comput Biol ; 17(2): e1008753, 2021 02.

Article in English | MEDLINE | ID: mdl-33621244

ABSTRACT

Crystallography and NMR system (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous and may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS.

Subject(s)

Computational Biology/methods , Membrane Proteins/chemistry , Solubility , Algorithms , Computer Simulation , Crystallography , Crystallography, X-Ray , Databases, Protein , Humans , Hybridization, Genetic , Image Processing, Computer-Assisted , Models, Molecular , Neural Networks, Computer , Protein Conformation , Protein Folding , Protein Structure, Secondary , Reproducibility of Results
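
The DConStruct abstract refers to contact maps taken at varying contact thresholds. As a minimal sketch of what that means, and not a reproduction of DConStruct itself, the snippet below binarizes one distance matrix at several cutoffs and counts the resulting contacts. The cutoffs of 8, 10, and 12 angstroms, the minimum sequence separation of 2, the function names, and the toy distance values are all assumptions chosen for illustration.

```python
def contact_map(dist, threshold, min_sep=2):
    """Binarize a distance matrix: 1 where residues i and j are at least
    `min_sep` apart in sequence and within `threshold` angstroms, else 0."""
    n = len(dist)
    return [[1 if abs(i - j) >= min_sep and dist[i][j] <= threshold else 0
             for j in range(n)]
            for i in range(n)]

def count_contacts(cmap):
    """Count each contacting pair once (upper triangle only)."""
    n = len(cmap)
    return sum(cmap[i][j] for i in range(n) for j in range(i + 1, n))

# Toy symmetric distance matrix in angstroms (made-up values, not real data).
dist = [
    [0.0, 3.8, 6.5, 11.0],
    [3.8, 0.0, 3.8, 9.0],
    [6.5, 3.8, 0.0, 7.5],
    [11.0, 9.0, 7.5, 0.0],
]

thresholds = [8.0, 10.0, 12.0]  # assumed cutoffs, for illustration only
counts = [count_contacts(contact_map(dist, t)) for t in thresholds]
print(dict(zip(thresholds, counts)))  # {8.0: 1, 10.0: 2, 12.0: 3}
```

Raising the cutoff admits more residue pairs as contacts, so a higher-thresholded map carries more restraints for a reconstruction method to work with.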

13.

PolyFold: An interactive visual simulator for distance-based protein folding.

McGehee, Andrew J; Bhattacharya, Sutanu; Roche, Rahmatullah; Bhattacharya, Debswapna.

PLoS One ; 15(12): e0243331, 2020.

Article in English | MEDLINE | ID: mdl-33270805

ABSTRACT

Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.

Subject(s)

Algorithms , Protein Folding , Proteins/chemistry , Software , Protein Conformation
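
The PolyFold abstract describes stochastic optimization that moves a conformation toward satisfying the geometric constraints of a distance matrix. The sketch below shows that general idea with a plain random-perturbation hill climb; it is a swapped-in toy optimizer, not one of PolyFold's algorithms, and the function names, step size, step count, and reference coordinates are hypothetical.

```python
import math
import random

def euclid(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_error(coords, target):
    """Sum of squared deviations between current and target pairwise distances."""
    n = len(coords)
    return sum((euclid(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def fold(target, steps=2000, step_size=0.5, seed=0):
    """Hill climb: perturb one point at a time, keep the move only if it helps."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(n)]
    initial = best = distance_error(coords, target)
    for _ in range(steps):
        i = rng.randrange(n)
        old = coords[i][:]
        coords[i] = [c + rng.gauss(0, step_size) for c in old]
        err = distance_error(coords, target)
        if err < best:
            best = err          # accept the improving move
        else:
            coords[i] = old     # reject and restore the previous position
    return coords, initial, best

# Target distances derived from hypothetical reference coordinates.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
             (0.0, 3.8, 0.0), (0.0, 3.8, 3.8)]
target = [[euclid(p, q) for q in reference] for p in reference]

coords, initial, final = fold(target)
print(f"distance error: {initial:.2f} -> {final:.2f}")
```

Because a distance matrix is unchanged by rotation, translation, and reflection, the recovered coordinates can match the target distances without matching the reference coordinates themselves, mirror images included.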
