Search | VHL Regional Portal

1.

GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling.

Li, Bin; Ming, Dengming.

BMC Bioinformatics ; 25(1): 204, 2024 Jun 01.

Article in English | MEDLINE | ID: mdl-38824535

ABSTRACT

BACKGROUND: Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of these models are limited to capturing information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance. RESULTS: In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of proteins as a protein graph. In addition to the node features of amino acids extracted by the state-of-the-art protein large language model, GATSol utilizes amino acid distance maps generated using the latest AlphaFold technology. Rigorous testing on the independent eSOL and Saccharomyces cerevisiae test datasets has shown that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae test set. CONCLUSIONS: GATSol captures 3D structural features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial structure features extracted from predicted structures into the model while relying only on protein sequences as input, which simplifies the entire graph neural network prediction process, making it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol .
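As a rough illustration of two ideas in this abstract (turning a predicted distance map into a protein graph, and the R² metric used for evaluation), the sketch below builds undirected residue-residue edges from a distance matrix and computes the coefficient of determination. The 10 Å contact cutoff and the function names `contact_edges` and `r_squared` are hypothetical; the abstract does not state how GATSol thresholds its AlphaFold distance maps, and this is not the authors' code.

```python
from typing import List, Sequence, Tuple


def contact_edges(dist: Sequence[Sequence[float]], cutoff: float = 10.0) -> List[Tuple[int, int]]:
    """Return undirected residue pairs (i, j), i < j, closer than `cutoff` (hypothetical 10 Å default)."""
    n = len(dist)
    edges: List[Tuple[int, int]] = []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, so each pair appears once
            if dist[i][j] < cutoff:
                edges.append((i, j))
    return edges


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    toy_dist = [[0.0, 3.8, 12.0], [3.8, 0.0, 9.5], [12.0, 9.5, 0.0]]
    print(contact_edges(toy_dist))                       # [(0, 1), (1, 2)]
    print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

A smaller cutoff gives a sparser graph that keeps only close spatial neighbours. R² equals 1 for perfect predictions and 0 for a model that always predicts the mean, which is why values such as 0.517 and 0.424 are read as the fraction of variance explained.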

Subject(s)

Proteins , Solubility , Proteins/chemistry , Proteins/metabolism , Protein Conformation , Databases, Protein , Computational Biology/methods , Software , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/chemistry , Algorithms , Models, Molecular , Amino Acid Sequence

2.

Disentangling Glycan-Protein Interactions: Nuclear Magnetic Resonance (NMR) to the Rescue.

Bertuzzi, Sara; Poveda, Ana; Ardá, Ana; Gimeno, Ana; Jiménez-Barbero, Jesús.

J Vis Exp ; (207)2024 May 17.

Article in English | MEDLINE | ID: mdl-38829120

ABSTRACT

The interactions of glycans with proteins modulate many events related to health and disease. In fact, the establishment of these recognition events and their biological consequences are intimately related to the three-dimensional structures of both partners, as well as to their dynamic features and their presentation on the corresponding cell compartments. NMR techniques are uniquely suited to disentangling these characteristics, and diverse NMR-based methodologies have indeed been developed and applied to monitor the binding of glycans to their associated receptors. This protocol outlines the procedures to acquire, process and analyze two of the most powerful NMR methodologies employed in NMR glycobiology, ¹H saturation transfer difference (STD) and ¹H,¹⁵N heteronuclear single quantum coherence (HSQC) titration experiments, which offer complementary information from the glycan and protein perspectives, respectively. When combined, they provide a powerful toolkit for elucidating both the structural and dynamic aspects of molecular recognition processes. This comprehensive approach enhances our understanding of glycan-protein interactions and contributes to advancing research in chemical glycobiology.
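The two readouts described in this protocol are usually reduced to simple per-signal numbers. The sketch below assumes the common combined chemical shift perturbation (CSP) formula for ¹H,¹⁵N-HSQC titrations, Δδ = sqrt(ΔδH² + (α·ΔδN)²), with a nitrogen scaling factor α = 0.14 chosen here as an assumption (other values are also used), and the usual STD amplification factor (I0 - Isat)/I0 multiplied by the ligand excess. Function names and defaults are illustrative and are not taken from the protocol itself.

```python
import math
from typing import Dict


def combined_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Combined 1H/15N chemical shift perturbation (ppm); alpha scales the 15N shift (assumed 0.14)."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def std_amplification_factor(i0: float, i_sat: float, ligand_excess: float) -> float:
    """STD amplification factor: fractional STD response times the ligand:protein ratio."""
    return (i0 - i_sat) / i0 * ligand_excess


def relative_std(af_by_proton: Dict[str, float]) -> Dict[str, float]:
    """Normalise per-proton STD values to the strongest one (set to 100%) for epitope mapping."""
    top = max(af_by_proton.values())
    return {proton: 100.0 * af / top for proton, af in af_by_proton.items()}


if __name__ == "__main__":
    print(round(combined_csp(0.05, 0.5), 3))      # 0.086
    print(relative_std({"H1": 4.0, "H2": 2.0}))  # {'H1': 100.0, 'H2': 50.0}
```

Large CSP values flag protein residues affected by binding (protein perspective), while the normalised STD values rank glycan protons by proximity to the receptor (glycan perspective), matching the complementary roles the abstract assigns to the two experiments.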

Subject(s)

Polysaccharides , Polysaccharides/chemistry , Polysaccharides/metabolism , Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Proteins/metabolism

3.

Knockdown of HE4 suppresses tumor growth and invasiveness in lung adenocarcinoma through regulation of EGFR signaling.

Zhang, Yue; Yang, Wenyu; Han, Xiaowang; Qiao, Yue; Wang, Haitao; Chen, Ting; Li, Tianying; Ou, Wen-Bin.

Oncol Res ; 32(6): 1119-1128, 2024.

Article in English | MEDLINE | ID: mdl-38827327

ABSTRACT

It has been shown that high expression of human epididymis protein 4 (HE4) in most lung cancers is related to poor patient prognosis, but the mechanism of HE4 in the pathological transformation of lung cancer remains unclear. The current study aimed to clarify the function and mechanism of HE4 in the occurrence and metastasis of lung adenocarcinoma (LUAD). HE4 expression was evaluated by immunoblotting in lung cancer cell lines and biopsies, and through analysis of The Cancer Genome Atlas (TCGA) dataset. Frequent HE4 overexpression was demonstrated in LUAD, but not in lung squamous cell carcinoma (LUSC), indicating that HE4 can serve as a biomarker to distinguish between LUAD and LUSC. HE4 knockdown significantly inhibited cell growth, colony formation, wound healing, and invasion, and arrested the cell cycle in G1 phase in LUAD cell lines through inactivation of downstream EGFR signaling, including the PI3K/AKT/mTOR and RAF/MAPK pathways. The first-generation EGFR inhibitor gefitinib and HE4 shRNA had no synergistic inhibitory effect on the growth of lung adenocarcinoma cells, while the third-generation EGFR inhibitor osimertinib showed additive anti-proliferative effects with HE4 shRNA. Moreover, we provided evidence that HE4 regulates EGFR expression through transcriptional regulation and protein interaction in LUAD. Our findings suggest that HE4 positively modulates the EGFR signaling pathway to promote growth and invasiveness in LUAD and highlight targeting HE4 as a potential novel strategy for LUAD treatment.

Subject(s)

Adenocarcinoma of Lung , Cell Proliferation , ErbB Receptors , Lung Neoplasms , Neoplasm Invasiveness , Signal Transduction , WAP Four-Disulfide Core Domain Protein 2 , Humans , ErbB Receptors/metabolism , ErbB Receptors/genetics , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/metabolism , WAP Four-Disulfide Core Domain Protein 2/metabolism , Lung Neoplasms/pathology , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Cell Line, Tumor , Gene Knockdown Techniques , Animals , Mice , Gene Expression Regulation, Neoplastic , Cell Movement/genetics , Proteins/metabolism , Proteins/genetics

4.

SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues.

Zhang, Bin; Hou, Zilong; Yang, Yuning; Wong, Ka-Chun; Zhu, Haoran; Li, Xiangtao.

Commun Biol ; 7(1): 679, 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38830995

ABSTRACT

Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic-acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between the abundance of protein sequence information and the limited structural and functional data obtained so far renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic-acid-binding sites in proteins. Here, we implement SOFB, an ensemble deep learning-based method for identifying nucleic-acid-binding residues in proteins. SOFB characterizes protein sequences by learning the semantics of biological dynamics contexts, and then uses an ensemble deep learning-based sequence network to learn feature representations and perform classification by explicitly modeling dynamic semantic information. Within this framework, the language learning model, adapted from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble sequence network, consisting of different convolutional layers together with a Bi-LSTM, refines various features for optimal performance. Meanwhile, to address class imbalance, we adopt ensemble learning to train multiple models and then combine them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic-acid-binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic-acid-binding residues.
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452 .
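The abstract says only that SOFB trains multiple models to cope with class imbalance and then combines them. One common way to do this, assumed here and not confirmed as SOFB's scheme, is to give each ensemble member a class-balanced subset drawn by random undersampling and to average the members' per-residue probabilities. The helper names `balanced_subsets` and `ensemble_probability` are hypothetical.

```python
import random
from typing import List, Sequence


def balanced_subsets(pos_idx: Sequence[int], neg_idx: Sequence[int],
                     n_models: int, seed: int = 0) -> List[List[int]]:
    """Draw one class-balanced training subset per ensemble member (random undersampling)."""
    rng = random.Random(seed)
    k = min(len(pos_idx), len(neg_idx))
    subsets = []
    for _ in range(n_models):
        # Sample k residues from each class so every member sees a 1:1 ratio.
        subsets.append(rng.sample(list(pos_idx), k) + rng.sample(list(neg_idx), k))
    return subsets


def ensemble_probability(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Combine members by averaging their per-residue binding probabilities."""
    n = len(member_probs)
    return [sum(col) / n for col in zip(*member_probs)]


if __name__ == "__main__":
    positives = [3, 17]                                   # toy binding residues
    negatives = [i for i in range(20) if i not in positives]
    for subset in balanced_subsets(positives, negatives, n_models=3):
        print(sorted(subset))
    print(ensemble_probability([[0.2, 0.9], [0.4, 0.7]]))
```

Because every member sees a different sample of the majority (non-binding) class, the averaged ensemble uses most of the negative data while no single member is dominated by it.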

Subject(s)

Deep Learning , Binding Sites , Nucleic Acids/metabolism , Nucleic Acids/chemistry , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Protein Binding , Computational Biology/methods

5.

High-throughput tagging of endogenous loci for rapid characterization of protein function.

Kim, Joonwon; Kratz, Alexander F; Chen, Shiye; Sheng, Jenny; Kim, Hark Kyun; Zhang, Liudeng; Singh, Brijesh Kumar; Chavez, Alejandro.

Sci Adv ; 10(18): eadg8771, 2024 May 03.

Article in English | MEDLINE | ID: mdl-38691600

ABSTRACT

To facilitate the interrogation of protein function at scale, we have developed high-throughput insertion of tags across the genome (HITAG). HITAG enables users to rapidly produce libraries of cells, each with a different protein of interest C-terminally tagged. HITAG is based on a modified strategy for performing Cas9-based targeted insertions, coupled with an improved approach for selecting properly tagged lines. Analysis of the resulting clones generated by HITAG reveals high tagging specificity, with most successful tagging events being indel free. Using HITAG, we fuse mCherry to a set of 167 stress granule-associated proteins and elucidate the features that drive a subset of proteins to strongly accumulate within these transient RNA-protein granules.

Subject(s)

Genetic Loci , Humans , CRISPR-Cas Systems , Proteins/genetics , Proteins/metabolism , High-Throughput Screening Assays/methods , Cytoplasmic Granules/metabolism , Cytoplasmic Granules/genetics

6.

Polyphosphate attachment to lysine repeats is a non-covalent protein modification.

Neville, Nolan; Lehotsky, Kirsten; Klupt, Kody A; Downey, Michael; Jia, Zongchao.

Mol Cell ; 84(9): 1802-1810.e4, 2024 May 02.

Article in English | MEDLINE | ID: mdl-38701741

ABSTRACT

Polyphosphate (polyP) is a chain of inorganic phosphate that is present in all domains of life and affects diverse cellular phenomena, ranging from blood clotting to cancer. A study by Azevedo et al. described a protein modification whereby polyP is attached to lysine residues within polyacidic serine and lysine (PASK) motifs via what the authors claimed to be covalent phosphoramidate bonding. This was based largely on the remarkable ability of the modification to survive extreme denaturing conditions. Our study demonstrates that lysine polyphosphorylation is non-covalent, based on its sensitivity to ionic strength and lysine protonation and on the absence of phosphoramidate bond formation, as analyzed via ³¹P NMR. Ionic interaction with lysine residues alone is sufficient for polyP modification, and we present a new list of non-PASK lysine repeat proteins that undergo polyP modification. This work clarifies the biochemistry of polyP-lysine modification, with important implications for both studying and modulating this phenomenon. This Matters Arising paper is in response to Azevedo et al. (2015), published in Molecular Cell. See also the Matters Arising Response by Azevedo et al. (2024), published in this issue.

Subject(s)

Amides , Lysine , Phosphoric Acids , Polyphosphates , Lysine/metabolism , Lysine/chemistry , Polyphosphates/chemistry , Polyphosphates/metabolism , Phosphorylation , Humans , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Proteins/genetics

7.

Protein function prediction through multi-view multi-label latent tensor reconstruction.

Armah-Sekum, Robert Ebo; Szedmak, Sandor; Rousu, Juho.

BMC Bioinformatics ; 25(1): 174, 2024 May 02.

Article in English | MEDLINE | ID: mdl-38698340

ABSTRACT

BACKGROUND: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of the vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiple features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, and rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction .
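To make the phrase "tensor approximation of model weights ... learns polynomial combinations between different protein features" concrete, the sketch below scores GO terms with a generic two-view weight tensor factorised in rank-R CP form. Each rank-1 term multiplies a projection of view 1 by a projection of view 2, so the logits are products of features across views. The number of views, the rank, the sigmoid output and the names `cp_multiview_probs`, `U`, `V`, `W` are assumptions for illustration; the abstract does not give GO-LTR's exact architecture.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cp_multiview_probs(x1: Sequence[float], x2: Sequence[float],
                       U: Sequence[Sequence[float]], V: Sequence[Sequence[float]],
                       W: Sequence[Sequence[float]]) -> List[float]:
    """Two-view, rank-R CP-factorised weight tensor.

    U: R x d1 (view 1 factors), V: R x d2 (view 2 factors), W: R x L (label factors).
    Each logit is a sum of (u_r . x1) * (v_r . x2) * W[r][label], i.e. bilinear in the two views.
    """
    z = [dot(u, x1) * dot(v, x2) for u, v in zip(U, V)]  # one cross-view interaction per rank
    n_labels = len(W[0])
    logits = [sum(z[r] * W[r][l] for r in range(len(z))) for l in range(n_labels)]
    return [1.0 / (1.0 + math.exp(-s)) for s in logits]  # independent sigmoid per GO term


if __name__ == "__main__":
    U, V, W = [[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 2.0]]
    print(cp_multiview_probs([1.0, 0.0], [0.0, 1.0], U, V, W))
```

The factorisation keeps the parameter count at R·(d1 + d2 + L) instead of d1·d2·L for the full tensor, which is the usual motivation for low-rank tensor models in high-dimensional multi-label settings.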

Subject(s)

Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Databases, Protein , Algorithms

8.

DeepSS2GO: protein function prediction from secondary structure.

Song, Fu V; Su, Jiaqi; Huang, Sixing; Zhang, Neng; Li, Kaiyue; Ni, Ming; Liao, Maofu.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38701416

ABSTRACT

Predicting protein function is crucial for understanding biological life processes, preventing diseases and developing new drug targets. In recent years, methods based on sequence, structure and biological networks for protein function annotation have been extensively researched. Although obtaining a protein's three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce a deep neural network model, DeepSS2GO (Secondary Structure to Gene Ontology). It is a predictor incorporating secondary structure features along with primary sequence and homology information. The algorithm combines the speed of sequence-based information with the accuracy of structure-based features while streamlining the redundant data in primary sequences and bypassing time-consuming tertiary structure analysis. The results show that its prediction performance surpasses that of state-of-the-art algorithms. It can predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.

Subject(s)

Algorithms , Computational Biology , Neural Networks, Computer , Protein Structure, Secondary , Proteins , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Computational Biology/methods , Databases, Protein , Gene Ontology , Sequence Analysis, Protein/methods , Software

9.

Treatment of flexibility of protein backbone in simulations of protein-ligand interactions using steered molecular dynamics.

Truong, Duc Toan; Ho, Kiet; Pham, Dinh Quoc Huy; Chwastyk, Mateusz; Nguyen-Minh, Thai; Nguyen, Minh Tho.

Sci Rep ; 14(1): 10475, 2024 May 07.

Article in English | MEDLINE | ID: mdl-38714683

ABSTRACT

To ensure that an external force can break the interaction between a protein and a ligand, steered molecular dynamics simulation requires a harmonic restraining potential applied to the protein backbone. A common practice is to fix all, or a certain number, of the protein's heavy atoms or Cα atoms by restraining them with a small force. The present study reveals that fixing either all heavy atoms or all Cα atoms is not a good approach, while fixing too few atoms sometimes cannot prevent the protein from rotating under the influence of the bulk water layer, so that the pulled molecule may smack into the wall of the active site. We found that restraining the Cα atoms under certain conditions is more appropriate. We therefore propose an alternative solution in which only the Cα atoms of the protein located more than 1.2 nm from the ligand are restrained. A more flexible, but not overly flexible, protein is expected to allow a more natural release of the ligand.
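The restraint rule proposed here amounts to an atom selection. The sketch below assumes that "distance from the ligand" means the minimum distance to any ligand atom and that coordinates are in nm; the function name `calpha_to_restrain` is hypothetical, and passing the resulting indices to an MD engine's position-restraint mechanism is not shown.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def calpha_to_restrain(ca_coords: Sequence[Coord], ligand_coords: Sequence[Coord],
                       cutoff_nm: float = 1.2) -> List[int]:
    """Indices of C-alpha atoms farther than `cutoff_nm` from every ligand atom.

    Distance is taken to the nearest ligand atom (an assumption); coordinates are in nm.
    """
    selected = []
    for idx, ca in enumerate(ca_coords):
        nearest = min(math.dist(ca, atom) for atom in ligand_coords)
        if nearest > cutoff_nm:
            selected.append(idx)  # far from the ligand: restrain this atom
    return selected


if __name__ == "__main__":
    ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
    ligand = [(0.5, 0.0, 0.0)]
    print(calpha_to_restrain(ca, ligand))  # [2]
```

Cα atoms near the binding site stay unrestrained, so the pocket can relax as the ligand is pulled, while the distant restrained Cα atoms keep the protein from rotating.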

Subject(s)

Molecular Dynamics Simulation , Protein Binding , Proteins , Ligands , Proteins/chemistry , Proteins/metabolism , Protein Conformation

10.

GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking.

Guo, Linyuan; Wang, Jianxin.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38706316

ABSTRACT

Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. However, owing to the complexity and high cost of experimental methods, there is great demand for computational approaches, such as protein-ligand docking, that recognize PLI patterns. In recent years, a growing number of machine learning models have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are urgently needed for more accurate RMSD prediction. We present GSScore, a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses based on a Graphormer method and a shell-like graph architecture. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and represents the protein-ligand docking interface as multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets, including a subset of PDBBind version 2019, CASF2016 and DUD-E, and obtained significant improvements over existing methods in terms of RMSE, R (Pearson correlation coefficient), Spearman correlation coefficient and docking power.
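The "multiple bipartite graphs within different shell ranges" idea can be pictured as binning ligand-protein atom pairs by distance, one edge list per shell. The sketch below is a minimal illustration only: the shell boundaries (0-4, 4-6, 6-8 Å) and the function name `shell_bipartite_edges` are hypothetical, since the abstract does not give GSScore's shell ranges, and node/edge features are omitted.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def shell_bipartite_edges(ligand: Sequence[Coord], protein: Sequence[Coord],
                          shells: Sequence[Tuple[float, float]]) -> List[List[Tuple[int, int]]]:
    """One bipartite edge list per shell: (ligand atom i, protein atom j) with lo <= d < hi."""
    graphs: List[List[Tuple[int, int]]] = []
    for lo, hi in shells:
        edges = [(i, j)
                 for i, a in enumerate(ligand)
                 for j, b in enumerate(protein)
                 if lo <= math.dist(a, b) < hi]
        graphs.append(edges)
    return graphs


if __name__ == "__main__":
    HYPOTHETICAL_SHELLS = [(0.0, 4.0), (4.0, 6.0), (6.0, 8.0)]  # Å, illustrative only
    lig = [(0.0, 0.0, 0.0)]
    prot = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0), (7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(shell_bipartite_edges(lig, prot, HYPOTHETICAL_SHELLS))
    # [[(0, 0)], [(0, 1)], [(0, 2)]]
```

Keeping each shell as a separate graph lets a model weight close contacts and longer-range interface atoms differently, rather than mixing them in a single neighbourhood.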

Subject(s)

Molecular Docking Simulation , Proteins , Ligands , Proteins/chemistry , Proteins/metabolism , Protein Binding , Software , Algorithms , Computational Biology/methods , Protein Conformation , Databases, Protein , Deep Learning

11.

Integrating Explicit and Implicit Fullerene Models into UNRES Force Field for Protein Interaction Studies.

Rogoza, Natalia H; Krupa, Magdalena A; Krupa, Pawel; Sieradzan, Adam K.

Molecules ; 29(9)2024 Apr 23.

Article in English | MEDLINE | ID: mdl-38731411

ABSTRACT

Fullerenes, particularly C60, exhibit unique properties that make them promising candidates for various applications, including drug delivery and nanomedicine. However, their interactions with biomolecules, especially proteins, are not fully understood. This study implements both explicit and implicit C60 models into the UNRES coarse-grained force field, enabling the investigation of fullerene-protein interactions without the need for restraints to stabilize protein structures. The UNRES force field offers computational efficiency, allowing for longer timescale simulations while maintaining accuracy. Five model proteins were studied: FK506 binding protein, HIV-1 protease, intestinal fatty acid binding protein, PCB-binding protein (uteroglobin), and hen egg-white lysozyme. Molecular dynamics simulations were performed with and without C60 to assess protein stability and investigate the impact of fullerene interactions. Analysis of contact probabilities reveals distinct interaction patterns for each protein. FK506 binding protein (1FKF) shows specific binding sites, while intestinal fatty acid binding protein (1ICN) and uteroglobin (1UTR) exhibit more generalized interactions. The explicit C60 model shows good agreement with all-atom simulations in predicting protein flexibility, the position of C60 in the binding pocket, and the estimation of effective binding energies. The integration of explicit and implicit C60 models into the UNRES force field, coupled with recent advances in coarse-grained modeling and multiscale approaches, provides a powerful framework for investigating protein-nanoparticle interactions at biologically relevant scales without the need to use restraints stabilizing the protein, thus allowing large conformational changes to occur.
These computational tools, in synergy with experimental techniques, can aid in understanding the mechanisms and consequences of nanoparticle-biomolecule interactions, guiding the design of nanomaterials for biomedical applications.
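Per-residue contact probabilities of the kind analysed here are typically the fraction of trajectory frames in which a residue is within some cutoff of the nanoparticle. The sketch below assumes one interaction site per residue, a single C60 centre per frame, and a user-chosen cutoff; the abstract does not give the contact definition used with UNRES, and `contact_probability` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Tuple[Sequence[Coord], Coord]  # (residue sites, C60 centre)


def contact_probability(frames: Sequence[Frame], cutoff: float) -> List[float]:
    """Fraction of frames in which each residue site lies within `cutoff` of the C60 centre."""
    n_res = len(frames[0][0])
    counts = [0] * n_res
    for residues, c60 in frames:
        for i, site in enumerate(residues):
            if math.dist(site, c60) < cutoff:
                counts[i] += 1
    return [c / len(frames) for c in counts]


if __name__ == "__main__":
    sites = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    traj = [(sites, (1.0, 0.0, 0.0)), (sites, (9.0, 0.0, 0.0)), (sites, (1.5, 0.0, 0.0))]
    print(contact_probability(traj, cutoff=2.0))
```

A few residues with high probabilities suggest a specific binding site, whereas many residues with moderate, similar probabilities correspond to the more generalized interactions described for 1ICN and 1UTR.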

Subject(s)

Fullerenes , Molecular Dynamics Simulation , Muramidase , Protein Binding , Fullerenes/chemistry , Muramidase/chemistry , Muramidase/metabolism , Binding Sites , Tacrolimus Binding Proteins/chemistry , Tacrolimus Binding Proteins/metabolism , Fatty Acid-Binding Proteins/chemistry , Fatty Acid-Binding Proteins/metabolism , Proteins/chemistry , Proteins/metabolism , HIV Protease

12.

A Guide to the Quantitation of Protein Secretion Dynamics at the Single-Cell Level.

Aymerich, Nathan; Bucheli, Olivia T M; Portmann, Kevin; Eyer, Klaus; Baudry, Jean.

Methods Mol Biol ; 2804: 141-162, 2024.

Article in English | MEDLINE | ID: mdl-38753146

ABSTRACT

Protein secretion is a key cellular functionality, particularly in immunology, where cells can display large heterogeneity in this crucial activity in addition to binary secretion behavior. However, few methods enable quantitative secretion rate measurements at the single-cell level, and these methods are mostly based on microfluidic systems. Here, we describe such a microfluidic single-cell method for precisely measuring protein secretion rates, building on the published droplet-based microfluidic platform DropMap. We give an updated, detailed guide to quantifying protein secretion rates, discussing its setup and limitations. We illustrate the protocol with two key immunological analytes, immunoglobulin G and interferon-γ.

Subject(s)

Interferon-gamma , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Interferon-gamma/metabolism , Immunoglobulin G/metabolism , Proteins/metabolism , Microfluidic Analytical Techniques/methods , Microfluidic Analytical Techniques/instrumentation , Microfluidics/methods , Microfluidics/instrumentation

13.

A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond.

Jia, Pengzhen; Zhang, Fuhao; Wu, Chaojin; Li, Min.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38739759

ABSTRACT

Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.

Subject(s)

Computational Biology , Nucleic Acids , Proteins , Nucleic Acids/metabolism , Nucleic Acids/chemistry , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Ligands , Protein Binding , Humans

14.

GNNGL-PPI: multi-category prediction of protein-protein interactions using graph neural networks based on global graphs and local subgraphs.

Zeng, Xin; Meng, Fan-Fang; Wen, Meng-Liang; Li, Shu-Juan; Li, Yi.

BMC Genomics ; 25(1): 406, 2024 May 09.

Article in English | MEDLINE | ID: mdl-38724906

ABSTRACT

Most proteins exert their functions by interacting with other proteins, making the identification of protein-protein interactions (PPIs) crucial for understanding biological activities, pathological mechanisms, and clinical therapies. Developing effective and reliable computational methods for predicting PPIs can significantly reduce the time and labor associated with traditional biological experiments. However, accurately identifying the specific categories of protein-protein interactions and improving the prediction accuracy of computational methods remain dual challenges. To tackle these challenges, we proposed a novel graph neural network method called GNNGL-PPI for multi-category prediction of PPIs based on global graphs and local subgraphs. GNNGL-PPI consisted of two main components: a Graph Isomorphism Network (GIN) that extracts global graph features from the PPI network graph, and GIN As Kernel (GIN-AK) that extracts local subgraph features from the subgraphs of protein vertices. Additionally, considering the imbalanced distribution of samples across categories within the benchmark datasets, we introduced an Asymmetric Loss (ASL) function to further enhance the predictive performance of the method. In evaluations on six benchmark test sets formed by three different dataset partitioning algorithms (Random, BFS, DFS), GNNGL-PPI outperformed the state-of-the-art multi-category PPI prediction methods, as measured by the comprehensive performance evaluation metric F1-measure. Furthermore, interpretability analysis confirmed the effectiveness of GNNGL-PPI as a reliable multi-category prediction method for protein-protein interactions.
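The Asymmetric Loss mentioned here is a multi-label loss that treats positive and negative labels differently. The sketch below follows the standard ASL formulation (a focusing term (1 - p)^γ+ on positives, and a shifted probability p_m = max(p - m, 0) with focusing term p_m^γ- on negatives). It assumes GNNGL-PPI uses this standard form; the default values γ+ = 0, γ- = 4 and m = 0.05 are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def asymmetric_loss(probs: Sequence[float], targets: Sequence[int],
                    gamma_pos: float = 0.0, gamma_neg: float = 4.0,
                    margin: float = 0.05, eps: float = 1e-8) -> float:
    """Summed asymmetric loss over the interaction categories of one protein pair.

    probs: sigmoid outputs per category; targets: 1 if the pair has that category, else 0.
    Negatives use p_m = max(p - margin, 0), so very easy negatives (p <= margin) add nothing.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    print(asymmetric_loss([0.9, 0.2, 0.6], [1, 0, 0]))
```

Down-weighting easy negatives in this way keeps the many absent categories of each pair from dominating the gradient, which is the imbalance problem the abstract cites as motivation.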

Subject(s)

Algorithms , Computational Biology , Neural Networks, Computer , Protein Interaction Mapping , Protein Interaction Mapping/methods , Computational Biology/methods , Protein Interaction Maps , Humans , Proteins/metabolism

15.

TransPTM: a transformer-based model for non-histone acetylation site prediction.

Meng, Lingkuan; Chen, Xingjian; Cheng, Ke; Chen, Nanjun; Zheng, Zetian; Wang, Fuzhou; Sun, Hongyan; Wong, Ka-Chun.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38725156

ABSTRACT

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) due to its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we have contributed to both dataset creation and predictor benchmarking in this study. First, we construct a non-histone acetylation site benchmark dataset, namely NHAC, which includes 11 subsets according to sequence length, ranging from 11 to 61 amino acids. In total, there are 886 positive samples and 4707 negative samples for each sequence length. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM achieves competitive performance for non-histone acetylation site prediction compared with three state-of-the-art tools. It improves our comprehension of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM datasets fill the gap in non-histone acetylation site datasets and are beneficial to the related communities. The related source code and data utilized by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.
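Fixed-length subsets such as those in NHAC are usually built by cutting a window centred on each candidate lysine. The sketch below assumes odd window lengths (so the lysine sits exactly in the centre) and padding with "X" at the termini; both choices, and the helper name `lysine_windows`, are assumptions, since the abstract gives only the 11 to 61 residue range.

```python
from typing import List, Tuple


def lysine_windows(seq: str, length: int, pad: str = "X") -> List[Tuple[int, str]]:
    """(position, window) for every lysine, centred on the K and padded at the termini."""
    if length % 2 == 0:
        raise ValueError("window length must be odd so the lysine sits in the centre")
    half = length // 2
    padded = pad * half + seq + pad * half
    # Residue i of `seq` sits at index i + half of `padded`, so its window starts at index i.
    return [(i, padded[i:i + length]) for i, aa in enumerate(seq) if aa == "K"]


if __name__ == "__main__":
    print(lysine_windows("MKAGKL", 5))  # [(1, 'XMKAG'), (4, 'AGKLX')]
```

Each window would then be labelled positive or negative according to whether that lysine is an experimentally observed acetylation site, and embedded per residue (for example with ProtT5) before classification.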

Subject(s)

Neural Networks, Computer , Protein Processing, Post-Translational , Acetylation , Computational Biology/methods , Databases, Protein , Software , Algorithms , Humans , Proteins/chemistry , Proteins/metabolism

16.

GlycoID Proximity Labeling to Identify O-GlcNAcylated Protein Interactomes in Live Cells.

Nelson, Zachary M; Kadiri, Oseni; Fehl, Charlie.

Curr Protoc ; 4(5): e1052, 2024 May.

Article in English | MEDLINE | ID: mdl-38752278

ABSTRACT

Cells continuously remodel their intracellular proteins with the monosaccharide O-linked N-acetylglucosamine (O-GlcNAc) to regulate metabolism, signaling, and stress. This protocol describes the use of GlycoID tools to capture O-GlcNAc dynamics in live cells. GlycoID constructs contain an O-GlcNAc binding domain linked to a proximity labeling domain and a subcellular localization sequence. When expressed in mammalian cells, GlycoID tracks changes in O-GlcNAc-modified proteins and their interactomes in response to chemical induction with biotin over time. Pairing the subcellular localization of GlycoID with the chemical induction of activity enables spatiotemporal studies of O-GlcNAc biology during cellular events such as insulin signaling. However, optimizing intracellular labeling experiments requires attention to several variables. Here, we describe two protocols to adapt GlycoID methods to a cell line and biological process of interest. Next, we describe how to conduct a semiquantitative proteomic analysis of O-GlcNAcylated proteins and their interactomes using insulin versus glucagon signaling as a sample application. This article aims to establish baseline GlycoID protocols for new users and set the stage for widespread use over diverse cellular applications for the functional study of O-GlcNAc glycobiology. © 2024 Wiley Periodicals LLC. Basic Protocol 1: Expression of targeted GlycoID constructs to verify subcellular location and labeling activity in mammalian cells Basic Protocol 2: GlycoID labeling in live HeLa cells for O-GlcNAc proteomic comparisons.

Subject(s)

Acetylglucosamine , Humans , Acetylglucosamine/metabolism , Proteomics/methods , Insulin/metabolism , Animals , Staining and Labeling/methods , Signal Transduction , Proteins/metabolism , HeLa Cells

17.

Correlation of protein binding pocket properties with hits' chemistries used in generation of ultra-large virtual libraries.

Song, Robert X; Nicklaus, Marc C; Tarasova, Nadya I.

J Comput Aided Mol Des ; 38(1): 22, 2024 May 16.

Article in English | MEDLINE | ID: mdl-38753096

ABSTRACT

Although the size of virtual libraries of synthesizable compounds is growing rapidly, we are still enumerating only tiny fractions of the drug-like chemical universe. Our capability to mine these newly generated libraries also lags behind their growth. That is why fragment-based approaches that utilize on-demand virtual combinatorial libraries are gaining popularity in drug discovery. These à la carte libraries utilize synthetic blocks found to be effective binders in parts of target protein pockets and a variety of reliable chemistries to connect them. There are, however, no data on the potential impact of the chemistries used for making on-demand libraries on hit rates during virtual screening. There are also no rules to guide the selection of these synthetic methods for the production of custom libraries. We have used the SAVI (Synthetically Accessible Virtual Inventory) library, constructed using 53 reliable reaction types (transforms), to evaluate the impact of these chemistries on docking hit rates for 40 well-characterized protein pockets. The data show that virtual hit rates differ significantly between chemistries, with cross-coupling reactions such as Sonogashira, Suzuki-Miyaura, Hiyama and Liebeskind-Srogl coupling producing the highest hit rates. Virtual hit rates appear to depend not only on the properties of the formed chemical bond but also on the diversity of available building blocks and the scope of the reaction. The data identify reactions that deserve wider use through an increase in the number of corresponding building blocks and suggest the reactions that are more effective for pockets with certain physical and hydrogen bond-forming properties.

Subject(s)

Molecular Docking Simulation , Protein Binding , Proteins , Small Molecule Libraries , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Proteins/chemistry , Proteins/metabolism , Binding Sites , Drug Discovery/methods , Ligands , Drug Design , Humans

18.

Robust genetic codes enhance protein evolvability.

Rozhonová, Hana; Martí-Gómez, Carlos; McCandlish, David M; Payne, Joshua L.

PLoS Biol ; 22(5): e3002594, 2024 May.

Article in English | MEDLINE | ID: mdl-38754362

ABSTRACT

The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability (the ability of mutation to bring forth adaptive variation in protein function). One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.

Subject(s)

Evolution, Molecular , Genetic Code , Proteins , Proteins/genetics , Proteins/metabolism , Mutation/genetics , Codon/genetics , Models, Genetic , Synthetic Biology/methods , Protein Biosynthesis , Protein Engineering/methods

19.

Mutual annotation-based prediction of protein domain functions with Domain2GO.

Ulusoy, Erva; Dogan, Tunca.

Protein Sci ; 33(6): e4988, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38757367

ABSTRACT

Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease states. The domain composition of a protein can reveal critical information in this context, as domains are structural and functional units that dictate how the protein should act at the molecular level. The expensive and time-consuming nature of wet-lab experimental approaches prompted researchers to develop computational strategies for predicting the functions of proteins. In this study, we proposed a new method called Domain2GO that infers associations between protein domains and function-defining gene ontology (GO) terms, thus redefining the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings via literature review. Then, we applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we employed Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as producing interpretable results and having an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex and large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO, which simplifies the prediction of functions of previously unannotated proteins solely using amino acid sequences.

Subject(s)

Molecular Sequence Annotation , Protein Domains , Proteins , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Databases, Protein , Computational Biology/methods , Gene Ontology , Humans , Software

20.

Genetic code robustness and protein evolvability are correlated and protein-specific.

Metzger, Brian P H.

PLoS Biol ; 22(5): e3002627, 2024 May.

Article in English | MEDLINE | ID: mdl-38758732

ABSTRACT

The relationship between genetic code robustness and protein evolvability is unknown. A new study in PLOS Biology using in silico rewiring of genetic codes and functional protein data identified a positive correlation between code robustness and protein evolvability that is protein-specific.

Subject(s)

Evolution, Molecular , Genetic Code , Proteins , Proteins/genetics , Proteins/metabolism , Models, Genetic