Search | VHL Regional Portal

Assessing the performance of MM/PBSA and MM/GBSA methods. 10. Prediction reliability of binding affinities and binding poses for RNA-ligand complexes.

Jiang, Dejun; Du, Hongyan; Zhao, Huifeng; Deng, Yafeng; Wu, Zhenxing; Wang, Jike; Zeng, Yundian; Zhang, Haotian; Wang, Xiaorui; Wang, Ercheng; Hou, Tingjun; Hsieh, Chang-Yu.

Phys Chem Chem Phys ; 26(13): 10323-10335, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38501198

ABSTRACT

Ribonucleic acid (RNA)-ligand interactions play a pivotal role in a wide spectrum of biological processes, ranging from protein biosynthesis to cellular reproduction. This recognition has prompted the broader acceptance of RNA as a viable candidate for drug targets. Delving into the atomic-scale understanding of RNA-ligand interactions holds paramount importance in unraveling intricate molecular mechanisms and further contributing to RNA-based drug discovery. Computational approaches, particularly molecular docking, offer an efficient way of predicting the interactions between RNA and small molecules. However, the accuracy and reliability of these predictions heavily depend on the performance of scoring functions (SFs). In contrast to the majority of SFs used in RNA-ligand docking, the end-point binding free energy calculation methods, such as molecular mechanics/generalized Born surface area (MM/GBSA) and molecular mechanics/Poisson Boltzmann surface area (MM/PBSA), stand as theoretically more rigorous approaches. Yet, the evaluation of their effectiveness in predicting both binding affinities and binding poses within RNA-ligand systems remains unexplored. This study first reported the performance of MM/PBSA and MM/GBSA with diverse solvation models, interior dielectric constants (εin) and force fields in the context of binding affinity prediction for 29 RNA-ligand complexes. MM/GBSA based on short (5 ns) molecular dynamics (MD) simulations in an explicit solvent with the YIL force field and the GBGBn2 model with a higher interior dielectric constant (εin = 12, 16 or 20) yields the best correlation (Rp = -0.513), which outperforms the best correlation (Rp = -0.317, rDock) offered by various docking programs. Then, the efficacy of MM/GBSA in identifying the near-native binding poses from the decoys was assessed based on 56 RNA-ligand complexes.
However, it is evident that MM/GBSA has limitations in accurately predicting binding poses for RNA-ligand systems, particularly compared with notably proficient docking programs like rDock and PLANTS. The best top-1 success rate achieved by MM/GBSA rescoring is 39.3%, which falls below the best results given by docking programs (50%, PLANTS). This study represents the first evaluation of MM/PBSA and MM/GBSA for RNA-ligand systems and is expected to provide valuable insights into their successful application to RNA targets.

Subject(s)

Molecular Dynamics Simulation , RNA , Molecular Docking Simulation , Ligands , Reproducibility of Results , Protein Binding , Thermodynamics , Binding Sites

How Good Are Current Docking Programs at Nucleic Acid-Ligand Docking? A Comprehensive Evaluation.

Jiang, Dejun; Zhao, Huifeng; Du, Hongyan; Deng, Yafeng; Wu, Zhenxing; Wang, Jike; Zeng, Yundian; Zhang, Haotian; Wang, Xiaorui; Wu, Jian; Hsieh, Chang-Yu; Hou, Tingjun.

J Chem Theory Comput ; 19(16): 5633-5647, 2023 Aug 22.

Article in English | MEDLINE | ID: mdl-37480347

ABSTRACT

Nucleic acid (NA)-ligand interactions are of paramount importance in a variety of biological processes, including cellular reproduction and protein biosynthesis, and therefore, NAs have been broadly recognized as potential drug targets. Understanding NA-ligand interactions at the atomic scale is essential for investigating the molecular mechanism and further assisting in NA-targeted drug discovery. Molecular docking is one of the predominant computational approaches for predicting the interactions between NAs and small molecules. Despite the availability of versatile docking programs, their performance profiles for NA-ligand complexes have not been thoroughly characterized. In this study, we first compiled the largest structure-based NA-ligand binding data set to date, containing 800 noncovalent NA-ligand complexes with clearly identified ligands. Based on this extensive data set, eight frequently used docking programs, including six protein-ligand docking programs (LeDock, Surflex-Dock, UCSF Dock6, AutoDock, AutoDock Vina, and PLANTS) and two specific NA-ligand docking programs (rDock and RLDOCK), were systematically evaluated in terms of binding pose and binding affinity predictions. The results demonstrated that some protein-ligand docking programs, specifically PLANTS and LeDock, produced more promising or comparable results compared with the specialized NA-ligand docking programs. Among the programs evaluated, PLANTS, rDock, and LeDock showed the highest performance in binding pose prediction, and their top-1 and best root-mean-square deviation (rmsd) success rates were as follows: PLANTS (35.93 and 76.05%), rDock (27.25 and 72.16%), and LeDock (27.40 and 64.37%). Compared with the moderate level of binding pose prediction, few programs were successful in binding affinity prediction, and the best correlation (Rp = -0.461) was observed with PLANTS. 
Finally, further comparison with the latest NA-ligand docking program (NLDock) on four well-established data sets revealed that PLANTS and LeDock outperformed NLDock in terms of binding pose prediction on all data sets, demonstrating their significant potential for NA-ligand docking. To the best of our knowledge, this study is the most comprehensive evaluation of popular molecular docking programs for NA-ligand systems.

Subject(s)

Drug Discovery , Nucleic Acids , Ligands , Molecular Docking Simulation

Molecular Generation with Reduced Labeling through Constraint Architecture.

Wang, Jike; Zeng, Yundian; Sun, Huiyong; Wang, Junmei; Wang, Xiaorui; Jin, Ruofan; Wang, Mingyang; Zhang, Xujun; Cao, Dongsheng; Chen, Xi; Hsieh, Chang-Yu; Hou, Tingjun.

J Chem Inf Model ; 63(11): 3319-3327, 2023 Jun 12.

Article in English | MEDLINE | ID: mdl-37184885

ABSTRACT

In the past few years, a number of machine learning (ML)-based molecular generative models have been proposed for generating molecules with desirable properties, but they all require a large amount of labeled data on pharmacological and physicochemical properties. However, experimental determination of these labels, especially bioactivity labels, is very expensive. In this study, we analyze the dependence of various multi-property molecule generation models on biological activity label data and propose Frag-G/M, a fragment-based multi-constraint molecular generation framework based on conditional transformer, recurrent neural networks (RNNs), and reinforcement learning (RL). The experimental results illustrate that, using the same number of labels, Frag-G/M can generate several times more desired molecules than the baselines. Moreover, compared with the known active compounds, the molecules generated by Frag-G/M exhibit higher scaffold diversity than those generated by the baselines, thus making it more promising for use in real-world drug discovery scenarios.

Subject(s)

Drug Discovery , Neural Networks, Computer , Drug Discovery/methods , Machine Learning , Models, Molecular

MetalProGNet: a structure-based deep graph model for metalloprotein-ligand interaction predictions.

Jiang, Dejun; Ye, Zhaofeng; Hsieh, Chang-Yu; Yang, Ziyi; Zhang, Xujun; Kang, Yu; Du, Hongyan; Wu, Zhenxing; Wang, Jike; Zeng, Yundian; Zhang, Haotian; Wang, Xiaorui; Wang, Mingyang; Yao, Xiaojun; Zhang, Shengyu; Wu, Jian; Hou, Tingjun.

Chem Sci ; 14(8): 2054-2069, 2023 Feb 22.

Article in English | MEDLINE | ID: mdl-36845922

ABSTRACT

Metalloproteins play indispensable roles in various biological processes ranging from reaction catalysis to free radical scavenging, and they are also pertinent to numerous pathologies including cancer, HIV infection, neurodegeneration, and inflammation. Discovery of high-affinity ligands for metalloproteins powers the treatment of these pathologies. Extensive efforts have been made to develop in silico approaches, such as molecular docking and machine learning (ML)-based models, for fast identification of ligands binding to heterogeneous proteins, but few of them have exclusively concentrated on metalloproteins. In this study, we first compiled the largest metalloprotein-ligand complex dataset containing 3079 high-quality structures, and systematically evaluated the scoring and docking powers of three competitive docking tools (i.e., PLANTS, AutoDock Vina and Glide SP) for metalloproteins. Then, a structure-based deep graph model called MetalProGNet was developed to predict metalloprotein-ligand interactions. In the model, the coordination interactions between metal ions and protein atoms and the interactions between metal ions and ligand atoms were explicitly modelled through graph convolution. The binding features were then predicted by the informative molecular binding vector learned from a noncovalent atom-atom interaction network. The evaluation on the internal metalloprotein test set, the independent ChEMBL dataset towards 22 different metalloproteins and the virtual screening dataset indicated that MetalProGNet outperformed various baselines. Finally, a noncovalent atom-atom interaction masking technique was employed to interpret MetalProGNet, and the learned knowledge accords with our understanding of physics.

ChemistGA: A Chemical Synthesizable Accessible Molecular Generation Algorithm for Real-World Drug Discovery.

Wang, Jike; Wang, Xiaorui; Sun, Huiyong; Wang, Mingyang; Zeng, Yundian; Jiang, Dejun; Wu, Zhenxing; Liu, Zeyi; Liao, Ben; Yao, Xiaojun; Hsieh, Chang-Yu; Cao, Dongsheng; Chen, Xi; Hou, Tingjun.

J Med Chem ; 65(18): 12482-12496, 2022 Sep 22.

Article in English | MEDLINE | ID: mdl-36065998

ABSTRACT

Many deep learning (DL)-based molecular generative models have been proposed to design novel molecules. These models may perform well on benchmarks, but they usually do not take real-world constraints into account, such as available training data set, synthetic accessibility, and scaffold diversity in drug discovery. In this study, a new algorithm, ChemistGA, was proposed by combining the traditional heuristic algorithm with DL, in which the crossover of the traditional genetic algorithm (GA) was redefined by DL in conjunction with GA, and an innovative backcrossing operation was implemented to generate desired molecules. Our results clearly show that ChemistGA not only retains the strength of the traditional GA but also greatly enhances the synthetic accessibility and success rate of the generated molecules with desired properties. Calculations on the two benchmarks illustrate that ChemistGA achieves impressive performance among the state-of-the-art baselines, and it opens a new avenue for the application of generative models to real-world drug discovery scenarios.

Subject(s)

Algorithms , Drug Discovery , Drug Design , Models, Molecular

Proteome-Wide Profiling of the Covalent-Druggable Cysteines with a Structure-Based Deep Graph Learning Network.

Du, Hongyan; Jiang, Dejun; Gao, Junbo; Zhang, Xujun; Jiang, Lingxiao; Zeng, Yundian; Wu, Zhenxing; Shen, Chao; Xu, Lei; Cao, Dongsheng; Hou, Tingjun; Pan, Peichen.

Research (Wash D C) ; 2022: 9873564, 2022.

Article in English | MEDLINE | ID: mdl-35958111

ABSTRACT

Covalent ligands have attracted increasing attention due to their unique advantages, such as long residence time, high selectivity, and strong binding affinity. They also show promise for targets where previous efforts to identify noncovalent small molecule inhibitors have failed. However, our limited knowledge of covalent binding sites has hindered the discovery of novel ligands. Therefore, developing in silico methods to identify covalent binding sites is highly desirable. Here, we propose DeepCoSI, the first structure-based deep graph learning model to identify ligandable covalent sites in the protein. By integrating the characterization of the binding pocket and the interactions between each cysteine and the surrounding environment, DeepCoSI achieves state-of-the-art predictive performance. The validation on two external test sets that mimic real application scenarios shows that DeepCoSI has a strong ability to distinguish ligandable sites from the others. Finally, we profiled the entire set of protein structures in the RCSB Protein Data Bank (PDB) with DeepCoSI to evaluate the ligandability of each cysteine for covalent ligand design, and made the predicted data publicly available on a website.

Advantages of a 21-loci short tandem repeat method for detection of cross-contamination in human cell lines.

Gu, Meijia; Liu, Jingxuan; Yang, Meimei; Zhang, Mingmin; Yang, Jian; Duan, Suling; Ding, Xuxu; Liu, Jie; Chen, Chuguang; Zeng, Yundian; Shen, Chao.

Gene ; 763: 145048, 2020 Dec 30.

Article in English | MEDLINE | ID: mdl-32805312

ABSTRACT

Cross-contamination of cell lines is a highly relevant and pervasive problem. The analysis of short tandem repeats (STR) has been a simple and commercially available technique for authenticating cell lines for more than two decades. At present, STR multiplex amplification kits have been developed for up to 21 loci, while the current STR databases only provide 9-loci STR profiles. Here, we assessed the advantages of the 21-loci STR methodology using the same algorithm as the 9-loci method. The 21-loci method reduced the uncertainty ratio for authentications by 97.5% relative to the 9-loci method and effectively excluded false positives. We show that the additional 12 loci helped to greatly reduce sample-site marker specificity arising from genetic isolation and the occurrence of null alleles, suggesting that inclusion of additional loci in these databases will ultimately improve the efficiency and accuracy of authentication of cell lines. Taken together, we demonstrate the utility of a 21-loci method in human cells, providing a novel marker panel for use as a valuable alternative to 9-loci analyses to minimize cell line authentication errors and reduce costs due to erroneous experiments.

Subject(s)

Cell Line Authentication/methods , Microsatellite Repeats , Cell Line , Cell Line Authentication/standards , Cell Line, Tumor , Genetic Loci , Genetic Markers , Humans , Molecular Typing/methods , Molecular Typing/standards
