Search | VHL Regional Portal

1.

A Multi-view Molecular Pre-training with Generative Contrastive Learning.

Liu, Yunwu; Zhang, Ruisheng; Yuan, Yongna; Ma, Jun; Li, Tongfeng; Yu, Zhixuan.

Interdiscip Sci ; 2024 May 06.

Article in English | MEDLINE | ID: mdl-38710957

ABSTRACT

Molecular representation learning can preserve meaningful molecular structures as embedding vectors, which is a necessary prerequisite for molecular property prediction. Yet, learning how to accurately represent molecules remains challenging. Previous approaches to learning molecular representations in an end-to-end manner potentially suffered information loss while neglecting the utilization of molecular generative representations. To obtain rich molecular feature information, the pre-training molecular representation model utilized different molecular representations to reduce information loss caused by a single molecular representation. Therefore, we provide the MVGC, a unique multi-view generative contrastive learning pre-training model. Our pre-training framework specifically acquires knowledge of three fundamental feature representations of molecules and effectively integrates them to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks demonstrate that our proposed MVGC model surpasses the majority of state-of-the-art approaches. Moreover, we explore the potential of the MVGC model to learn the representation of molecules with chemical significance.

2.

GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction.

Ma, Jun; Zhao, Zhili; Li, Tongfeng; Liu, Yunwu; Ma, Jun; Zhang, Ruisheng.

Interdiscip Sci ; 2024 Mar 08.

Article in English | MEDLINE | ID: mdl-38457109

ABSTRACT

Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through .cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including human, C. elegans, Davis and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models in classification datasets and achieves competitive performance in regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in Concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance in effectively predicting CPIs and binding affinities, identifying key atoms and residues, enhancing model interpretability.

3.

INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction.

Jiang, Jing; Li, Yachao; Zhang, Ruisheng; Liu, Yunwu.

J Mol Graph Model ; 128: 108703, 2024 05.

Article in English | MEDLINE | ID: mdl-38228013

ABSTRACT

Molecular property prediction plays an essential role in drug discovery for identifying the candidate molecules with target properties. Deep learning models usually require sufficient labeled data to train good prediction models. However, the size of labeled data is usually small for molecular property prediction, which brings great challenges to deep learning-based molecular property prediction methods. Furthermore, the global information of molecules is critical for predicting molecular properties. Therefore, we propose INTransformer for molecular property prediction, which is a data augmentation method via contrastive learning to alleviate the limitations of the labeled molecular data while enhancing the ability to capture global information. Specifically, INTransformer consists of two identical Transformer sub-encoders to extract the molecular representation from the original SMILES and noisy SMILES respectively, while achieving the goal of data augmentation. To reduce the influence of noise, we use contrastive learning to ensure the molecular encoding of noisy SMILES is consistent with that of the original input so that the molecular representation information can be better extracted by INTransformer. Experiments on various benchmark datasets show that INTransformer achieved competitive performance for molecular property prediction tasks compared with the baselines and state-of-the-art methods.

Subject(s)

Drug Discovery , Electric Power Supplies , Databases, Factual

4.

TranGRU: focusing on both the local and global information of molecules for molecular property prediction.

Jiang, Jing; Zhang, Ruisheng; Ma, Jun; Liu, Yunwu; Yang, Enjie; Du, Shikang; Zhao, Zhili; Yuan, Yongna.

Appl Intell (Dordr) ; 53(12): 15246-15260, 2023.

Article in English | MEDLINE | ID: mdl-36405344

ABSTRACT

Molecular property prediction is an essential but challenging task in drug discovery. The recurrent neural network (RNN) and Transformer are the mainstream methods for sequence modeling, and both have been successfully applied independently for molecular property prediction. As the local information and global information of molecules are very important for molecular properties, we aim to integrate the bi-directional gated recurrent unit (BiGRU) into the original Transformer encoder, together with self-attention to better capture local and global molecular information simultaneously. To this end, we propose the TranGRU approach, which encodes the local and global information of molecules by using the BiGRU and self-attention, respectively. Then, we use a gate mechanism to reasonably fuse the two molecular representations. In this way, we enhance the ability of the proposed model to encode both local and global molecular information. Compared to the baselines and state-of-the-art methods when treating each task as a single-task classification on Tox21, the proposed approach outperforms the baselines on 9 out of 12 tasks and state-of-the-art methods on 5 out of 12 tasks. TranGRU also obtains the best ROC-AUC scores on BBBP, FDA, LogP, and Tox21 (multitask classification) and has a comparable performance on ToxCast, BACE, and ecoli. On the whole, TranGRU achieves better performance for molecular property prediction. The source code is available in GitHub: https://github.com/Jiangjing0122/TranGRU.

5.

MolRoPE-BERT: An enhanced molecular representation with Rotary Position Embedding for molecular property prediction.

Liu, Yunwu; Zhang, Ruisheng; Li, Tongfeng; Jiang, Jing; Ma, Jun; Wang, Ping.

J Mol Graph Model ; 118: 108344, 2023 01.

Article in English | MEDLINE | ID: mdl-36242862

ABSTRACT

Molecular property prediction is a significant task in drug discovery. Most deep learning-based computational methods either develop unique chemical representation or combine complex model. However, researchers are less concerned with the possible advantages of enormous quantities of unlabeled molecular data. Since the obvious limited amount of labeled data available, this task becomes more difficult. In some senses, SMILES of the drug molecule may be regarded of as a language for chemistry, taking inspiration from natural language processing research and current advances in pretrained models. In this paper, we incorporated Rotary Position Embedding(RoPE) efficiently encode the position information of SMILES sequences, ultimately enhancing the capability of the BERT pretrained model to extract potential molecular substructure information for molecular property prediction. We proposed the MolRoPE-BERT framework, an new end-to-end deep learning framework that integrates an efficient position coding approach for capturing sequence position information with a pretrained BERT model for molecular property prediction. To generate useful molecular substructure embeddings, we first exclusively train the MolRoPE-BERT on four million unlabeled drug SMILES(i.e., ZINC 15 and ChEMBL 27). Then, we conduct a series of experiments to evaluate the performance of our proposed MolRoPE-BERT on four well-studied datasets. Compared with conventional and state-of-the-art baselines, our experiment demonstrated comparable or superior performance.

Subject(s)

Drug Discovery

6.

A deep learning method for predicting molecular properties and compound-protein interactions.

Ma, Jun; Zhang, Ruisheng; Li, Tongfeng; Jiang, Jing; Zhao, Zhili; Liu, Yunwu; Ma, Jun.

J Mol Graph Model ; 117: 108283, 2022 12.

Article in English | MEDLINE | ID: mdl-35994925

ABSTRACT

Predicting molecular properties and compound-protein interactions (CPIs) are two important areas of drug design and discovery. They are also an essential way to discover lead compounds in virtual screening. Recently, in silico methods based on deep learning have demonstrated excellent performance in various challenges. It is imperative to develop efficient computational methods to predict accurately both molecular properties and CPIs in drug research using deep learning techniques. In this paper, we propose a deep learning method applicable to both molecular property prediction and CPI prediction based on the idea that both are generally influenced by chemical structure and sequence information of compounds and proteins. Molecular properties are inferred by integrating the molecular structure and sequence information of compounds, and CPIs are predicted by integrating protein sequence and compound structure. The method combines topological structure and sequence fingerprint information of molecules, extracts adequately raw data features, and generates highly representative features for prediction. Molecular property prediction experiments were conducted on BACE, P53 and hERG datasets, and CPI prediction experiments were conducted on Human, C. elegans and KIBA datasets. MG-S achieves outperformance in molecular property prediction on P53, the differences in AUC, Precision and MCC are 0.030, 0.050 and 0.100, respectively, over the suboptimal baseline model, and provides consistently good results on BACE and hERG.The model also achieves impressive performance in CPI prediction, the differences in AUC, Precision and MCC on KIBA are 0.141, 0.138, 0.090 and 0.082, respectively, compared with the state-of-the-art models. The comprehensive results show that the MG-S model has higher performance, better classification ability, and faster convergence. MG-S will serve as a useful method to predict compound properties and CPIs in the early stages of drug design and discovery.Our code and datasets are available at: https://github.com/happay-ending/cpi_cpp.

Subject(s)

Deep Learning , Animals , Humans , Amino Acid Sequence , Caenorhabditis elegans , Tumor Suppressor Protein p53

7.

MultiGran-SMILES: multi-granularity SMILES learning for molecular property prediction.

Jiang, Jing; Zhang, Ruisheng; Zhao, Zhili; Ma, Jun; Liu, Yunwu; Yuan, Yongna; Niu, Bojuan.

Bioinformatics ; 38(19): 4573-4580, 2022 Sep 30.

Article in English | MEDLINE | ID: mdl-35961025

ABSTRACT

MOTIVATION: Extracting useful molecular features is essential for molecular property prediction. Atom-level representation is a common representation of molecules, ignoring the sub-structure or branch information of molecules to some extent; however, it is vice versa for the substring-level representation. Both atom-level and substring-level representations may lose the neighborhood or spatial information of molecules. While molecular graph representation aggregating the neighborhood information of a molecule has a weak ability in expressing the chiral molecules or symmetrical structure. In this article, we aim to make use of the advantages of representations in different granularities simultaneously for molecular property prediction. To this end, we propose a fusion model named MultiGran-SMILES, which integrates the molecular features of atoms, sub-structures and graphs from the input. Compared with the single granularity representation of molecules, our method leverages the advantages of various granularity representations simultaneously and adjusts the contribution of each type of representation adaptively for molecular property prediction. RESULTS: The experimental results show that our MultiGran-SMILES method achieves state-of-the-art performance on BBBP, LogP, HIV and ClinTox datasets. For the BACE, FDA and Tox21 datasets, the results are comparable with the state-of-the-art models. Moreover, the experimental results show that the gains of our proposed method are bigger for the molecules with obvious functional groups or branches. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this work are available on GitHub at https://github. com/Jiangjing0122/MultiGran. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL