Search | VHL Regional Portal

Predicting Cardiotoxicity of Molecules Using Attention-Based Graph Neural Networks.

Vinh, Tuan; Nguyen, Loc; Trinh, Quang H; Nguyen-Vo, Thanh-Hoang; Nguyen, Binh P.

J Chem Inf Model ; 64(6): 1816-1827, 2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38438914

ABSTRACT

In drug discovery, the search for new and effective medications is often hindered by concerns about toxicity. Numerous promising molecules fail to pass the later phases of drug development due to strict toxicity assessments. This challenge significantly increases the cost, time, and human effort needed to discover new therapeutic molecules. Additionally, a considerable number of drugs already on the market have been withdrawn or re-evaluated because of their unwanted side effects. Among the various types of toxicity, drug-induced heart damage is a severe adverse effect commonly associated with several medications, especially those used in cancer treatments. Although a number of computational approaches have been proposed to identify the cardiotoxicity of molecules, the performance and interpretability of the existing approaches are limited. In our study, we proposed a more effective computational framework to predict the cardiotoxicity of molecules using an attention-based graph neural network. Experimental results indicated that the proposed framework outperformed the other methods. The stability of the model was also confirmed by our experiments. To assist researchers in evaluating the cardiotoxicity of molecules, we have developed an easy-to-use online web server that incorporates our model.

Subject(s)

Cardiotoxicity, Drug Development, Humans, Drug Discovery, Heart, Neural Networks, Computer

i4mC-GRU: Identifying DNA N⁴-Methylcytosine sites in mouse genomes using bidirectional gated recurrent unit and sequence-embedded features.

Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Loc; Nguyen-Hoang, Phuong-Uyen; Rahardja, Susanto; Nguyen, Binh P.

Comput Struct Biotechnol J ; 21: 3045-3053, 2023.

Article in English | MEDLINE | ID: mdl-37273848

ABSTRACT

N⁴-methylcytosine (4mC) is one of the most common DNA methylation modifications found in both prokaryotic and eukaryotic genomes. Since 4mC has various essential biological roles, determining its location helps reveal unexplored physiological and pathological pathways. In this study, we propose an effective computational method called i4mC-GRU using a gated recurrent unit and duplet sequence-embedded features to predict potential 4mC sites in mouse (Mus musculus) genomes. To fairly assess the performance of the model, we compared our method with several state-of-the-art methods using two different benchmark datasets. Our results showed that i4mC-GRU achieved area under the receiver operating characteristic curve values of 0.97 and 0.89 and area under the precision-recall curve values of 0.98 and 0.90 on the first and second benchmark datasets, respectively. In brief, our method outperformed existing methods in predicting 4mC sites in mouse genomes. We also deployed i4mC-GRU as an online web server to support users in genomics studies.
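The two metrics reported in this abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' evaluation code: the function names and the toy labels and scores are assumptions made for illustration. The precision-recall area is estimated here as average precision, one common step-wise estimator, and tied scores are not treated specially in that function.

```python
def auc_roc(labels, scores):
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy predictions (1 = 4mC site, 0 = non-site).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(auc_roc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

Both functions expect at least one positive and one negative sample; a production evaluation would more likely rely on an established library implementation.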

iNSP-GCAAP: Identifying nonclassical secreted proteins using global composition of amino acid properties.

Do, Trang T T; Nguyen-Vo, Thanh-Hoang; Pham, Hung T; Trinh, Quang H; Nguyen, Binh P.

Proteomics ; 23(1): e2100134, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36401584

ABSTRACT

Nonclassical secreted proteins (NSPs) are proteins released into the extracellular environment through biological transport pathways other than the Sec/Tat system. As experimental determination of NSPs is often costly and requires skilled handling techniques, computational approaches are necessary. In this study, we introduce iNSP-GCAAP, a computational prediction framework, to identify NSPs. We propose using the global composition of a customized set of amino acid properties to encode sequence data and the random forest (RF) algorithm for classification. We used the training dataset introduced by Zhang et al. (Bioinformatics, 36(3), 704-712, 2020) to develop our model and tested it on the independent test set from the same study. The area under the receiver operating characteristic curve on that test set was 0.9256, which outperformed other state-of-the-art methods using the same datasets. Our framework is also deployed as a user-friendly web-based application to support the research community in predicting NSPs.

Subject(s)

Amino Acids, Proteins, Amino Acids/metabolism, Proteins/chemistry, Software, Computational Biology/methods, Algorithms
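The encoding idea in the iNSP-GCAAP abstract, describing a sequence by the overall proportion of residues that share a property, can be sketched as follows. The abstract does not list the customized property set, so the four groups below are a hypothetical stand-in, and `composition_features` is an illustrative name rather than the authors' code.

```python
# Hypothetical property groups; the paper's customized set is not given in the abstract.
PROPERTY_GROUPS = {
    "hydrophobic": set("AVLIMFW"),
    "polar": set("STNQYCGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def composition_features(sequence):
    """Return the fraction of residues falling in each property group, in dict order."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [
        sum(1 for residue in seq if residue in members) / len(seq)
        for members in PROPERTY_GROUPS.values()
    ]


print(composition_features("AKDE"))
```

A fixed-length vector of this kind is what a random forest classifier would take as input, whatever the sequence length.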

Predicting Antimalarial Activity in Natural Products Using Pretrained Bidirectional Encoder Representations from Transformers.

Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Loc; Do, Trang T T; Chua, Matthew Chin Heng; Nguyen, Binh P.

J Chem Inf Model ; 62(21): 5050-5058, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-36373285

ABSTRACT

Malaria is a threatening disease that has claimed many lives and has a high prevalence rate annually. Over the past decade, many studies have sought effective antimalarial compounds to combat this disease. Alongside chemically synthesized compounds, a number of natural compounds have also proven to be equally effective as antimalarials. Besides experimental approaches to investigate antimalarial activities in natural products, computational methods have been developed with satisfactory outcomes. In this study, we propose a novel molecular encoding scheme based on Bidirectional Encoder Representations from Transformers and use our pretrained encoding model, called NPBERT, with four machine learning algorithms, namely k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGB), and Random Forest (RF), to develop various prediction models to identify antimalarial natural products. The results show that SVM models are the best-performing classifiers, followed by the XGB, k-NN, and RF models. Additionally, comparative analysis between our proposed molecular encoding scheme and existing state-of-the-art methods indicates that NPBERT is more effective than the others. Moreover, the use of transformers in constructing molecular encoders is not limited to this study but can be extended to other biomedical applications.

Subject(s)

Antimalarials, Biological Products, Antimalarials/pharmacology, Antimalarials/chemistry, Biological Products/pharmacology, Support Vector Machine, Machine Learning, Algorithms

iPromoter-Seqvec: identifying promoters using bidirectional long short-term memory and sequence-embedded features.

Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Loc; Nguyen-Hoang, Phuong-Uyen; Rahardja, Susanto; Nguyen, Binh P.

BMC Genomics ; 23(Suppl 5): 681, 2022 Oct 03.

Article in English | MEDLINE | ID: mdl-36192696

ABSTRACT

BACKGROUND: Promoters, non-coding DNA sequences located at upstream regions of the transcription start site of genes/gene clusters, are essential regulatory elements for the initiation and regulation of transcriptional processes. Furthermore, identifying promoters in DNA sequences and genomes significantly contributes to discovering entire structures of genes of interest. Therefore, exploration of promoter regions is one of the most important topics in molecular genetics and biology. Besides experimental techniques, computational methods have been developed to predict promoters. In this study, we propose iPromoter-Seqvec - an efficient computational model to predict TATA and non-TATA promoters in human and mouse genomes using bidirectional long short-term memory neural networks in combination with sequence-embedded features extracted from input sequences. The promoter and non-promoter sequences were retrieved from the Eukaryotic Promoter database and then were refined to create four benchmark datasets. RESULTS: The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) were used as two key metrics to evaluate model performance. Results on independent test sets showed that iPromoter-Seqvec outperformed other state-of-the-art methods with AUCROC values ranging from 0.85 to 0.99 and AUCPR values ranging from 0.86 to 0.99. Models predicting TATA promoters in both species had slightly higher predictive power compared to those predicting non-TATA promoters. With a novel idea of constructing artificial non-promoter sequences based on promoter sequences, our models were able to learn highly specific characteristics discriminating promoters from non-promoters to improve predictive efficiency. CONCLUSIONS: iPromoter-Seqvec is a stable and robust model for predicting both TATA and non-TATA promoters in human and mouse genomes. Our proposed method was also deployed as an online web server with a user-friendly interface to support research communities. Links to our source code and web server are available at https://github.com/mldlproject/2022-iPromoter-Seqvec.

Subject(s)

Memory, Short-Term, Software, Animals, Humans, Mice, Promoter Regions, Genetic, Regulatory Sequences, Nucleic Acid, TATA Box/genetics, Transcription Initiation Site, Transcription, Genetic

iANP-EC: Identifying Anticancer Natural Products Using Ensemble Learning Incorporated with Evolutionary Computation.

Nguyen, Loc; Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Bach Hoai; Nguyen-Hoang, Phuong-Uyen; Le, Ly; Nguyen, Binh P.

J Chem Inf Model ; 62(21): 5080-5089, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-35157472

ABSTRACT

Cancer is one of the deadliest diseases, killing millions of people worldwide each year. The search for anticancer medicines continues to pursue better and more adaptive agents with fewer side effects. Besides chemically synthesized anticancer compounds, natural products have been scientifically shown to be a highly promising alternative source for anticancer drug discovery. Along with experimental approaches being used to find anticancer drug candidates, computational approaches have been developed to virtually screen for potential anticancer compounds. In this study, we construct an ensemble computational framework, called iANP-EC, using machine learning approaches incorporated with evolutionary computation. Four learning algorithms (k-NN, SVM, RF, and XGB) and four molecular representation schemes are used to build a set of classifiers, among which the four best-performing classifiers are selected to form an ensemble classifier. Particle swarm optimization (PSO) is used to optimize the weights used to combine the four top classifiers. The models are developed using a curated set of 997 compounds collected from the NPACT and CancerHSP databases. The results show that iANP-EC is a stable, robust, and effective framework that achieves an AUC-ROC value of 0.9193 and an AUC-PR value of 0.8366. The comparative analysis of molecular substructures between natural anticarcinogens and nonanticarcinogens partially unveils several key substructures that drive anticancer activities. We also deploy the proposed ensemble model as an online web server with a user-friendly interface to support the research community in identifying natural products with anticancer activities.

Subject(s)

Antineoplastic Agents, Biological Products, Humans, Biological Products/pharmacology, Algorithms, Machine Learning, Databases, Factual, Antineoplastic Agents/pharmacology
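The weighting step described in the iANP-EC abstract, where PSO tunes the weights that combine four classifiers, might look roughly like the sketch below. This is an assumption-laden illustration, not the published implementation: the fitness function (AUC-ROC on a validation set), the PSO constants, the function names, and the toy scores are all hypothetical.

```python
import random


def auc_roc(labels, scores):
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_scores(weights, model_scores):
    """Weighted average of the per-model scores for every sample."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, sample)) / total
            for sample in zip(*model_scores)]


def pso_weights(model_scores, labels, n_particles=10, n_iters=30, seed=0):
    """Search for ensemble weights with a basic particle swarm (assumed settings)."""
    rng = random.Random(seed)
    dim = len(model_scores)

    def fitness(w):
        return auc_roc(labels, ensemble_scores(w, model_scores))

    # The first particle is the equal-weight ensemble, so the result is never worse than it.
    pos = [[1.0] * dim] + [[rng.uniform(0.05, 1.0) for _ in range(dim)]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                # Keep weights strictly positive so normalization is always defined.
                pos[i][d] = min(1.0, max(0.01, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:
                    g_fit, g_pos = f, pos[i][:]

    total = sum(g_pos)
    return [w / total for w in g_pos], g_fit


# Hypothetical validation scores from four classifiers on six compounds.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_model_scores = [
    [0.9, 0.2, 0.8, 0.3, 0.7, 0.4],
    [0.6, 0.5, 0.4, 0.7, 0.5, 0.6],
    [0.8, 0.4, 0.6, 0.5, 0.9, 0.1],
    [0.5, 0.6, 0.7, 0.2, 0.6, 0.3],
]
weights, best_auc = pso_weights(toy_model_scores, toy_labels)
print(weights, best_auc)
```

Seeding the swarm with the equal-weight solution is a design choice of this sketch; it guarantees that the optimized ensemble scores at least as well as plain averaging on the data used for fitness.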

iCYP-MFE: Identifying Human Cytochrome P450 Inhibitors Using Multitask Learning and Molecular Fingerprint-Embedded Encoding.

Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Loc; Nguyen-Hoang, Phuong-Uyen; Nguyen, Thien-Ngan; Nguyen, Dung T; Nguyen, Binh P; Le, Ly.

J Chem Inf Model ; 62(21): 5059-5068, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-34672553

ABSTRACT

The human cytochrome P450 (CYP) superfamily is responsible for the metabolism of both endogenous and exogenous compounds such as drugs, cellular metabolites, and toxins. Inhibition of the CYP enzymes is closely associated with adverse drug reactions, including metabolic failures and induced side effects. In modern drug discovery, identification of potential CYP inhibitors is, therefore, essential. Alongside experimental approaches, numerous computational models have been proposed to address this biochemical issue. In this study, we introduce iCYP-MFE, a computational framework for virtual screening of CYP inhibitors against the 1A2, 2C9, 2C19, 2D6, and 3A4 isoforms. iCYP-MFE contains a set of five robust, stable, and effective prediction models developed using multitask learning incorporated with molecular fingerprint-embedded features. The results show that multitask learning can remarkably leverage useful information from related tasks to promote global performance. Comparative analysis indicates that, relative to state-of-the-art methods, iCYP-MFE performs better on three tasks, equivalently on one task, and less effectively on one task. The area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR) were the two decisive metrics used for model evaluation. The prediction task for CYP2D6 inhibition achieves the highest AUC-ROC value of 0.93, while the prediction task for CYP1A2 inhibition obtains the highest AUC-PR value of 0.92. The substructural analysis preliminarily explains the nature of the CYP-inhibitory activity of compounds. An online web server for iCYP-MFE with a user-friendly interface was also deployed to support scientific communities in identifying CYP inhibitors.

Subject(s)

Cytochrome P-450 Enzyme Inhibitors, Cytochrome P-450 Enzyme System, Humans, Cytochrome P-450 Enzyme Inhibitors/pharmacology, Cytochrome P-450 Enzyme Inhibitors/metabolism, Cytochrome P-450 Enzyme System/metabolism, Cytochrome P-450 CYP2D6, Area Under Curve, Microsomes, Liver/metabolism
