i4mC-GRU: Identifying DNA N⁴-Methylcytosine sites in mouse genomes using bidirectional gated recurrent unit and sequence-embedded features.

Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Loc; Nguyen-Hoang, Phuong-Uyen; Rahardja, Susanto; Nguyen, Binh P.

Comput Struct Biotechnol J ; 21: 3045-3053, 2023.

Article in English | MEDLINE | ID: mdl-37273848

ABSTRACT

N4-methylcytosine (4mC) is one of the most common DNA methylation modifications found in both prokaryotic and eukaryotic genomes. Because 4mC has various essential biological roles, determining its location helps reveal unexplored physiological and pathological pathways. In this study, we propose an effective computational method called i4mC-GRU, which uses a gated recurrent unit and duplet sequence-embedded features to predict potential 4mC sites in mouse (Mus musculus) genomes. To assess the performance of the model fairly, we compared our method with several state-of-the-art methods on two different benchmark datasets. Our results showed that i4mC-GRU achieved area under the receiver operating characteristic curve values of 0.97 and 0.89 and area under the precision-recall curve values of 0.98 and 0.90 on the first and second benchmark datasets, respectively. In short, our method outperformed existing methods in predicting 4mC sites in mouse genomes. We also deployed i4mC-GRU as an online web server to support users in genomics studies.
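As a rough illustration of what "duplet" features might look like before they reach an embedding layer, the sketch below splits a DNA sequence into overlapping two-nucleotide tokens and maps each token to an integer index. The abstract does not specify the tokenisation, so the overlap (stride of 1), the 16-token vocabulary, and the names `VOCAB`, `duplet_tokens`, and `encode` are all assumptions made for this example, not the authors' implementation.

```python
from itertools import product

# Assumed vocabulary: the 16 dinucleotides over A, C, G, T, indexed from 1.
# Index 0 is reserved here for tokens containing any other character (e.g. N).
VOCAB = {"".join(pair): i + 1 for i, pair in enumerate(product("ACGT", repeat=2))}


def duplet_tokens(seq, stride=1):
    """Split a sequence into two-nucleotide tokens (overlapping when stride=1)."""
    seq = seq.upper()
    return [seq[i:i + 2] for i in range(0, len(seq) - 1, stride)]


def encode(seq):
    """Map each duplet to its integer index, ready for an embedding lookup."""
    return [VOCAB.get(token, 0) for token in duplet_tokens(seq)]


print(duplet_tokens("ACGT"))  # ['AC', 'CG', 'GT']
print(encode("ACGT"))         # [2, 7, 12]
```

In a model of this kind, the integer indices would typically be fed to a trainable embedding layer followed by the recurrent layers.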

iPromoter-Seqvec: identifying promoters using bidirectional long short-term memory and sequence-embedded features.

Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Loc; Nguyen-Hoang, Phuong-Uyen; Rahardja, Susanto; Nguyen, Binh P.

BMC Genomics ; 23(Suppl 5): 681, 2022 Oct 03.

Article in English | MEDLINE | ID: mdl-36192696

ABSTRACT

BACKGROUND: Promoters, non-coding DNA sequences located upstream of the transcription start site of genes or gene clusters, are essential regulatory elements for the initiation and regulation of transcription. Identifying promoters in DNA sequences and genomes also contributes significantly to discovering the entire structures of genes of interest. The exploration of promoter regions is therefore one of the most important topics in molecular genetics and biology. Besides experimental techniques, computational methods have been developed to predict promoters. In this study, we propose iPromoter-Seqvec, an efficient computational model to predict TATA and non-TATA promoters in human and mouse genomes using bidirectional long short-term memory neural networks in combination with sequence-embedded features extracted from input sequences. The promoter and non-promoter sequences were retrieved from the Eukaryotic Promoter database and then refined to create four benchmark datasets. RESULTS: The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) were used as two key metrics to evaluate model performance. Results on independent test sets showed that iPromoter-Seqvec outperformed other state-of-the-art methods, with AUCROC values ranging from 0.85 to 0.99 and AUCPR values ranging from 0.86 to 0.99. Models predicting TATA promoters in both species had slightly higher predictive power than those predicting non-TATA promoters. With a novel idea of constructing artificial non-promoter sequences based on promoter sequences, our models were able to learn highly specific characteristics that discriminate promoters from non-promoters, improving predictive efficiency. CONCLUSIONS: iPromoter-Seqvec is a stable and robust model for predicting both TATA and non-TATA promoters in human and mouse genomes. Our proposed method was also deployed as an online web server with a user-friendly interface to support research communities. Links to our source code and web server are available at https://github.com/mldlproject/2022-iPromoter-Seqvec .
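For readers unfamiliar with the two metrics reported above, the following minimal sketch shows one way each can be computed from binary labels and predicted scores. AUCROC is computed here as the fraction of positive-negative pairs that are ranked correctly, and AUCPR is approximated by average precision, which is one common estimator among several. The function names and the toy data are illustrative assumptions and do not come from the paper.

```python
def auc_roc(labels, scores):
    """AUCROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


# Toy data, made up for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc_roc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # 0.8333...
```

This version of average precision does not treat tied scores specially; library implementations may differ slightly in how they handle ties.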

Subject(s)

Memory, Short-Term , Software , Animals , Humans , Mice , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , TATA Box/genetics , Transcription Initiation Site , Transcription, Genetic

iANP-EC: Identifying Anticancer Natural Products Using Ensemble Learning Incorporated with Evolutionary Computation.

Nguyen, Loc; Nguyen Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Bach Hoai; Nguyen-Hoang, Phuong-Uyen; Le, Ly; Nguyen, Binh P.

J Chem Inf Model ; 62(21): 5080-5089, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-35157472

ABSTRACT

Cancer is one of the deadliest diseases, killing millions of people worldwide every year. Research on anticancer medicines continues to seek better and more adaptive agents with fewer side effects. Besides chemically synthesised anticancer compounds, natural products have been scientifically shown to be a promising alternative source for anticancer drug discovery. Along with the experimental approaches used to find anticancer drug candidates, computational approaches have been developed to virtually screen for potential anticancer compounds. In this study, we construct an ensemble computational framework, called iANP-EC, using machine learning approaches incorporated with evolutionary computation. Four learning algorithms (k-NN, SVM, RF, and XGB) and four molecular representation schemes are used to build a set of classifiers, among which the four best-performing classifiers are selected to form an ensemble classifier. Particle swarm optimization (PSO) is used to optimise the weights used to combine the four top classifiers. The models are developed using a curated set of 997 compounds collected from the NPACT and CancerHSP databases. The results show that iANP-EC is a stable, robust, and effective framework that achieves an AUC-ROC value of 0.9193 and an AUC-PR value of 0.8366. The comparative analysis of molecular substructures between natural anticarcinogens and nonanticarcinogens partially unveils several key substructures that drive anticancer activity. We also deploy the proposed ensemble model as an online web server with a user-friendly interface to support the research community in identifying natural products with anticancer activities.
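The weighting step can be pictured with a small sketch: given the predicted probabilities of several classifiers on a validation set, a basic particle swarm searches for non-negative weights, normalised to sum to 1, that maximise a chosen score of the weighted average. The abstract does not state the objective function or any PSO settings, so the use of AUC-ROC as the fitness, the hyperparameters, and every name below (`pso_weights`, `ensemble`, `normalise`) are assumptions for illustration rather than the authors' configuration.

```python
import random


def normalise(weights):
    """Clip weights at zero and rescale them to sum to 1 (equal weights if all zero)."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped] if total > 0 else [1.0 / len(clipped)] * len(clipped)


def ensemble(probs, weights):
    """Weighted average of probs[k][i], the probability from classifier k for sample i."""
    w = normalise(weights)
    return [sum(wk * p[i] for wk, p in zip(w, probs)) for i in range(len(probs[0]))]


def auc_roc(labels, scores):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pso_weights(probs, labels, n_particles=20, n_iter=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over classifier weights; fitness is AUC-ROC (assumed)."""
    rng = random.Random(seed)
    dim = len(probs)

    def fitness(w):
        return auc_roc(labels, ensemble(probs, w))

    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pos[0] = [1.0] * dim  # start one particle at equal weights as a baseline
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep every coordinate inside the [0, 1] search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), 1.0)
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return normalise(gbest), gbest_fit
```

Because one particle starts at equal weights, the returned fitness can never fall below that of a plain unweighted average on the same data. In practice the weights would be tuned on held-out validation predictions rather than on the test set.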

Subject(s)

Antineoplastic Agents , Biological Products , Humans , Biological Products/pharmacology , Algorithms , Machine Learning , Databases, Factual , Antineoplastic Agents/pharmacology

iCYP-MFE: Identifying Human Cytochrome P450 Inhibitors Using Multitask Learning and Molecular Fingerprint-Embedded Encoding.

Nguyen-Vo, Thanh-Hoang; Trinh, Quang H; Nguyen, Loc; Nguyen-Hoang, Phuong-Uyen; Nguyen, Thien-Ngan; Nguyen, Dung T; Nguyen, Binh P; Le, Ly.

J Chem Inf Model ; 62(21): 5059-5068, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-34672553

ABSTRACT

The human cytochrome P450 (CYP) superfamily is responsible for the metabolism of both endogenous and exogenous compounds such as drugs, cellular metabolites, and toxins. Inhibition of the CYP enzymes is closely associated with adverse drug reactions, including metabolic failures and induced side effects. In modern drug discovery, the identification of potential CYP inhibitors is therefore essential. Alongside experimental approaches, numerous computational models have been proposed to address this problem. In this study, we introduce iCYP-MFE, a computational framework for virtual screening of inhibitors of the CYP1A2, 2C9, 2C19, 2D6, and 3A4 isoforms. iCYP-MFE contains a set of five robust, stable, and effective prediction models developed using multitask learning incorporated with molecular fingerprint-embedded features. The results show that multitask learning can leverage useful information from related tasks to improve overall performance. Comparative analysis indicates that, relative to state-of-the-art methods, iCYP-MFE performs better on three tasks, equivalently on one task, and less effectively on one task. The area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR) were the two decisive metrics used for model evaluation. The prediction task for CYP2D6 inhibition achieves the highest AUC-ROC value of 0.93, while the prediction task for CYP1A2 inhibition obtains the highest AUC-PR value of 0.92. The substructural analysis provides a preliminary explanation of the CYP-inhibitory activity of compounds. An online web server for iCYP-MFE with a user-friendly interface was also deployed to support scientific communities in identifying CYP inhibitors.
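One generic way to realise multitask learning over the five isoforms is hard parameter sharing: a single shared layer turns a molecular fingerprint into a hidden representation, and one small output head per isoform produces its own inhibition probability. The abstract does not describe the network, so the sketch below is only a forward pass under that assumption, with untrained random weights, a hypothetical loss that skips isoforms with no measured label, and illustrative names throughout (`make_model`, `predict`, `multitask_loss`).

```python
import math
import random

ISOFORMS = ["1A2", "2C9", "2C19", "2D6", "3A4"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_model(n_bits, n_hidden, seed=0):
    """Random, untrained weights: one shared layer plus one head per isoform."""
    rng = random.Random(seed)
    shared = [[rng.uniform(-0.1, 0.1) for _ in range(n_bits)] for _ in range(n_hidden)]
    heads = {iso: [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for iso in ISOFORMS}
    return shared, heads


def predict(model, fingerprint):
    """Shared ReLU layer, then an independent sigmoid output for each isoform."""
    shared, heads = model
    hidden = [max(0.0, sum(w * x for w, x in zip(row, fingerprint))) for row in shared]
    return {iso: sigmoid(sum(w * h for w, h in zip(head, hidden)))
            for iso, head in heads.items()}


def multitask_loss(pred, labels):
    """Mean binary cross-entropy over isoforms whose label is known (not None)."""
    terms = [-math.log(pred[iso]) if y == 1 else -math.log(1.0 - pred[iso])
             for iso, y in labels.items() if y is not None]
    return sum(terms) / len(terms) if terms else 0.0


# Toy 8-bit fingerprint and partially known labels, made up for illustration.
model = make_model(n_bits=8, n_hidden=4)
pred = predict(model, [1, 0, 1, 1, 0, 0, 1, 0])
loss = multitask_loss(pred, {"1A2": 1, "2C9": 0, "2C19": None, "2D6": 1, "3A4": None})
```

Because every head reads the same hidden representation, gradients from all five tasks would update the shared layer during training, which is the mechanism by which related tasks can inform one another.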

Subject(s)

Cytochrome P-450 Enzyme Inhibitors , Cytochrome P-450 Enzyme System , Humans , Cytochrome P-450 Enzyme Inhibitors/pharmacology , Cytochrome P-450 Enzyme Inhibitors/metabolism , Cytochrome P-450 Enzyme System/metabolism , Cytochrome P-450 CYP2D6 , Area Under Curve , Microsomes, Liver/metabolism
