Search | VHL Regional Portal

1.

Ligand and structure based hierarchical virtual screening cascade for finding novel epidermal growth factor receptor inhibitors.

Huo, Donghui; Sun, Zhiqi; Wang, Maolin; Yan, Aixia.

Chem Biol Drug Des ; 103(1): e14375, 2024 01.

Article in English | MEDLINE | ID: mdl-37849030

ABSTRACT

The epidermal growth factor receptor (EGFR) tyrosine kinase plays an important role in tumor formation and growth by mediating cell growth and other physiological processes. Therefore, EGFR is a promising target for the treatment of cancer. In this work, we combined ligand-based and structure-based virtual screening methods to identify novel EGFR inhibitors from a library of more than 103 thousand compounds. We first obtained hundreds of compounds with similar physicochemical properties through 3D molecular shape and electrostatic similarity screening with the potent inhibitors AEE788 and Afatinib as queries. Next, we identified compounds with strong binding affinities to the EGFR pocket through molecular docking, which makes good use of the structural information of the receptor. After molecular scaffold analysis, our bioassay confirmed 13 compounds with EGFR inhibitory activity, three of which had IC50 values below 1000 nM. In addition, we collected 5371 EGFR inhibitors from online databases and clustered them into 7 groups by the K-means method using their ECFP4 fingerprints as input. Each cluster had typical molecular fragments and corresponding activity characteristics, which could guide the design of EGFR inhibitors, and we concluded that fragments from some of the hits are present in the highly active scaffolds.

Subject(s)

Antineoplastic Agents , Neoplasms , Humans , Molecular Docking Simulation , Protein Kinase Inhibitors/chemistry , Ligands , ErbB Receptors/metabolism , Afatinib/therapeutic use , Neoplasms/drug therapy , Antineoplastic Agents/pharmacology

2.

Discovering the Active Ingredients of Medicine and Food Homologous Substances for Inhibiting the Cyclooxygenase-2 Metabolic Pathway by Machine Learning Algorithms.

Tian, Yujia; Zhang, Zhixing; Yan, Aixia.

Molecules ; 28(19), 2023 Sep 23.

Article in English | MEDLINE | ID: mdl-37836625

ABSTRACT

Cyclooxygenase-2 (COX-2) and microsomal prostaglandin E2 synthase (mPGES-1) are two key targets in anti-inflammatory therapy. Medicine and food homology (MFH) substances have both edible and medicinal properties, providing a valuable resource for the development of novel, safe, and efficient COX-2 and mPGES-1 inhibitors. In this study, we collected active ingredients from 503 MFH substances and constructed the first comprehensive MFH database containing 27,319 molecules. Subsequently, we performed Murcko scaffold analysis and K-means clustering to deeply analyze the composition of the constructed database and evaluate its structural diversity. Furthermore, we employed four supervised machine learning algorithms, including support vector machine (SVM), random forest (RF), deep neural networks (DNNs), and eXtreme Gradient Boosting (XGBoost), as well as ensemble learning, to establish 640 classification models and 160 regression models for COX-2 and mPGES-1 inhibitors. Among them, ModelA_ensemble_RF_1 emerged as the optimal classification model for COX-2 inhibitors, achieving predicted Matthews correlation coefficient (MCC) values of 0.802 and 0.603 on the test set and external validation set, respectively. ModelC_RDKIT_SVM_2 was identified as the best regression model based on COX-2 inhibitors, with root mean squared error (RMSE) values of 0.419 and 0.513 on the test set and external validation set, respectively. ModelD_ECFP_SVM_4 stood out as the top classification model for mPGES-1 inhibitors, attaining MCC values of 0.832 and 0.584 on the test set and external validation set, respectively. The optimal regression model for mPGES-1 inhibitors, ModelF_3D_SVM_1, exhibited predictive RMSE values of 0.253 and 0.35 on the test set and external validation set, respectively. 
Finally, we proposed a ligand-based cascade virtual screening strategy, which integrated the well-performing supervised machine learning models with unsupervised learning: the self-organized map (SOM) and molecular scaffold analysis. Using this virtual screening workflow, we discovered 10 potential COX-2 inhibitors and 15 potential mPGES-1 inhibitors from the MFH database. We further verified the candidates by molecular docking and investigated the interactions of the candidate molecules upon binding to COX-2 or mPGES-1. The constructed comprehensive MFH database has laid a solid foundation for further research on and utilization of the MFH substances. The series of well-performing machine learning models can be employed to predict the COX-2 and mPGES-1 inhibitory capabilities of unknown compounds, thereby aiding in the discovery of anti-inflammatory medications. The potential COX-2 and mPGES-1 inhibitor molecules identified through the cascade virtual screening approach provide insights and references for the design of highly effective and safe novel anti-inflammatory drugs.

Subject(s)

Anti-Inflammatory Agents , Cyclooxygenase 2 Inhibitors , Cyclooxygenase 2 Inhibitors/pharmacology , Cyclooxygenase 2 , Molecular Docking Simulation , Algorithms , Machine Learning , Metabolic Networks and Pathways

3.

Machine learning-based classification models for non-covalent Bruton's tyrosine kinase inhibitors: predictive ability and interpretability.

Li, Guo; Li, Jiaxuan; Tian, Yujia; Zhao, Yunyang; Pang, Xiaoyang; Yan, Aixia.

Mol Divers ; 2023 Jul 21.

Article in English | MEDLINE | ID: mdl-37479824

ABSTRACT

In this study, we built classification models using machine learning techniques to predict the bioactivity of non-covalent inhibitors of Bruton's tyrosine kinase (BTK) and to provide interpretable and transparent explanations for these predictions. To achieve this, we gathered data on BTK inhibitors from the Reaxys and ChEMBL databases, removing compounds with covalent bonds and duplicates to obtain a dataset of 3895 non-covalent inhibitors. These inhibitors were characterized using MACCS fingerprints and Morgan fingerprints, and four traditional machine learning algorithms (decision trees (DT), random forests (RF), support vector machines (SVM), and extreme gradient boosting (XGBoost)) were used to build 16 classification models. In addition, four deep learning models were developed using deep neural networks (DNN). The best model, Model D_4, which was built using XGBoost and MACCS fingerprints, achieved an accuracy of 94.1% and a Matthews correlation coefficient (MCC) of 0.75 on the test set. To provide interpretable explanations, we employed the SHAP method to decompose the predicted values into the contributions of each feature. We also used K-means dimensionality reduction and hierarchical clustering to visualize the clustering effects of molecular structures of the inhibitors. The results of this study were validated using crystal structures, and we found that the interactions between the BTK amino acid residues and the important features of the clustered scaffolds were consistent with the known properties of the complex crystal structures. Overall, our models demonstrated high predictive ability, and a qualitative model can be converted to a quantitative model to some extent by SHAP, making the models valuable for guiding the design of new BTK inhibitors with desired activity.
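The abstract above reports both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of how the two relate, the snippet below computes both from a binary confusion matrix. The function name `confusion_metrics` and the counts are hypothetical illustrations and are not taken from the paper.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when a marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical, imbalanced test set: 120 actives and 880 inactives.
acc, mcc = confusion_metrics(tp=90, tn=850, fp=30, fn=30)
print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
```

With imbalanced classes, accuracy can sit well above MCC because correctly predicted majority-class compounds dominate the accuracy, which is one reason such studies tend to report both values.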

4.

Classification of FLT3 inhibitors and SAR analysis by machine learning methods.

Zhao, Yunyang; Tian, Yujia; Pang, Xiaoyang; Li, Guo; Shi, Shenghui; Yan, Aixia.

Mol Divers ; 2023 May 05.

Article in English | MEDLINE | ID: mdl-37142889

ABSTRACT

FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase, which is an important target for anti-cancer therapy. In this work, we conducted a structure-activity relationship (SAR) study on 3867 FLT3 inhibitors we collected. MACCS fingerprints, ECFP4 fingerprints, and TT fingerprints were used to represent the inhibitors in the dataset. A total of 36 classification models were built based on support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and deep neural networks (DNN) algorithms. Model 3D_3 built by deep neural networks (DNN) and TT fingerprints performed best on the test set with the highest prediction accuracy of 85.83% and Matthews correlation coefficient (MCC) of 0.72 and also performed well on the external test set. In addition, we clustered 3867 inhibitors into 11 subsets by the K-Means algorithm to figure out the structural characteristics of the reported FLT3 inhibitors. Finally, we analyzed the SAR of FLT3 inhibitors by RF algorithm based on ECFP4 fingerprints. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycle, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl were typical fragments among highly active inhibitors. Besides, three scaffolds in Subset_A (Subset 4), Subset_B, and Subset_C showed a significant relationship to inhibition activity targeting FLT3.

5.

Prediction and Structure-Activity Relationship Analysis on Ready Biodegradability of Chemical Using Machine Learning Method.

Yin, Hongyan; Lin, Cheng; Tian, Yujia; Yan, Aixia.

Chem Res Toxicol ; 36(4): 617-629, 2023 04 17.

Article in English | MEDLINE | ID: mdl-37017429

ABSTRACT

Persistent contaminants from different industries have already caused significant risks to the environment and public health. In this study, a data set containing 1306 not readily biodegradable (NRB) and 622 readily biodegradable (RB) chemicals was collected and characterized by CORINA descriptors, MACCS fingerprints, and ECFP_4 fingerprints. We utilized decision tree (DT), support vector machine (SVM), random forest (RF), and deep neural network (DNN) to construct 34 classification models that could predict the biodegradability of compounds. The best model (model 5F) built using a Transformer-CNN algorithm had a balanced accuracy of 86.29% and a Matthews correlation coefficient of 0.71 on the test set. By analyzing the top 10 CORINA descriptors used for modeling, the properties containing solubility, π/σ atom charges, rotatable bonds number, lone pair/π/σ atom electronegativities, molecular weight, and number of nitrogen atom based hydrogen bonding acceptors were determined to be critical for biodegradability. The substructure investigations confirmed earlier studies that the presence of aromatic rings and nitrogen or halogen substitutions in a molecule will hinder the biodegradation of the compound, while the ester groups and carboxyl groups promote biodegradability. We also identified the representative fragments affecting biodegradability by analyzing the frequency differences of substructural fragments between the NRB and RB compounds. The results of the study can provide excellent guidance for the discovery and design of compounds with good chemical biodegradability.

Subject(s)

Algorithms , Machine Learning , Structure-Activity Relationship , Neural Networks, Computer , Support Vector Machine
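The data set in this record is imbalanced (1306 NRB versus 622 RB chemicals), which is presumably why balanced accuracy is reported. The sketch below illustrates the metric with a hypothetical trivial classifier that labels every chemical NRB. Only the two class sizes come from the abstract; treating RB as the positive class is an assumption made for illustration.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Class sizes from the abstract; RB is treated as the positive class (assumption).
n_rb, n_nrb = 622, 1306

# Hypothetical baseline that predicts "NRB" for every chemical.
tp, fn = 0, n_rb
tn, fp = n_nrb, 0

plain_acc = (tp + tn) / (n_rb + n_nrb)
bal_acc = balanced_accuracy(tp, tn, fp, fn)
print(f"plain accuracy={plain_acc:.3f}  balanced accuracy={bal_acc:.3f}")
```

The baseline scores about two thirds on plain accuracy but only one half on balanced accuracy, so the latter does not reward simply guessing the majority class.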

6.

Prediction of bioactivities of microsomal prostaglandin E₂ synthase-1 inhibitors by machine learning algorithms.

Tian, Yujia; Yang, Zhenwu; Wang, Hongzhao; Yan, Aixia.

Chem Biol Drug Des ; 101(6): 1307-1321, 2023 06.

Article in English | MEDLINE | ID: mdl-36752697

ABSTRACT

There is a strong interest in the development of microsomal prostaglandin E2 synthase-1 (mPGES-1) inhibitors because of their potential to safely and effectively treat inflammation. Herein, 70 QSAR models were built on the dataset (735 mPGES-1 inhibitors) characterized with RDKit descriptors by multiple linear regression (MLR), support vector machine (SVM), random forest (RF), deep neural networks (DNN), and eXtreme Gradient Boosting (XGBoost). Three further regression models were built on the dataset represented by SMILES, using self-attention recurrent neural networks (RNN) and Graph Convolutional Networks (GCN). For the best model (Model C2), which was developed by SVM with RDKit descriptors, a coefficient of determination (R2) of 0.861 and a root mean squared error (RMSE) of 0.235 were achieved for the test set. Additionally, an R2 of 0.692 and an RMSE of 0.383 were obtained on the external test set. We investigated the applicability domain (AD) of Model C2 with the rivality index (RI); the predictions of Model C2 on 78.92% of molecules in the test set and 78.33% of molecules in the external test set were reliable. After dissecting the RDKit descriptors of Model C2, we found important physicochemical properties of highly active mPGES-1 inhibitors. Besides, by analyzing the attention weight of each atom of each inhibitor from the attention layer, we found that the benzamide group and the trifluoromethyl cyclohexane group are favorable substructures for mPGES-1 inhibitors.

Subject(s)

Algorithms , Quantitative Structure-Activity Relationship , Prostaglandin-E Synthases , Machine Learning , Support Vector Machine , Prostaglandins
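The regression results above are summarized by R2 and RMSE. The following minimal sketch shows one common way to compute both, assuming R2 is defined as 1 - SS_res/SS_tot (some QSAR papers use the squared Pearson correlation instead, and the abstract does not say which). The activity values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical pIC50-like values, not taken from the paper.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.2, 5.8, 7.1, 7.9]

print(f"RMSE={rmse(observed, predicted):.3f}  R2={r_squared(observed, predicted):.3f}")
```

RMSE is expressed in the units of the modeled activity, while R2 is unitless, so the two are usually read together.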

7.

Classification models and SAR analysis on HDAC1 inhibitors using machine learning methods.

Li, Rourou; Tian, Yujia; Yang, Zhenwu; Ji, Yueshan; Ding, Jiaqi; Yan, Aixia.

Mol Divers ; 27(3): 1037-1051, 2023 Jun.

Article in English | MEDLINE | ID: mdl-35737257

ABSTRACT

Histone deacetylase (HDAC) 1, a member of the histone deacetylase family, plays a pivotal role in various tumors. In this study, we collected 7313 human HDAC1 inhibitors with bioactivities to form a dataset. Then, the dataset was divided into a training set and a test set using two splitting methods: (1) Kohonen's self-organizing map and (2) random splitting. The molecular structures were represented by MACCS fingerprints, RDKit fingerprints, topological torsions fingerprints and ECFP4 fingerprints. A total of 80 classification models were built by using five machine learning methods, including decision tree (DT), random forest, support vector machine, eXtreme Gradient Boosting and deep neural network. Model 15A_2 built by the XGBoost algorithm based on ECFP4 fingerprints showed the best performance, with an accuracy of 88.08% and an MCC value of 0.76 on the test set. Finally, we clustered the 7313 HDAC1 inhibitors into 31 subsets, and the substructural features in each subset were investigated. Moreover, using the DT algorithm, we analyzed the structure-activity relationship of HDAC1 inhibitors. It can be concluded that some substructures have a significant effect on high activity, such as N-(2-amino-phenyl)-benzamide, benzimidazole, AR-42 analogues, hydroxamic acid with a middle chain alkyl and 4-aryl imidazole with a midchain of alkyl whose α carbon is chiral.

Subject(s)

Algorithms , Machine Learning , Humans , Structure-Activity Relationship , Molecular Structure , Support Vector Machine , Histone Deacetylase 1

8.

A Large Acute Gastroenteritis Outbreak Associated with Both Campylobacter coli and Human Sapovirus - Beijing Municipality, China, 2021.

Zou, Lin; Li, Ying; Zhou, Guilan; Huang, Zhenzhou; Ju, Changyan; Zhao, Chunyan; Gao, Xiang; Zhen, Bojun; Zhang, Ping; Guo, Xiaochen; Zhang, Jing; Zhang, Yang; Liu, Bo; Zhou, Shaolei; Yan, Aixia; Kang, Ying; Wang, Yanchun; Ma, Hongmei; Li, Xiaohui; Zhang, Maojun.

China CDC Wkly ; 5(52): 1167-1173, 2023 Dec 29.

Article in English | MEDLINE | ID: mdl-38164467

ABSTRACT

What is already known about this topic?: Campylobacter is a significant foodborne pathogen that leads to global outbreaks of acute gastroenteritis (AGE) usually affecting less than 30 individuals. Human sapovirus (HuSaV) is an enteric virus responsible for sporadic cases and outbreaks of AGE worldwide. In a study conducted in Beijing, HuSaV detection ranked second after norovirus. What is added by this report?: We present a discussion of the first large-scale outbreak of AGE caused by both Campylobacter coli (C. coli) and HuSaV. The outbreak involved a total of 996 patients and exhibited two distinct peaks over a period of 17 days. Through case-control studies, we identified exposure to raw water from a secondary water supply system as a significant risk factor. Among 83 patients, 49 samples tested positive for C. coli, 39 samples tested positive for HuSaV, and 27 samples tested positive for both pathogens using real-time polymerase chain reaction detection. Furthermore, whole-genome sequencing of 17 C. coli isolates obtained from 17 patients revealed that all isolates belonged to a highly clonal strain of C. coli. What are the implications for public health practice?: Outbreaks of AGE resulting from multiple pathogen infections warrant increased attention. This report emphasizes the significance of ensuring the safety of drinking water, particularly in secondary supply systems.

9.

Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation.

Kong, Yue; Zhao, Xiaoman; Liu, Ruizi; Yang, Zhenwu; Yin, Hongyan; Zhao, Bowen; Wang, Jinling; Qin, Bingjie; Yan, Aixia.

J Cheminform ; 14(1): 52, 2022 Aug 04.

Article in English | MEDLINE | ID: mdl-35927691

ABSTRACT

Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task related representations, which completely gets rid of the rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of the GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrated pharmacophore information hierarchically into message-passing neural network (MPNN) architecture, specifically, in the way of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbed not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to MPNN architecture can generally help GNN models improve the predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.

10.

Building 2D classification models and 3D CoMSIA models on small-molecule inhibitors of both wild-type and T790M/L858R double-mutant EGFR.

Huo, Donghui; Wang, Hongzhao; Qin, Zijian; Tian, Yujia; Yan, Aixia.

Mol Divers ; 26(3): 1715-1730, 2022 Jun.

Article in English | MEDLINE | ID: mdl-34636023

ABSTRACT

Epidermal growth factor receptor (EGFR) has received widespread attention because it is an important target for anticancer drug design. Mutations in the EGFR, especially the T790M/L858R double mutation, have made cancer treatment more difficult. We herein built the structure-activity relationship models of small-molecule inhibitors on wild-type and T790M/L858R double-mutant EGFR with a whole dataset of 379 compounds. For 2D classification models, we used ECFP4 fingerprints to build support vector machine and random forest models and used SMILES to build self-attention recurrent neural network models. All six models achieved an accuracy above 0.87 and a Matthews correlation coefficient above 0.76 on the test set. We concluded that inhibitors containing anilinoquinoline and methoxy or fluoro phenyl are highly active against wild-type EGFR. Substructures such as anilinopyrimidine, acrylamide, amino phenyl, methoxy phenyl, and thienopyrimidinyl amide appeared more in highly active inhibitors against double-mutant EGFR. We also used a self-organizing map to cluster the inhibitors into six subsets based on ECFP4 fingerprints and analyzed the activity characteristics of different scaffolds in each subset. Among them, three datasets, which are based on pteridin, anilinopyrimidine, and anilinoquinoline scaffolds, were selected to build 3D comparative molecular similarity analysis models individually. Models with the leave-one-out coefficient of determination (q2) above 0.65 were selected, and five descriptor types (steric, electrostatic, hydrophobic, donor, and acceptor) were used to study the effects of side chains of inhibitors on the activity against wild-type and mutant-type EGFR.

Subject(s)

ErbB Receptors , Lung Neoplasms , Cell Line, Tumor , Drug Design , ErbB Receptors/genetics , Humans , Lung Neoplasms/drug therapy , Mutation , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Structure-Activity Relationship

11.

Discovery of Novel Epidermal Growth Factor Receptor (EGFR) Inhibitors Using Computational Approaches.

Huo, Donghui; Wang, Shiyu; Kong, Yue; Qin, Zijian; Yan, Aixia.

J Chem Inf Model ; 62(21): 5149-5164, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-34931847

ABSTRACT

The epidermal growth factor receptor (EGFR) signaling pathway plays an important role in cell growth, proliferation, differentiation, and other physiological processes, which makes the EGFR a promising target for anticancer therapies. The discovery of novel EGFR inhibitors may provide a solution to the problem of drug resistance. In this work, we performed a ligand-based virtual screening (LBVS) protocol for finding novel EGFR inhibitors from a 5.3 million compound library. First, the 3D shape-based similarity was used to obtain structurally novel EGFR inhibitors. In this study, we tried three queries; two were crystal structures and one was generated from deep generative models of graphs (DGMG). Next, we have built four structure-activity relationship (SAR) models and three quantitative structure-activity relationship (QSAR) models based on an SVM method for further screening of highly active EGFR inhibitors. Experimental validations led to the identification of nine hits out of 18 tested compounds. Among them, hit 1, hit 5, and hit 6 had IC50 values around 80 nM against EGFR whose interactions with EGFR were further investigated by molecular dynamics simulations.

Subject(s)

Protein Kinase Inhibitors , Quantitative Structure-Activity Relationship , Protein Kinase Inhibitors/chemistry , ErbB Receptors/chemistry , Ligands , Cell Proliferation , Molecular Docking Simulation

12.

Risk Factors and Outcomes of Pulmonary Hypertension in Infants With Bronchopulmonary Dysplasia: A Meta-Analysis.

Chen, Ying; Zhang, Di; Li, Ying; Yan, Aixia; Wang, Xiaoying; Hu, Xiaoming; Shi, Hangting; Du, Yue; Zhang, Wenhui.

Front Pediatr ; 9: 695610, 2021.

Article in English | MEDLINE | ID: mdl-34249820

ABSTRACT

Background: Pulmonary hypertension is one of the most common co-morbidities in infants with bronchopulmonary dysplasia (BPD), but its risk factors are unclear. The onset of pulmonary hypertension in BPD has been associated with poor morbidity- and mortality-related outcomes in infants. Two review and meta-analysis studies have evaluated the risk factors and outcomes associated with pulmonary hypertension in infants with BPD. However, the limitations in those studies and the publication of recent cohort studies warrant our up-to-date study. We designed a systematic review and meta-analysis to evaluate the risk factors and outcomes of pulmonary hypertension in infants with BPD. Objective: To systematically evaluate the risk factors and outcomes associated with pulmonary hypertension in infants with BPD. Methods: We systematically searched the academic literature according to the PRISMA guidelines across five databases (Web of Science, EMBASE, CENTRAL, Scopus, and MEDLINE). We conducted random-effects meta-analyses to evaluate the pulmonary hypertension risk factors in infants with BPD. We also evaluated the overall morbidity- and mortality-related outcomes in infants with BPD and pulmonary hypertension. Results: We found 15 eligible studies (from the initial 963 of the search result) representing data from 2,156 infants with BPD (mean age, 25.8 ± 0.71 weeks). The overall methodological quality of the included studies was high. Our meta-analysis in infants with severe BPD revealed increased risks of pulmonary hypertension [Odds ratio (OR) 11.2], sepsis (OR, 2.05), pre-eclampsia (OR, 1.62), and oligohydramnios (OR, 1.38) of being small for gestational age (3.31). Moreover, a comparative analysis found medium-to-large effects of pulmonary hypertension on the total duration of hospital stay (Hedge's g, 0.50), the total duration of oxygen received (g, 0.93), the cognitive score (g, -1.5), and the overall mortality (g, 0.83) in infants with BPD. 
Conclusion: We identified several possible risk factors (i.e., severe BPD, sepsis, small for gestational age, pre-eclampsia) which promoted the onset of pulmonary hypertension in infants with BPD. Moreover, our review sheds light on the morbidity- and mortality-related outcomes associated with pulmonary hypertension in these infants. Our present findings are in line with the existing literature. The findings from this research will be useful in the development of an efficient risk-based screening system that determines the outcomes associated with pulmonary hypertension in infants with BPD.
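The effect sizes in this abstract are given as Hedges' g (printed as "Hedge's g" in the record). As a rough sketch, g is Cohen's d computed with the pooled standard deviation and multiplied by a small-sample correction; the snippet uses the common approximation 1 - 3/(4(n1 + n2) - 9). The group statistics are invented for illustration, and the abstract does not state which variant the authors used.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with the usual small-sample correction."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    cohens_d = (mean1 - mean2) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate bias-correction factor
    return cohens_d * correction


# Hypothetical hospital-stay data (days): infants with vs. without pulmonary hypertension.
g = hedges_g(mean1=110, sd1=20, n1=30, mean2=100, sd2=20, n2=30)
print(f"Hedges' g = {g:.3f}")
```

By the usual rule of thumb, values near 0.5 are read as medium effects and values near 0.8 or above as large effects, which matches the "medium-to-large" wording in the abstract.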

13.

A comprehensive comparative assessment of 3D molecular similarity tools in ligand-based virtual screening.

Jiang, Zhenla; Xu, Jianrong; Yan, Aixia; Wang, Ling.

Brief Bioinform ; 22(6), 2021 11 05.

Article in English | MEDLINE | ID: mdl-34151363

ABSTRACT

Three-dimensional (3D) molecular similarity, one major ligand-based virtual screening (VS) method, has been widely used in the drug discovery process. A variety of 3D molecular similarity tools have been developed in recent decades. In this study, we assessed a panel of 15 3D molecular similarity programs against the DUD-E and LIT-PCBA datasets, including commercial ROCS and Phase, in terms of screening power and scaffold-hopping power. The results revealed that (1) SHAFTS, LS-align, Phase Shape_Pharm and LIGSIFT showed the best VS capability in terms of screening power. Some 3D similarity tools available to academia can yield relatively better VS performance than commercial ROCS and Phase software. (2) Current 3D similarity VS tools exhibit a considerable ability to capture actives with new chemotypes in terms of scaffold hopping. (3) Multiple conformers relative to single conformations will generally improve VS performance for most 3D similarity tools: the improvement in area under the receiver operating characteristic curve values was marginal, whereas the enrichment factor in the top 1% and the hit rate in the top 1% showed larger improvements. Moreover, redundancy and complementarity analyses of hit lists from different query seeds and different 3D similarity VS tools showed that the combination of different query seeds and/or different 3D similarity tools in VS campaigns retrieved more (and more diverse) active molecules. These findings provide useful information for guiding choices of the optimal 3D molecular similarity tools for VS practices and designing possible combination strategies to discover more diverse active compounds.

Subject(s)

Drug Discovery/methods , Models, Molecular , Molecular Conformation , Software , Area Under Curve , Benchmarking , Databases, Pharmaceutical , Drug Design , Drug Evaluation, Preclinical/methods , Ligands , Molecular Structure , ROC Curve , Web Browser
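The benchmark above reports the enrichment factor and hit rate in the top 1% of a ranked list. A minimal sketch of both quantities follows, assuming the usual definition (hit rate in the top fraction divided by the hit rate of the whole library). The function name, scores and labels are hypothetical and do not reproduce the DUD-E or LIT-PCBA protocol.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Return (EF, top hit rate) for the top `fraction` of a score-ranked list.

    `labels` holds 1 for actives and 0 for decoys; higher scores rank first.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    hit_rate_top = sum(label for _, label in ranked[:n_top]) / n_top
    hit_rate_all = sum(labels) / len(labels)
    return hit_rate_top / hit_rate_all, hit_rate_top


# Hypothetical library: 1000 compounds, 20 actives, 4 of them ranked in the top 10.
active_ranks = {0, 2, 5, 9} | set(range(100, 116))
scores = [1000 - i for i in range(1000)]
labels = [1 if i in active_ranks else 0 for i in range(1000)]

ef, top_hit_rate = enrichment_factor(scores, labels, fraction=0.01)
print(f"EF1%={ef:.1f}  hit rate in top 1%={top_hit_rate:.2f}")
```

Unlike the area under the ROC curve, both quantities depend only on the very top of the ranking, which is consistent with the abstract's observation that they can move more than AUC does.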

14.

Classification models and SAR analysis on CysLT1 receptor antagonists using machine learning algorithms.

Wang, Hongzhao; Qin, Zijian; Yan, Aixia.

Mol Divers ; 25(3): 1597-1616, 2021 Aug.

Article in English | MEDLINE | ID: mdl-33534023

ABSTRACT

Cysteinyl leukotrienes 1 (CysLT1) receptor is a promising drug target for rhinitis or other allergic diseases. In our study, we built classification models to predict bioactivities of CysLT1 receptor antagonists. We built a dataset with 503 CysLT1 receptor antagonists which were divided into two groups: highly active molecules (IC50 < 1000 nM) and weakly active molecules (IC50 ≥ 1000 nM). The molecules were characterized by several descriptors including CORINA descriptors, MACCS fingerprints, Morgan fingerprint and molecular SMILES. For CORINA descriptors and two types of fingerprints, we used the random forests (RF) and deep neural networks (DNN) to build models. For molecular SMILES, we used recurrent neural networks (RNN) with the self-attention to build models. The accuracies of test sets for all models reached 85%, and the accuracy of the best model (Model 2C) was 93%. In addition, we made structure-activity relationship (SAR) analyses on CysLT1 receptor antagonists, which were based on the output from the random forest models and RNN model. It was found that highly active antagonists usually contained the common substructures such as tetrazoles, indoles and quinolines. These substructures may improve the bioactivity of the CysLT1 receptor antagonists.

Subject(s)

Algorithms , Leukotriene Antagonists/chemistry , Machine Learning , Models, Molecular , Receptors, Leukotriene/chemistry , Binding Sites , Cheminformatics/methods , Drug Discovery , Leukotriene Antagonists/pharmacology , Molecular Structure , Protein Binding , Quantitative Structure-Activity Relationship , ROC Curve , Reproducibility of Results

15.

Fingerprint-based computational models of 5-lipo-oxygenase activating protein inhibitors: Activity prediction and structure clustering.

Tu, Guiping; Qin, Zijian; Huo, Donghui; Zhang, Shengde; Yan, Aixia.

Chem Biol Drug Des ; 96(3): 931-947, 2020 09.

Article in English | MEDLINE | ID: mdl-33058463

ABSTRACT

Inflammatory diseases can be treated by inhibiting 5-lipo-oxygenase activating protein (FLAP). In this study, a data set containing 2,112 FLAP inhibitors was collected. A total of 25 classification models were built by five machine learning algorithms with five different types of fingerprints. The best model, which was built by support vector machine algorithm with ECFP_4 fingerprint had an accuracy and a Matthews correlation coefficient of 0.862 and 0.722 on the test set, respectively. The predicted results were further evaluated by the application domain dSTD-PRO (a distance between one compound to models). Each compound had a dSTD-PRO value, which was calculated by the predicted probabilities obtained from all 25 models. The application domain results suggested that the reliability of predicted results depended mainly on the compounds themselves rather than algorithms or fingerprints. A group of customized 10-bit fingerprint was manually defined for clustering the molecular structures of 2,112 FLAP inhibitors into eight subsets by K-Means. According to the clustering results, most of inhibitors in two subsets (subsets 2 and 4) were highly active inhibitors. We found that aryl oxadiazole/oxazole alkanes, biaryl amino-heteroarenes, two aromatic rings (often N-containing) linked by a cyclobutene group, and 1,2,4-triazole group were typical fragments in highly active inhibitors.

Subject(s)

5-Lipoxygenase-Activating Proteins/drug effects , Computer Simulation , Algorithms , Cluster Analysis , Datasets as Topic , Machine Learning , Molecular Structure , Support Vector Machine
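The abstract above describes clustering inhibitors by K-Means on short, manually defined bit fingerprints. The sketch below is a plain Lloyd's-algorithm K-Means on toy 4-bit vectors with two clusters; it is an illustration under stated assumptions and not the authors' implementation (the paper used customized 10-bit fingerprints and eight subsets). The fingerprints and initial centroids are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 4-bit substructure fingerprints (1 = fragment present).
fingerprints = [
    [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
labels, centroids = k_means(fingerprints, [fingerprints[0], fingerprints[3]])
print(labels)
```

Fixing the initial centroids makes the toy run deterministic; in practice the starting points are usually chosen at random or by a seeding scheme, and the result can depend on that choice.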

16.

Quantitative Structure-Activity Relationship Study for HIV-1 LEDGF/p75 Inhibitors.

Li, Yang; Tian, Yujia; Xi, Yao; Qin, Zijian; Yan, Aixia.

Curr Comput Aided Drug Des ; 16(5): 654-666, 2020.

Article in English | MEDLINE | ID: mdl-31538902

ABSTRACT

BACKGROUND: HIV-1 Integrase (IN) is an important target for the development of new anti-AIDS drugs. HIV-1 LEDGF/p75 inhibitors, which block the integrase and LEDGF/p75 interaction, have been validated for reduction in HIV-1 viral replicative capacity. METHODS: In this work, computational Quantitative Structure-Activity Relationship (QSAR) models were developed for predicting the bioactivity of HIV-1 integrase LEDGF/p75 inhibitors. We collected 190 inhibitors and their bioactivities in this study and divided the inhibitors into nine scaffolds by the method of T-distributed Stochastic Neighbor Embedding (TSNE). These 190 inhibitors were split into a training set and a test set according to the result of a Kohonen's self-organizing map (SOM) or randomly. Multiple Linear Regression (MLR) models, support vector machine (SVM) models and two consensus models were built based on the training sets by 20 selected CORINA Symphony descriptors. RESULTS: All the models showed a good prediction of pIC50. The correlation coefficients of all the models were more than 0.7 on the test set. For consensus Model C1, which performed better than the other models, the correlation coefficient (r) reached 0.909 on the training set and 0.804 on the test set. CONCLUSION: The selected molecular descriptors show that hydrogen bond acceptors, atom charges and electronegativities (especially of π atoms) were important in predicting the activity of HIV-1 integrase LEDGF/p75-IN inhibitors.

Subject(s)

Anti-HIV Agents/chemistry , Drug Discovery/methods , HIV Integrase Inhibitors/chemistry , HIV-1/drug effects , Drug Design , Humans , Models, Molecular , Molecular Structure , Quantitative Structure-Activity Relationship , Structure-Activity Relationship

17.

Classification of Cyclooxygenase-2 Inhibitors Using Support Vector Machine and Random Forest Methods.

Qin, Zijian; Xi, Yao; Zhang, Shengde; Tu, Guiping; Yan, Aixia.

J Chem Inf Model ; 59(5): 1988-2008, 2019 05 28.

Article in English | MEDLINE | ID: mdl-30762371

ABSTRACT

This work reports a classification study conducted on the largest COX-2 inhibitor data set to date. Using 2925 diverse COX-2 inhibitors collected from 168 literature sources, we applied two machine learning methods, support vector machine (SVM) and random forest (RF), to develop 12 classification models. The best SVM and RF models resulted in MCC values of 0.73 and 0.72, respectively. The 2925 COX-2 inhibitors were reduced to a data set of 1630 molecules by removing intermediately active inhibitors, and 12 new classification models were constructed, yielding MCC values above 0.72. The best MCC value on the external test set, 0.68, was obtained by the RF model using ECFP_4 fingerprints. Moreover, the 2925 COX-2 inhibitors were clustered into eight subsets, and the structural features of each subset were investigated. We identified substructures important for activity, including halogen, carboxyl, sulfonamide, and methanesulfonyl groups, as well as aromatic nitrogen atoms. The models developed in this study could serve as useful tools for compound screening prior to lab tests.

Subject(s)

Cyclooxygenase 2 Inhibitors/classification , Support Vector Machine , Databases, Pharmaceutical

18.

SAR study on inhibitors of GIIA secreted phospholipase A₂ using machine learning methods.

Zhang, Shengde; Li, Yang; Qin, Zijian; Tu, Guipin; Chen, Guang; Yan, Aixia.

Chem Biol Drug Des ; 93(5): 666-684, 2019 05.

Article in English | MEDLINE | ID: mdl-30582300

ABSTRACT

GIIA secreted phospholipase A2 (GIIA sPLA2) is a potent target for drug discovery. To distinguish the activity level of the inhibitors of GIIA sPLA2, we built 24 classification models by three machine learning algorithms, namely support vector machine (SVM), decision tree (DT), and random forest (RF), based on 452 compounds. The molecules were represented by CORINA descriptors, MACCS fingerprints, and ECFP4 fingerprints, respectively. The dataset was split into a training set containing 312 compounds and a test set containing 140 compounds by Kohonen's self-organizing map (SOM) strategy and a random strategy. A recursive feature elimination (RFE) method and an information gain (IG) method were used in the selection of molecular descriptors. Three well-performing models were obtained. They were built by the SVM algorithm with CORINA descriptors (Models 1A and 2A) and ECFP4 fingerprints (Model 10A). On the test set, Model 10A reached an accuracy of 90.71% and a Matthews correlation coefficient (MCC) of 0.82. In addition, the 452 inhibitors were clustered into eight subsets by the K-Means algorithm for analyzing their structural features. It was found that highly active inhibitors mainly contained an indole or indolizine scaffold and four side chains.
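
The K-Means clustering step mentioned at the end of this abstract can be illustrated with a small, dependency-free sketch. The abstract does not state which molecular representation or software was used for clustering, so the toy bit vectors, the squared Euclidean distance, and the function names below are assumptions for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen with a fixed seed.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 6-bit fingerprints forming two obvious groups.
    fingerprints = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 1, 1],
    ]
    labels, _ = kmeans(fingerprints, k=2)
    print(labels)
```

In practice, the cluster labels would then be cross-tabulated against activity classes to see which subsets are enriched in highly active compounds.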

Subject(s)

Enzyme Inhibitors/chemistry , Group II Phospholipases A2/antagonists & inhibitors , Machine Learning , Cluster Analysis , Enzyme Inhibitors/metabolism , Group II Phospholipases A2/metabolism , Humans , Principal Component Analysis , Structure-Activity Relationship

19.

Classification of HIV-1 Protease Inhibitors by Machine Learning Methods.

Li, Yang; Tian, Yujia; Qin, Zijian; Yan, Aixia.

ACS Omega ; 3(11): 15837-15849, 2018 Nov 30.

Article in English | MEDLINE | ID: mdl-30556015

ABSTRACT

HIV-1 protease plays an important role in the processing of virus infection. Protease is an effective therapeutic target for the treatment of HIV-1. Our data set is based on a selection of 4855 HIV-1 protease inhibitors (PIs) from ChEMBL. A series of 15 classification models for predicting the active inhibitors were built by machine learning methods, including k-nearest neighbors (K-NN), decision tree (DT), random forest (RF), support vector machine (SVM), and deep neural network (DNN). The molecular structures were characterized by (1) fingerprint descriptors including MACCS fingerprints and PubChem fingerprints and (2) physicochemical descriptors calculated by CORINA Symphony. The prediction accuracies of all of the models are more than 70% on the test set; the best accuracy of 83.07% was obtained by model 4A, which was built by the SVM method based on MACCS fingerprint descriptors. Nine consensus models were built with three kinds of different descriptors, which combined all of the machine learning methods using the "consensus prediction". Model C3a developed with MACCS fingerprint descriptors showed the highest accuracy on both the training set (91.96%) and the test set (83.15%). An external validation set including 35 989 compounds from the DUD database and 239 active inhibitors from the recent literature was used to verify the performance of our model. The best prediction accuracy of 98.37% was obtained by model 3C, which was built by RF based on CORINA Symphony descriptors. In addition, the analysis of molecular descriptors shows that the aromatic system and atoms related to hydrogen bonding provide important contributions to the bioactivity of PIs.
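
A "consensus prediction" that combines several classifiers is often implemented as a majority vote. The abstract does not describe the exact combination rule, so the sketch below is only one plausible reading; the voting rule, the function names, and the toy predictions are all assumptions.

```python
def majority_vote(per_model_predictions):
    """Combine binary predictions (1 = active, 0 = inactive) by majority vote.

    per_model_predictions: one list of 0/1 labels per model, all the same
    length. A compound is called active only if more than half of the
    models vote active (assumed rule, not taken from the source).
    """
    n_models = len(per_model_predictions)
    return [
        1 if 2 * sum(votes) > n_models else 0
        for votes in zip(*per_model_predictions)
    ]


def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Hypothetical outputs of five models for three compounds.
    predictions = {
        "K-NN": [1, 0, 1],
        "DT": [1, 0, 0],
        "RF": [0, 0, 1],
        "SVM": [1, 1, 1],
        "DNN": [0, 0, 0],
    }
    consensus = majority_vote(list(predictions.values()))
    print(consensus, accuracy([1, 0, 1], consensus))
```

With an odd number of models, as in the five methods listed in the abstract, a binary majority vote cannot tie.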

20.

Identification of Novel Aurora Kinase A (AURKA) Inhibitors via Hierarchical Ligand-Based Virtual Screening.

Kong, Yue; Bender, Andreas; Yan, Aixia.

J Chem Inf Model ; 58(1): 36-47, 2018 01 22.

Article in English | MEDLINE | ID: mdl-29202231

ABSTRACT

Aurora kinases are essential for cell mitosis and are amplified and overexpressed in various human malignancies. Therefore, Aurora kinases have been promising targets for anticancer therapies, which has prompted an intensive search for their small-molecule inhibitors. In this work, we performed a hierarchical and time-efficient virtual screening cascade for scaffold hopping, aiming to obtain structurally novel and highly potent hit compounds targeting Aurora kinases. The cascade consisted of a shape- and an electrostatic-based protocol, combined with a QSAR-based selection protocol. This virtual screening cascade was used to screen two databases: a commercial database, the J&K database, containing about 5.2 million diverse molecules, and the Drugbank database. Experimental validations led to the identification of one structurally novel and highly potent hit compound (hit 1, found to possess an IC50 of 8.1 and 19 nM for Aurora kinases A and B, respectively), which can be a promising starting point for further exploration. Additionally, Aurora kinases were identified as off-targets for hits 2-6 (Crizotinib, CI-1033, Dasatinib, Bosutinib, MLN-518), which are approved or investigational drugs as listed in Drugbank, plausibly suggesting that targeting Aurora kinases may even contribute to their mechanism of action.

Subject(s)

Aurora Kinase A/antagonists & inhibitors , High-Throughput Screening Assays/methods , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Databases, Chemical , Humans , Inhibitory Concentration 50 , Ligands , Models, Chemical , Molecular Docking Simulation , Molecular Structure , Quantitative Structure-Activity Relationship , Static Electricity , Support Vector Machine
