Results 1 - 8 of 8
1.
Commun Biol ; 6(1): 1135, 2023 11 09.
Article in English | MEDLINE | ID: mdl-37945666

ABSTRACT

Recently developed enzymes for the depolymerization of polyethylene terephthalate (PET) such as FAST-PETase and LCC-ICCG are inhibited by the intermediate PET product mono(2-hydroxyethyl) terephthalate (MHET). Consequently, the enzymatic conversion of PET into its constituent monomers terephthalic acid (TPA) and ethylene glycol (EG) is inefficient. In this study, a protein scaffold (1TQH) corresponding to a thermophilic carboxylesterase (Est30) was selected from the structural database and redesigned in silico. Among the designs, a double variant KL-MHETase (I171K/G130L) with a protein melting temperature (67.58 °C) similar to that of the PET hydrolase FAST-PETase (67.80 °C) exhibited a 67-fold higher activity for MHET hydrolysis than FAST-PETase. A fused dual enzyme system comprising KL-MHETase and FAST-PETase exhibited a 2.6-fold faster PET depolymerization rate than FAST-PETase alone. Synergy increased the yield of TPA by 1.64-fold, and its purity in the released aromatic products reached 99.5%. In large reaction systems with 100 g/L substrate concentrations, the dual enzyme system KL36F achieved over 90% PET depolymerization into monomers, demonstrating its potential applicability in the industrial recycling of PET plastics. Therefore, a dual enzyme system can greatly reduce the reaction and separation cost for sustainable enzymatic PET recycling.


Subject(s)
Hydrolases; Polyethylene Terephthalates; Hydrolases/chemistry; Polyethylene Terephthalates/chemistry; Polyethylene Terephthalates/metabolism; Hydrolysis; Carboxylesterase; Plastics/chemistry
2.
Comput Struct Biotechnol J ; 21: 5544-5560, 2023.
Article in English | MEDLINE | ID: mdl-38034401

ABSTRACT

Thermally stable proteins find extensive applications in industrial production and pharmaceutical development, and serve as a highly evolved starting point in protein engineering. The thermal stability of proteins is commonly characterized by their melting temperature (Tm). However, due to the limited availability of experimentally determined Tm data and the insufficient accuracy of existing computational methods in predicting Tm, there is an urgent need for a computational approach to accurately forecast the Tm values of thermophilic proteins. Here, we present a deep learning-based model, called DeepTM, which exclusively utilizes protein sequences as input and accurately predicts the Tm values of target thermophilic proteins on a dataset consisting of 7790 thermophilic protein entries. On a test set of 1550 samples, DeepTM demonstrates excellent performance with a coefficient of determination (R²) of 0.75, Pearson correlation coefficient (P) of 0.87, and root mean square error (RMSE) of 6.24 ℃. We further analyzed the sequence features that determine the thermal stability of thermophilic proteins and found that dipeptide frequency, optimal growth temperature (OGT) of the host organisms, and the evolutionary information of the protein significantly affect its melting temperature. We compared the performance of DeepTM with recently reported methods, ProTstab2 and DeepSTABp, in predicting the Tm values on two blind test datasets. One dataset comprised 22 PET plastic-degrading enzymes, while the other included 29 thermally stable proteins of broader classification. In the PET plastic-degrading enzyme dataset, DeepTM achieved an RMSE of 8.25 ℃. Compared to ProTstab2 (20.05 ℃) and DeepSTABp (20.97 ℃), DeepTM demonstrated a reduction in RMSE of 58.85% and 60.66%, respectively. In the dataset of thermally stable proteins, DeepTM (RMSE=7.66 ℃) demonstrated a 51.73% reduction in RMSE compared to ProTstab2 (RMSE=15.87 ℃). DeepTM, with the sole requirement of protein sequence information, accurately predicts the melting temperature and achieves a fully end-to-end prediction process, thus providing enhanced convenience and expediency for further protein engineering.
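
The RMSE reductions quoted in the abstract above follow from a simple relative-difference calculation. The sketch below reproduces that arithmetic from the RMSE values given in the abstract; the function name `rmse_reduction_percent` is an illustrative choice, not something taken from the DeepTM work.

```python
def rmse_reduction_percent(baseline_rmse: float, model_rmse: float) -> float:
    """Relative RMSE reduction of a model versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# RMSE values (degrees Celsius) as quoted in the abstract.
PET_DEEPTM, PET_PROTSTAB2, PET_DEEPSTABP = 8.25, 20.05, 20.97
STABLE_DEEPTM, STABLE_PROTSTAB2 = 7.66, 15.87

vs_protstab2_pet = rmse_reduction_percent(PET_PROTSTAB2, PET_DEEPTM)
vs_deepstabp_pet = rmse_reduction_percent(PET_DEEPSTABP, PET_DEEPTM)
vs_protstab2_stable = rmse_reduction_percent(STABLE_PROTSTAB2, STABLE_DEEPTM)

print(f"{vs_protstab2_pet:.2f}% {vs_deepstabp_pet:.2f}% {vs_protstab2_stable:.2f}%")
```

Rounded to two decimals, the three values agree with the 58.85%, 60.66%, and 51.73% stated in the abstract.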

3.
Chem Biol Drug Des ; 101(6): 1307-1321, 2023 06.
Article in English | MEDLINE | ID: mdl-36752697

ABSTRACT

There is strong interest in the development of microsomal prostaglandin E2 synthase-1 (mPGES-1) inhibitors because of their potential to safely and effectively treat inflammation. Herein, 70 QSAR models were built on a dataset of 735 mPGES-1 inhibitors characterized with RDKit descriptors, using multiple linear regression (MLR), support vector machine (SVM), random forest (RF), deep neural networks (DNN), and eXtreme Gradient Boosting (XGBoost). Three further regression models were built on the dataset represented by SMILES, using self-attention recurrent neural networks (RNN) and Graph Convolutional Networks (GCN). For the best model (Model C2), which was developed by SVM with RDKit descriptors, a coefficient of determination (R²) of 0.861 and a root mean squared error (RMSE) of 0.235 were achieved on the test set. Additionally, an R² of 0.692 and an RMSE of 0.383 were obtained on the external test set. We investigated the applicability domain (AD) of Model C2 with the rivality index (RI); the predictions of Model C2 for 78.92% of molecules in the test set and 78.33% of molecules in the external test set were reliable. After dissecting the RDKit descriptors of Model C2, we found important physicochemical properties of highly active mPGES-1 inhibitors. In addition, by analyzing the attention weight of each atom of each inhibitor from the attention layer, we found that the benzamide group and the trifluoromethyl cyclohexane group are favorable substructures for mPGES-1 inhibitors.


Subject(s)
Algorithms; Quantitative Structure-Activity Relationship; Prostaglandin-E Synthases; Machine Learning; Support Vector Machine; Prostaglandins
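
The abstract above reports regression quality as R² and RMSE. As a reminder of how these two metrics are computed from observed and predicted values, here is a minimal pure-Python sketch. The toy activity values are invented for illustration and are not data from the mPGES-1 study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted activities (illustrative only).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

toy_rmse = rmse(observed, predicted)
toy_r2 = r_squared(observed, predicted)
print(toy_rmse, toy_r2)
```

Note that R² defined this way can be negative for a model that fits worse than predicting the mean, which is one reason it is usually reported together with RMSE.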
4.
Article in English | MEDLINE | ID: mdl-34932483

ABSTRACT

Long non-coding RNAs (lncRNAs) play vital regulatory roles in many complex human diseases; however, the number of validated lncRNA-disease associations remains notably small. Predicting potential lncRNA-disease associations precisely through computational methods remains challenging. In this study, we proposed a novel method, LDVCHN (LncRNA-Disease Vector Calculation Heterogeneous Networks), and developed the corresponding model, HEGANLDA (Heterogeneous Embedding Generative Adversarial Networks LncRNA-Disease Association), for predicting potential lncRNA-disease associations. In HEGANLDA, the graph embedding algorithm HeGAN was introduced to map all nodes in the lncRNA-miRNA-disease heterogeneous network into low-dimensional vectors, which served as the inputs of LDVCHN. HEGANLDA then used an XGBoost (eXtreme Gradient Boosting) classifier, trained on these low-dimensional vectors, to predict potential lncRNA-disease associations. Ten-fold cross-validation was used to evaluate the model, which achieved an area under the ROC curve of 0.983. According to the experimental results, HEGANLDA outperformed each of five current state-of-the-art methods. To further evaluate the effectiveness of HEGANLDA in predicting potential lncRNA-disease associations, both case studies and robustness tests were performed, and the results confirmed its effectiveness and robustness. The source code and data of HEGANLDA are available at https://github.com/HEGANLDA/HEGANLDA.


Subject(s)
MicroRNAs; RNA, Long Noncoding; Humans; RNA, Long Noncoding/genetics; Computational Biology/methods; Algorithms; Software
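
The headline metric in the abstract above is the area under the ROC curve. One way to see what that number means is the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below implements that definition directly. The labels and scores are made up for illustration, and nothing here is taken from the HEGANLDA code base.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of classifier scores (higher means more positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical association labels and predicted scores (illustrative only).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

This quadratic loop is fine for small examples; for large datasets a sort-based implementation is the usual choice.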
5.
Mol Divers ; 27(3): 1037-1051, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35737257

ABSTRACT

Histone deacetylase (HDAC) 1, a member of the histone deacetylase family, plays a pivotal role in various tumors. In this study, we collected 7313 human HDAC1 inhibitors with bioactivities to form a dataset. The dataset was then divided into a training set and a test set using two splitting methods: (1) Kohonen's self-organizing map and (2) random splitting. The molecular structures were represented by MACCS fingerprints, RDKit fingerprints, topological torsion fingerprints and ECFP4 fingerprints. A total of 80 classification models were built using five machine learning methods: decision tree (DT), random forest, support vector machine, eXtreme Gradient Boosting (XGBoost) and deep neural network. Model 15A_2, built by the XGBoost algorithm based on ECFP4 fingerprints, showed the best performance, with an accuracy of 88.08% and an MCC value of 0.76 on the test set. Finally, we clustered the 7313 HDAC1 inhibitors into 31 subsets, and the substructural features in each subset were investigated. Moreover, using the DT algorithm, we analyzed the structure-activity relationship of HDAC1 inhibitors. It can be concluded that some substructures have a significant effect on high activity, such as N-(2-amino-phenyl)-benzamide, benzimidazole, AR-42 analogues, hydroxamic acid with a middle-chain alkyl and 4-aryl imidazole with a midchain alkyl whose α carbon is chiral.


Subject(s)
Algorithms; Machine Learning; Humans; Structure-Activity Relationship; Molecular Structure; Support Vector Machine; Histone Deacetylase 1
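
The classification results in the abstract above are given as accuracy and Matthews correlation coefficient (MCC). Both can be derived from the four cells of a binary confusion matrix, as the sketch below shows. The counts are invented for illustration and do not come from the HDAC1 study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; returns 0.0 if undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts (illustrative only).
TP, TN, FP, FN = 45, 40, 10, 5
toy_accuracy = accuracy(TP, TN, FP, FN)
toy_mcc = matthews_corrcoef(TP, TN, FP, FN)
print(toy_accuracy, toy_mcc)
```

MCC uses all four cells, so it is less easily inflated by class imbalance than accuracy, which is presumably why the two are reported together.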
6.
J Cheminform ; 14(1): 52, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35927691

ABSTRACT

Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task-related representations, removing the need for expert-defined rules. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrates pharmacophore information hierarchically into the message-passing neural network (MPNN) architecture, specifically by means of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbs not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to the MPNN architecture can generally help GNN models improve their predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.
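
To make the idea of reduced-graph pooling in the abstract above more concrete, the following is a rough sketch of one possible pooling step: atom-level feature vectors are summed into the pharmacophore node each atom is assigned to. Sum pooling, the group labels, and the function name are all assumptions made for illustration; the abstract does not state which pooling operator or assignment scheme RG-MPNN actually uses.

```python
def reduced_graph_pool(atom_features, atom_to_group):
    """Sum atom feature vectors into their assigned reduced-graph nodes.

    atom_features: list of equal-length float lists, one per atom.
    atom_to_group: list of group labels, one per atom (hypothetical
                   pharmacophore assignments such as "aromatic").
    Returns a dict mapping each group label to its pooled vector.
    """
    pooled = {}
    for features, group in zip(atom_features, atom_to_group):
        accumulator = pooled.setdefault(group, [0.0] * len(features))
        for i, value in enumerate(features):
            accumulator[i] += value
    return pooled


# Toy molecule: three atoms with 2-dimensional features (illustrative only).
toy_atoms = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
toy_groups = ["aromatic", "aromatic", "donor"]
toy_pooled = reduced_graph_pool(toy_atoms, toy_groups)
print(toy_pooled)
```

In a full model the pooled vectors would become node states for a second, coarser round of message passing; that part is omitted here.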

7.
Front Cell Dev Biol ; 9: 820342, 2021.
Article in English | MEDLINE | ID: mdl-35127729

ABSTRACT

Long non-coding RNAs (lncRNAs) do not encode proteins, yet they are well established to be involved in complex regulatory functions, and lncRNA regulatory dysfunction can lead to a variety of complex human diseases. LncRNAs mostly exert their functions by regulating the expression of target genes, and accurate prediction of potential lncRNA target genes would help to further understand the functional annotations of lncRNAs. Considering the limitations of traditional computational methods for predicting lncRNA target genes, a novel model named Weighted Average Fusion Network Representation learning for predicting LncRNA Target Genes (WAFNRLTG) was proposed. First, a novel heterogeneous network was constructed by integrating a lncRNA sequence similarity network, an mRNA sequence similarity network, a lncRNA-mRNA interaction network, a lncRNA-miRNA interaction network and an mRNA-miRNA interaction network. Next, four popular network representation learning methods were used to obtain the representation vectors of lncRNA and mRNA nodes. Then, the representations of lncRNAs and target genes in the heterogeneous network were obtained with the weighted average fusion network representation learning method. Finally, we merged the representations of lncRNAs and related target genes to form lncRNA-gene pairs, trained the XGBoost classifier and predicted potential lncRNA target genes. In five-fold cross-validation on the training and independent datasets, the experimental results demonstrated that WAFNRLTG obtained better AUC scores (0.9410, 0.9350) and AUPR scores (0.9391, 0.9350). Moreover, case studies of three common lncRNAs were performed to predict their potential target genes, and the results confirmed the effectiveness of WAFNRLTG. The source code and all data of WAFNRLTG can be freely downloaded at https://github.com/HGDYZW/WAFNRLTG.
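
The fusion and pairing steps described in the abstract above can be sketched as follows: several embeddings of the same node are combined by a weighted average, and the fused lncRNA and gene vectors are then joined into one feature vector per lncRNA-gene pair. The weights, the vectors, the use of concatenation for the merge, and the function names are hypothetical; the abstract does not say how WAFNRLTG chooses its weights or merges the pair.

```python
def weighted_average_fusion(embeddings, weights):
    """Fuse several same-length embeddings of one node by weighted average."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    total_weight = sum(weights)
    dimension = len(embeddings[0])
    return [
        sum(w * e[i] for w, e in zip(weights, embeddings)) / total_weight
        for i in range(dimension)
    ]


def make_pair_features(lncrna_vector, gene_vector):
    """Concatenate the two fused vectors into one classifier input."""
    return list(lncrna_vector) + list(gene_vector)


# Four hypothetical embeddings of one lncRNA, one per representation method.
lncrna_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
method_weights = [1.0, 1.0, 2.0, 0.0]  # illustrative weights, not from the paper

fused_lncrna = weighted_average_fusion(lncrna_embeddings, method_weights)
pair_features = make_pair_features(fused_lncrna, [0.2, 0.4])
print(fused_lncrna, pair_features)
```

Each `pair_features` vector, together with a 0/1 interaction label, would form one training row for a downstream classifier such as XGBoost.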

8.
Front Genet ; 12: 808962, 2021.
Article in English | MEDLINE | ID: mdl-35058974

ABSTRACT

Accumulated evidence from biological and clinical studies has shown that long non-coding RNAs (lncRNAs) are closely related to the occurrence and development of various complex human diseases. Research on lncRNA-disease relations will help to further understand the pathogenesis of complex human diseases at the molecular level, but only a small proportion of lncRNA-disease associations has been confirmed. Considering the high cost of biological experiments, exploring potential lncRNA-disease associations with computational approaches has become urgent. In this study, a model based on the closest node weight graph of the spatial neighborhood (CNWGSN) and the edge attention graph convolutional network (EAGCN), LDA-EAGCN, was developed to uncover potential lncRNA-disease associations by integrating disease semantic similarity, lncRNA functional similarity, and known lncRNA-disease associations. Inspired by the success of the EAGCN method on the chemical molecule property recognition problem, the prediction of lncRNA-disease associations was treated as a component recognition problem on lncRNA-disease characteristic graphs. The CNWGSN features of lncRNA-disease associations, combined with known lncRNA-disease associations, were used to train EAGCN, and correlation scores of the input data were predicted with EAGCN to judge whether the input lncRNAs would be associated with the input diseases. LDA-EAGCN achieved an AUC value of 0.9853 in ten-fold cross-validation experiments, which was the highest among five state-of-the-art models. Furthermore, case studies of renal cancer, laryngeal carcinoma, and liver cancer were carried out, and most of the top-ranking lncRNA-disease associations have been confirmed by recently published experimental literature. These results indicate that LDA-EAGCN is an effective model for predicting potential lncRNA-disease associations. Its source code and experimental data are available at https://github.com/HGDKMF/LDA-EAGCN.
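
Several of the abstracts above evaluate models by k-fold cross-validation. The sketch below shows one generic way to generate the train/test index splits for such an experiment using only the standard library. It is a plain shuffled split under a fixed seed, which is an assumption; it does not reproduce the exact splitting protocol of any of the listed studies.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are shuffled once with a fixed seed, then dealt into k folds
    so that every sample is used for testing exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Ten-fold split of 25 hypothetical samples.
splits = list(k_fold_indices(n_samples=25, k=10))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

For association-prediction tasks with few positives, a stratified split that keeps the class ratio similar across folds is often preferable; that refinement is left out of this sketch.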
