Results 1 - 20 of 24
1.
Nat Commun ; 15(1): 4476, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796523

ABSTRACT

Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and developing new therapeutic strategies. Current computational methods mostly predict interactions based on either molecular network or structural information, without integrating them within a unified multi-scale framework. While a few multi-view learning methods are devoted to fusing the multi-scale information, these methods tend to rely heavily on a single scale and under-fit the others, likely owing to the imbalanced nature and inherent greediness of multi-scale learning. To alleviate the optimization imbalance, we present MUSE, a multi-scale representation learning framework based on a variant of expectation maximization that optimizes different scales in an alternating procedure over multiple iterations. This strategy efficiently fuses multi-scale information between atomic structure and molecular network scale through mutual supervision and iterative optimization. MUSE outperforms the current state-of-the-art models not only in molecular interaction (protein-protein, drug-protein, and drug-drug) tasks but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.


Subjects
Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Algorithms , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Machine Learning , Drug Interactions , Humans , Protein Binding
2.
J Chem Inf Model ; 64(6): 1945-1954, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38484468

ABSTRACT

Self-supervised molecular representation learning has demonstrated great promise in bridging machine learning and chemical science to accelerate the development of new drugs. Due to the limited reaction data, existing methods are mostly pretrained by augmenting the intrinsic topology of molecules without effectively incorporating chemical reaction prior information, which makes them difficult to generalize to chemical reaction-related tasks. To address this issue, we propose ReaKE, a reaction knowledge embedding framework, which formulates chemical reactions as a knowledge graph. Specifically, we constructed a chemical synthesis knowledge graph with reactants and products as nodes and reaction rules as the edges. Based on the knowledge graph, we further proposed novel contrastive learning at both molecule and reaction levels to capture the reaction-related functional group information within and between molecules. Extensive experiments demonstrate the effectiveness of ReaKE compared with state-of-the-art methods on several downstream tasks, including reaction classification, product prediction, and yield prediction.


Subjects
Machine Learning , Pattern Recognition, Automated
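
The knowledge-graph formulation described in the ReaKE abstract above (reactants and products as nodes, reaction rules as edges) can be pictured as a set of (head, relation, tail) triples. The following is a minimal sketch under stated assumptions, not ReaKE's actual data model: the class name `ReactionKG`, the rule label "esterification", and the SMILES strings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ReactionKG:
    """Toy knowledge graph: molecules are nodes, reaction rules label directed edges."""

    # Maps a reactant to the list of (rule, product) edges leaving it.
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_reaction(self, reactant: str, rule: str, product: str) -> None:
        """Add one directed edge reactant --rule--> product."""
        self.edges[reactant].append((rule, product))

    def triples(self):
        """Yield (head, relation, tail) triples, a common input form for KG models."""
        for reactant, outgoing in self.edges.items():
            for rule, product in outgoing:
                yield (reactant, rule, product)

    def products_of(self, reactant: str, rule: str):
        """Return all products reachable from a reactant through a given rule."""
        return [p for r, p in self.edges.get(reactant, []) if r == rule]


kg = ReactionKG()
# Hypothetical entries: both reactants point to the same product under one rule.
kg.add_reaction("CCO", "esterification", "CCOC(C)=O")
kg.add_reaction("CC(=O)O", "esterification", "CCOC(C)=O")
all_triples = list(kg.triples())
```

Storing each reaction as a triple keeps the graph in the form that contrastive or embedding objectives usually consume; a production graph would likely also need multi-reactant edges, which this sketch leaves out.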
3.
Nat Commun ; 15(1): 1071, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38316797

ABSTRACT

While significant advances have been made in predicting static protein structures, the inherent dynamics of proteins, modulated by ligands, are crucial for understanding protein function and facilitating drug discovery. Traditional docking methods, frequently used in studying protein-ligand interactions, typically treat proteins as rigid. While molecular dynamics simulations can propose appropriate protein conformations, they're computationally demanding due to rare transitions between biologically relevant equilibrium states. In this study, we present DynamicBind, a deep learning method that employs equivariant geometric diffusion networks to construct a smooth energy landscape, promoting efficient transitions between different equilibrium states. DynamicBind accurately recovers ligand-specific conformations from unbound protein structures without the need for holo-structures or extensive sampling. Remarkably, it demonstrates state-of-the-art performance in docking and virtual screening benchmarks. Our experiments reveal that DynamicBind can accommodate a wide range of large protein conformational changes and identify cryptic pockets in unseen protein targets. As a result, DynamicBind shows potential in accelerating the development of small molecules for previously undruggable targets and expanding the horizons of computational drug discovery.


Subjects
Molecular Dynamics Simulation , Proteins , Ligands , Proteins/metabolism , Protein Conformation , Drug Discovery , Protein Binding , Molecular Docking Simulation
4.
Patterns (N Y) ; 3(12): 100653, 2022 Dec 09.
Article in English | MEDLINE | ID: mdl-36569549

ABSTRACT

Jiahua Rao and Shuangjia Zheng are Ph.D. students in Prof. Yang's lab (Supercomputing And AI for Life science, SAIL Lab) at Sun Yat-sen University. They recently developed a framework to quantitatively assess the interpretability of graph neural networks (GNNs) and compared the results with the judgments of medicinal chemists. Their meaningful benchmarking and rigorous framework would greatly benefit the development of new interpretable methods in GNNs.

5.
Patterns (N Y) ; 3(12): 100628, 2022 Dec 09.
Article in English | MEDLINE | ID: mdl-36569553

ABSTRACT

Graph neural networks (GNNs) have received increasing attention because of their expressive power on topological data, but they are still criticized for their lack of interpretability. To interpret GNN models, explainable artificial intelligence (XAI) methods have been developed. However, these methods are limited to qualitative analyses without quantitative assessments on real-world datasets due to a lack of ground truths. In this study, we have established five XAI-specific molecular property benchmarks, including two synthetic and three experimental datasets. Through the datasets, we quantitatively assessed six XAI methods on four GNN models and made comparisons with seven medicinal chemists of different experience levels. The results demonstrated that XAI methods could deliver reliable and informative answers for medicinal chemists in identifying the key substructures. Moreover, the identified substructures were shown to complement existing classical fingerprints to improve molecular property predictions, and the improvements increased with the growth of training data.

6.
J Chem Inf Model ; 62(23): 5907-5917, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36404642

ABSTRACT

Fragment-based drug discovery is a widely used strategy for drug design in both academia and the pharmaceutical industry. Although fragments can be linked to generate candidate compounds by the latest deep generative models, generating linkers with specified attributes remains underdeveloped. In this study, we presented a novel framework, DRlinker, to control fragment linking toward compounds with given attributes through reinforcement learning. The method has been shown to be effective for many tasks from controlling the linker length and log P, optimizing predicted bioactivity of compounds, to various multiobjective tasks. Specifically, our model successfully generated 91.0% and 93.9% of compounds complying with the desired linker length and log P and improved the 7.5 pChEMBL value in bioactivity optimization. Finally, a quasi-scaffold-hopping study revealed that DRlinker could generate nearly 30% of molecules with high 3D similarity but low 2D similarity to the lead inhibitor, demonstrating the benefits and applicability of DRlinker in actual fragment-based drug design.


Subjects
Drug Design , Drug Discovery
7.
Nat Commun ; 13(1): 3342, 2022 Jun 10.
Article in English | MEDLINE | ID: mdl-35688826

ABSTRACT

The complete biosynthetic pathways are unknown for most natural products (NPs); it is thus valuable to make computer-aided bio-retrosynthesis predictions. Here, a navigable and user-friendly toolkit, BioNavi-NP, is developed to predict the biosynthetic pathways for both NPs and NP-like compounds. First, a single-step bio-retrosynthesis prediction model is trained using both general organic and biosynthetic reactions through end-to-end transformer neural networks. Based on this model, plausible biosynthetic pathways can be efficiently sampled through an AND-OR tree-based planning algorithm from iterative multi-step bio-retrosynthetic routes. Extensive evaluations reveal that BioNavi-NP can identify biosynthetic pathways for 90.2% of 368 test compounds and recover the reported building blocks as in the test set for 72.8%, 1.7 times more accurate than existing conventional rule-based approaches. The model is further shown to identify biologically plausible pathways for complex NPs collected from the recent literature. The toolkit as well as the curated datasets and learned models are freely available to facilitate the elucidation and reconstruction of the biosynthetic pathways for NPs.


Subjects
Biological Products , Deep Learning , Algorithms , Biosynthetic Pathways , Neural Networks, Computer
8.
J Chem Inf Model ; 62(5): 1308-1317, 2022 Mar 14.
Article in English | MEDLINE | ID: mdl-35200015

ABSTRACT

Identifying drug-protein interactions (DPIs) is crucial in drug discovery, and a number of machine learning methods have been developed to predict DPIs. Existing methods usually use unrealistic data sets with hidden bias, which will limit the accuracy of virtual screening methods. Meanwhile, most DPI prediction methods pay more attention to molecular representation but lack effective research on protein representation and high-level associations between different instances. To this end, we present the novel structure-aware multimodal deep DPI prediction model, STAMP-DPI, which was trained on a curated industry-scale benchmark data set. We built a high-quality benchmark data set named GalaxyDB for DPI prediction. This industry-scale data set along with an unbiased training procedure resulted in a more robust benchmark study. For informative protein representation, we constructed a structure-aware graph neural network method from the protein sequence by combining predicted contact maps and graph neural networks. Through further integration of structure-based representation and high-level pretrained embeddings for molecules and proteins, our model effectively captures the feature representation of the interactions between them. As a result, STAMP-DPI outperformed state-of-the-art DPI prediction methods, reducing the mean square error (MSE) by 7.00% on the Davis data set and improving the area under the curve (AUC) by 8.89% on the GalaxyDB data set. Moreover, our model is interpretable through its transformer-based interaction mechanism, which can accurately reveal the binding sites between molecules and proteins.


Subjects
Deep Learning , Amino Acid Sequence , Machine Learning , Neural Networks, Computer , Proteins/chemistry
9.
Nat Biomed Eng ; 6(1): 76-93, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34992270

ABSTRACT

A reduced removal of dysfunctional mitochondria is common to aging and age-related neurodegenerative pathologies such as Alzheimer's disease (AD). Strategies for treating such impaired mitophagy would benefit from the identification of mitophagy modulators. Here we report the combined use of unsupervised machine learning (involving vector representations of molecular structures, pharmacophore fingerprinting and conformer fingerprinting) and a cross-species approach for the screening and experimental validation of new mitophagy-inducing compounds. From a library of naturally occurring compounds, the workflow allowed us to identify 18 small molecules, and among them two potent mitophagy inducers (Kaempferol and Rhapontigenin). In nematode and rodent models of AD, we show that both mitophagy inducers increased the survival and functionality of glutamatergic and cholinergic neurons, abrogated amyloid-β and tau pathologies, and improved the animals' memory. Our findings suggest the existence of a conserved mechanism of memory loss across the AD models, this mechanism being mediated by defective mitophagy. The computational-experimental screening and validation workflow might help uncover potent mitophagy modulators that stimulate neuronal health and brain homeostasis.


Subjects
Alzheimer Disease , Mitophagy , Alzheimer Disease/drug therapy , Alzheimer Disease/pathology , Amyloid beta-Peptides , Animals , Machine Learning , Mitophagy/physiology , Workflow
10.
Brief Bioinform ; 23(2), 2022 Mar 10.
Article in English | MEDLINE | ID: mdl-35039821

ABSTRACT

Protein-DNA interactions play crucial roles in biological systems, and identifying protein-DNA binding sites is the first step for mechanistic understanding of various biological activities (such as transcription and repair) and designing novel drugs. How to accurately identify DNA-binding residues from only protein sequence remains a challenging task. Currently, most existing sequence-based methods only consider contextual features of the sequential neighbors, which are limited in capturing spatial information. Based on the recent breakthrough in protein structure prediction by AlphaFold2, we propose an accurate predictor, GraphSite, for identifying DNA-binding residues based on the structural models predicted by AlphaFold2. Here, we convert the binding site prediction problem into a graph node classification task and employ a transformer-based variant model to take the protein structural information into account. By leveraging predicted protein structures and graph transformer, GraphSite substantially improves over the latest sequence-based and structure-based methods. The algorithm is further confirmed on the independent test set of 181 proteins, where GraphSite surpasses the state-of-the-art structure-based method by 16.4% in area under the precision-recall curve and 11.2% in Matthews correlation coefficient, respectively. We provide the datasets, the predicted structures and the source code along with the pre-trained models of GraphSite at https://github.com/biomed-AI/GraphSite. The GraphSite web server is freely available at https://biomed.nscc-gz.cn/apps/GraphSite.


Subjects
Algorithms , Proteins , Binding Sites , DNA/metabolism , Protein Binding , Protein Domains , Proteins/chemistry
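
The GraphSite abstract above reports gains in the Matthews correlation coefficient (MCC), a metric suited to imbalanced per-residue labels such as binding versus non-binding. As a minimal sketch of how MCC is computed from binary labels, the snippet below uses the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the toy label vectors are assumptions made for illustration; they are not taken from the GraphSite code or data.

```python
import math


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical per-residue labels: 1 = binding residue, 0 = non-binding residue.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
mcc = matthews_corrcoef(y_true, y_pred)
```

Unlike plain accuracy, MCC uses all four confusion-matrix cells, so a predictor that labels every residue as non-binding scores 0 rather than appearing accurate.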
11.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3735-3743, 2022.
Article in English | MEDLINE | ID: mdl-34637380

ABSTRACT

MOTIVATION: The interactions of proteins with DNA, RNA, peptides, and carbohydrates play key roles in various biological processes. The studies of uncharacterized protein-molecule interactions could be aided by accurate predictions of residues that bind with partner molecules. However, the existing methods for predicting binding residues on proteins remain of relatively low accuracy due to the limited number of complex structures in databases. As different types of molecules partially share chemical mechanisms, the predictions for each molecular type should benefit from the binding information with other molecule types. RESULTS: In this study, we employed a multi-task deep learning strategy to develop a new sequence-based method, named MTDsite, for simultaneously predicting binding residues/sites with multiple important molecule types. By combining four training sets for DNA, RNA, peptide, and carbohydrate-binding proteins, our method yielded accurate and robust predictions with AUC values of 0.852, 0.836, 0.758, and 0.776 on their respective independent test sets, which are 0.52 to 6.6% better than other state-of-the-art methods. To the best of our knowledge, this is the first method using a multi-task framework to predict multiple molecular binding sites simultaneously.


Subjects
Peptides , RNA , RNA/chemistry , Peptides/chemistry , Neural Networks, Computer , Proteins/chemistry , Binding Sites , Carbohydrates , DNA/genetics , DNA/metabolism , Protein Binding
12.
J Cheminform ; 13(1): 87, 2021 Nov 13.
Article in English | MEDLINE | ID: mdl-34774103

ABSTRACT

Scaffold hopping is a central task of modern medicinal chemistry for rational drug design, which aims to design molecules of novel scaffolds sharing similar target biological activities toward known hit molecules. Traditionally, scaffold hopping depends on searching databases of available compounds, which cannot exploit the vast chemical space. In this study, we have re-formulated this task as a supervised molecule-to-molecule translation to generate hopped molecules novel in 2D structure but similar in 3D structure, as inspired by the fact that candidate compounds bind with their targets through 3D conformations. To efficiently train the model, we curated over 50 thousand pairs of molecules with increased bioactivity, similar 3D structure, but different 2D structure from a public bioactivity database, which spanned 40 kinases commonly investigated by medicinal chemists. Moreover, we have designed a multimodal molecular transformer architecture by integrating molecular 3D conformer through a spatial graph neural network and protein sequence information through Transformer. The trained DeepHop model was shown to be able to generate around 70% of molecules having improved bioactivity together with high 3D similarity but low 2D scaffold similarity to the template molecules. This ratio was 1.9 times higher than other state-of-the-art deep learning methods and rule- and virtual screening-based methods. Furthermore, we demonstrated that the model could generalize to new target proteins through fine-tuning with a small set of active compounds. Case studies have also shown the advantages and usefulness of DeepHop in practical scaffold hopping scenarios.

13.
J Chem Inf Model ; 61(10): 4900-4912, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34586824

ABSTRACT

The protein kinase family contains many promising drug targets. Many kinase inhibitors target the ATP-binding pocket, leading to approved drugs in past decades. Scaffold hopping is an effective approach for drug design. The kinase ATP-binding pocket is highly conserved, crossing the whole kinase family. This provides an opportunity to develop a scaffold hopping approach to explore diversified scaffolds among various kinase inhibitors. In this work, we report the SyntaLinker-Hybrid scheme for kinase inhibitor scaffold hopping. With this scheme, we replace molecular fragments bound at the conserved kinase hinge region with deep generative models. Thus, we are able to generate new kinase-inhibitor-like structures hybridizing the privileged fragments against the hinge region. We demonstrate that this scheme allows generation of kinase-inhibitor-like molecules with novel scaffolds, while retaining the binding features of existing kinase inhibitors. This work can be employed in lead identification against kinase targets.


Subjects
Deep Learning , Drug Design , Protein Binding , Protein Kinase Inhibitors/pharmacology , Protein Kinases
14.
Bioinformatics ; 38(1): 94-98, 2021 Dec 22.
Article in English | MEDLINE | ID: mdl-34450651

ABSTRACT

MOTIVATION: The solvent accessible surface is an essential structural property measure related to the protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure to describe the degree of residue exposure in the protein surface or inside of protein. However, this computation will fail when the residue information is missing. RESULTS: In this article, we proposed a novel method for estimating RSA using the Cα atom distance matrix with the deep learning method (EAGERER). The new method, EAGERER, achieves Pearson correlation coefficients of 0.921-0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful for protein structure and protein function prediction. AVAILABILITY AND IMPLEMENTATION: The method is freely available at https://github.com/cliffgao/EAGERER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Membrane Proteins , Solvents/chemistry
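
The EAGERER abstract above names two computable ingredients: a Cα atom distance matrix as input and the Pearson correlation coefficient as the evaluation measure. The sketch below shows both in plain Python. It is an illustration only and does not reproduce the EAGERER model. The coordinates and the RSA values are hypothetical, and the function names are assumptions introduced here.

```python
import math


def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (x, y, z)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical C-alpha positions (in angstroms) for a three-residue fragment.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
dist = ca_distance_matrix(ca_coords)

# Hypothetical reference and predicted RSA values, used only to exercise pearson().
rsa_reference = [0.10, 0.45, 0.80, 0.30]
rsa_predicted = [0.15, 0.40, 0.75, 0.35]
r = pearson(rsa_reference, rsa_predicted)
```

The distance matrix is symmetric with a zero diagonal and depends only on Cα positions, which is why it remains available when side-chain atoms are missing.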
15.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2775-2780, 2021.
Article in English | MEDLINE | ID: mdl-33705321

ABSTRACT

A novel coronavirus (COVID-19) recently emerged as an acute respiratory syndrome, and has caused a pneumonia outbreak worldwide. As COVID-19 continues to spread rapidly across the world, computed tomography (CT) has become essential for fast diagnoses. Thus, it is urgent to develop an accurate computer-aided method to assist clinicians in identifying COVID-19-infected patients by CT images. Here, we have collected chest CT scans of 88 patients diagnosed with COVID-19 from hospitals of two provinces in China, 100 patients infected with bacterial pneumonia, and 86 healthy persons for comparison and modeling. Based on the data, a deep learning-based CT diagnosis system was developed to identify patients with COVID-19. The experimental results showed that our model could accurately discriminate the COVID-19 patients from the bacterial pneumonia patients with an AUC of 0.95, recall (sensitivity) of 0.96, and precision of 0.79. When integrating three types of CT images, our model achieved a recall of 0.93 with precision of 0.86 for discriminating COVID-19 patients from others. Moreover, our model could extract main lesion features, especially the ground-glass opacity (GGO), which are visually helpful for assisted diagnoses by doctors. An online server is available for diagnoses with CT images (http://biomed.nscc-gz.cn/model.php). Source code and datasets are available at our GitHub (https://github.com/SY575/COVID19-CT).


Subjects
COVID-19/diagnostic imaging , COVID-19/diagnosis , Deep Learning , Diagnosis, Computer-Assisted/statistics & numerical data , Tomography, X-Ray Computed/statistics & numerical data , Case-Control Studies , China , Computational Biology , Diagnosis, Differential , Humans , Models, Statistical , Pneumonia, Bacterial/diagnosis , Pneumonia, Bacterial/diagnostic imaging , SARS-CoV-2
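
The CT-diagnosis abstract above reports recall (sensitivity) and precision at two operating points, and the two quantities trade off against each other as the decision threshold moves. The following is a minimal sketch of that relationship. The labels and scores are hypothetical toy values, not the study's data, and the function names are assumptions made for illustration.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count TP, FP, FN when a score >= threshold is called positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = disease present) and model scores for ten scans.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2, 0.1]

strict = precision_recall(*confusion_counts(y_true, scores, threshold=0.5))
lenient = precision_recall(*confusion_counts(y_true, scores, threshold=0.25))
```

With these toy values, lowering the threshold raises recall and lowers precision, which is the usual reason a screening setting accepts more false positives in exchange for fewer missed cases.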
16.
J Chem Inf Model ; 61(4): 1627-1636, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33729779

ABSTRACT

The goal of molecular optimization (MO) is to discover molecules that acquire improved pharmaceutical properties over a known starting molecule. Despite many recent successes of new approaches for MO, these methods were typically developed for particular properties with rich annotated training examples. Thus, these approaches are difficult to implement in real scenes where only a small amount of pharmaceutical data is usually available due to the expense and significant effort required for the data collection. Here, we propose a new approach, Meta-MO, for molecular optimization with a handful of training samples based on the well-recognized first-order meta-learning algorithms. By using a set of meta tasks with rich training samples, Meta-MO trains a meta model through the meta-learning optimization and adapts the learned model to new low-resource MO tasks. Meta-MO was shown to consistently outperform several pretraining and multitask training procedures, providing an average improvement in the success rate of 4.3% on a large-scale bioactivity data set with diverse target variations. We also observed that Meta-MO resulted in the best performing models across fine-tuning sets with only dozens of samples. To the best of our knowledge, this is the first study to apply meta learning to MO tasks. More importantly, such a strategy could be further extended to many low-resource scenarios in real-world drug design.


Subjects
Algorithms
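
The Meta-MO abstract above describes training a meta model on resource-rich tasks with first-order meta-learning and then adapting it to low-resource tasks. The abstract does not say which first-order algorithm is used; the sketch below assumes a Reptile-style update, theta <- theta + meta_lr * (adapted - theta), as one well-known example of the family. The one-parameter quadratic "tasks" are toy stand-ins for molecular optimization tasks, and every name and hyperparameter here is hypothetical.

```python
def inner_sgd(theta, center, steps=5, lr=0.1):
    """Adapt theta to one toy task with loss (theta - center)**2 by gradient descent."""
    for _ in range(steps):
        grad = 2.0 * (theta - center)
        theta -= lr * grad
    return theta


def reptile(task_centers, meta_steps=300, meta_lr=0.1, theta=0.0):
    """Reptile-style first-order meta-learning over a fixed cycle of tasks."""
    for step in range(meta_steps):
        center = task_centers[step % len(task_centers)]
        adapted = inner_sgd(theta, center)
        # Move the meta-parameter a fraction of the way toward the adapted value.
        theta += meta_lr * (adapted - theta)
    return theta


# Meta-train on three toy tasks whose optima are 1.0, 2.0, and 3.0.
meta_theta = reptile([1.0, 2.0, 3.0])

# Adapt to an unseen task (optimum 2.5) with the same few inner steps.
adapted_theta = inner_sgd(meta_theta, 2.5)
```

The meta-parameter settles near the middle of the training-task optima, so a handful of gradient steps is enough to move it close to a new, nearby task. That is the intuition behind adapting with only a few samples, although real molecular models have many parameters and stochastic data.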
17.
J Cheminform ; 13(1): 7, 2021 Feb 08.
Article in English | MEDLINE | ID: mdl-33557952

ABSTRACT

Protein solubility is significant in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model is highly desired to accurately predict protein solubility from the amino acid sequence. Many methods have been developed, but they are mostly based on the one-dimensional embedding of amino acids, which is limited in capturing spatial structural information. In this study, we have developed a new structure-aware method, GraphSol, to predict protein solubility by attentive graph convolutional network (GCN), where the protein topology attribute graph was constructed through predicted contact maps only from the sequence. GraphSol was shown to substantially outperform other sequence-based methods. The model was proven to be stable by consistent [Formula: see text] of 0.48 in both the cross-validation and independent test of the eSOL dataset. To the best of our knowledge, this is the first study to utilize the GCN for sequence-based protein solubility predictions. More importantly, this architecture could be easily extended to other protein prediction tasks requiring a raw protein sequence.

18.
Brief Bioinform ; 22(4), 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33341877

ABSTRACT

Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge the gap. Here, we introduced PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is attached with heterogeneous, domain-specific information obtained from multi-omics data, i.e. gene expression, chemical structure and disease word embedding, while preserving the semantic and biomedical features. For baselines, we offered nine state-of-the-art KG embedding (KGE) approaches and a new biological, intuitive, graph neural network-based KGE method that uses a combination of both global network structure and heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discussed our observations across various downstream biological tasks and provide insights and guidelines for how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.


Subjects
Biomedical Research , Data Mining , Neural Networks, Computer , Semantics , Software , Benchmarking , Humans
19.
Eur J Med Chem ; 210: 112982, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33158578

ABSTRACT

A pre-trained self-attentive message passing neural network (P-SAMPNN) model was developed based on our anti-osteoclastogenesis dataset for virtual screening. Validation processes proved that the P-SAMPNN model was significantly superior to the other baseline models. A commercially available natural product library was virtually screened by the P-SAMPNN model, yielding 5 confirmed hits out of 10 selected virtual hits. Among the confirmed hits, compounds AP-123/40765213 and AE-562/43462182 are nanomolar inhibitors of osteoclastogenesis with a new scaffold. Further studies indicate that AP-123/40765213 and AE-562/43462182 significantly suppress the mRNA expression of RANK and downregulate the expression of the osteoclast-related genes Ctsk, Nfatc1, and Tracp. Our work demonstrated that the P-SAMPNN method can guide phenotype-based drug discovery.


Subjects
Biological Products/pharmacology , Drug Discovery , Osteoporosis/drug therapy , Animals , Biological Products/chemical synthesis , Biological Products/chemistry , Cell Survival/drug effects , Cells, Cultured , Dose-Response Relationship, Drug , Mice , Mice, Inbred C57BL , Molecular Structure , Osteogenesis/drug effects , Structure-Activity Relationship
20.
J Chem Inf Model ; 60(3): 1165-1174, 2020 Mar 23.
Article in English | MEDLINE | ID: mdl-32013419

ABSTRACT

The copper(I)-catalyzed alkyne-azide cycloaddition (CuAAC) reaction, a major click chemistry reaction, is widely employed in drug discovery and chemical biology. However, the success rate of the CuAAC reaction is not as satisfactory as expected; to improve its performance, we developed a recurrent neural network (RNN) model to predict its feasibility. First, we designed and synthesized a structurally diverse library of 700 compounds with the CuAAC reaction to obtain experimental data. Then, using reaction SMILES as input, we generated a bidirectional long-short-term memory with a self-attention mechanism (BiLSTM-SA) model. Our best prediction model has a total accuracy of 80%. With the self-attention mechanism, adverse substructures responsible for negative reactions were recognized and derived as quantitative descriptors. Density functional theory investigations were conducted to provide evidence for the correlation between bromo-α-C hybrid types and the success rate of the reaction. Quantitative descriptors combined with RDKit descriptors were fed to three machine learning models, a support vector machine, random forest, and logistic regression, and resulted in improved performance. The BiLSTM-SA model for predicting the feasibility of the CuAAC reaction is superior to other conventional learning methods and advances heuristic chemical rules.


Subjects
Alkynes , Azides , Catalysis , Click Chemistry , Copper , Cycloaddition Reaction , Feasibility Studies , Neural Networks, Computer