Results 1 - 20 of 41
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38990514

ABSTRACT

Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been developed to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To address the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy is implemented to leverage protein-protein binding site information and enhance protein-peptide binding site recognition, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments, including predicting apo protein-peptide, protein-cyclic peptide and AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.


Subjects
Peptides , Binding Sites , Peptides/chemistry , Peptides/metabolism , Protein Binding , Computational Biology/methods , Algorithms , Proteins/chemistry , Proteins/metabolism , Machine Learning
2.
Eur J Med Chem ; 275: 116628, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-38944933

ABSTRACT

Macrocyclic peptides possess unique features, making them highly promising as a drug modality. However, evaluating their bioactivity through wet lab experiments is generally resource-intensive and time-consuming. Despite advancements in artificial intelligence (AI) for bioactivity prediction, challenges remain due to limited data availability and the interpretability issues in deep learning models, often leading to less-than-ideal predictions. To address these challenges, we developed PepExplainer, an explainable graph neural network based on substructure mask explanation (SME). This model excels at deciphering amino acid substructures, translating macrocyclic peptides into detailed molecular graphs at the atomic level, and efficiently handling non-canonical amino acids and complex macrocyclic peptide structures. PepExplainer's effectiveness is enhanced by utilizing the correlation between peptide enrichment data from a selection-based focused library and bioactivity data, and by employing transfer learning to improve bioactivity predictions of macrocyclic peptides against the IL-17C/IL-17RE interaction. Additionally, PepExplainer was further validated for bioactivity prediction on an additional set of thirteen newly synthesized macrocyclic peptides. Moreover, it enabled the optimization of a macrocyclic peptide, reducing its IC50 from 15 nM to 5.6 nM based on the contribution score provided by PepExplainer. This achievement underscores PepExplainer's ability to decipher complex molecular patterns, highlighting its potential to accelerate the discovery and optimization of macrocyclic peptides.


Subjects
Deep Learning , Peptides, Cyclic/chemistry , Peptides, Cyclic/pharmacology , Peptides, Cyclic/chemical synthesis , Macrocyclic Compounds/chemistry , Macrocyclic Compounds/pharmacology , Macrocyclic Compounds/chemical synthesis , Molecular Structure , Humans , Peptides/chemistry , Peptides/pharmacology , Structure-Activity Relationship , Dose-Response Relationship, Drug
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706323

ABSTRACT

In recent years, cyclic peptides have emerged as a promising therapeutic modality due to their diverse biological activities. Understanding the structures of these cyclic peptides and their complexes is crucial for gaining insights into protein target-cyclic peptide interactions, which can facilitate the development of related novel drugs. However, experimental observation is time-consuming and expensive, and computer-aided drug design methods are not yet practical enough for real-world applications. To tackle this challenge, we introduce HighFold, an AlphaFold-derived model. By integrating specific details about head-to-tail cyclization and disulfide bridge structures, the HighFold model can accurately predict the structures of cyclic peptides and their complexes. Our model demonstrates superior predictive performance compared to other existing approaches, representing a significant advancement in structure-activity research. The HighFold model is openly accessible at https://github.com/hongliangduan/HighFold.


Subjects
Disulfides , Peptides, Cyclic , Peptides, Cyclic/chemistry , Disulfides/chemistry , Software , Models, Molecular , Protein Conformation , Algorithms , Computational Biology/methods
4.
Methods ; 228: 38-47, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38772499

ABSTRACT

Human leukocyte antigen (HLA) molecules play a critically significant role in immunotherapy due to their capacity to recognize and bind exogenous antigens such as peptides and subsequently deliver them to immune cells. Predicting the binding between peptides and HLA molecules (pHLA) can expedite the screening of immunogenic peptides and facilitate vaccine design. However, traditional experimental methods are time-consuming and inefficient. In this study, an efficient deep learning-based method was developed for predicting peptide-HLA binding, treating peptide sequences as linguistic entities. It combined the textCNN and BiLSTM architectures to create a deep neural network model called APEX-pHLA. This model operates without limitations related to HLA class I allele variants and peptide segment lengths, enabling efficient encoding of sequence features for both HLA and peptide segments. On the independent test set, the model achieved Accuracy, ROC_AUC, F1, and MCC values of 0.9449, 0.9850, 0.9453, and 0.8899, respectively. On an external test set, the corresponding values were 0.9803, 0.9574, 0.8835, and 0.7863. These results outperformed fifteen methods previously reported in the literature. The accurate prediction capability of the APEX-pHLA model in peptide-HLA binding might provide valuable insights for future HLA vaccine design.
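For reference, Accuracy, F1 and MCC figures such as those reported for APEX-pHLA follow the standard confusion-matrix definitions. The minimal Python sketch below computes them from true/false positive and negative counts; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, F1 and Matthews correlation coefficient from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced test sets.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

# Hypothetical confusion-matrix counts, not results from APEX-pHLA.
print(binary_metrics(tp=90, fp=5, tn=95, fn=10))
```

ROC_AUC is left out of this sketch because it is computed from ranked prediction scores rather than from a single confusion matrix.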


Subjects
Histocompatibility Antigens Class I , Peptides , Protein Binding , Humans , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class I/metabolism , Peptides/chemistry , Peptides/immunology , Deep Learning , HLA Antigens/immunology , HLA Antigens/genetics , Neural Networks, Computer , Computational Biology/methods
5.
Methods ; 228: 22-29, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38754712

ABSTRACT

Drug-drug interaction (DDI) prediction is crucial for identifying interactions within drug combinations, especially adverse effects due to physicochemical incompatibility. While current methods have made strides in predicting adverse drug interactions, limitations persist. Most methods rely on handcrafted features, restricting their applicability. They predominantly extract information from individual drugs, neglecting the importance of interaction details between drug pairs. To address these issues, we propose MGDDI, a graph neural network-based model for predicting potential adverse drug interactions. Notably, we use a multiscale graph neural network (MGNN) to learn drug molecule representations, addressing substructure size variations and preventing gradient issues. For capturing interaction details between drug pairs, we integrate a substructure interaction learning module based on attention mechanisms. Our experimental results demonstrate MGDDI's superiority in predicting adverse drug interactions, offering a solution to current methodological limitations.


Subjects
Drug Interactions , Neural Networks, Computer , Humans , Drug-Related Side Effects and Adverse Reactions , Algorithms
6.
Eur J Med Chem ; 268: 116262, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38387334

ABSTRACT

Peptides can bind challenging disease targets with high affinity and specificity, offering enormous opportunities for addressing unmet medical needs. However, peptides' unique features, including smaller size, increased structural flexibility, and limited data availability, pose additional challenges to the design process compared to proteins. This review explores the dynamic field of peptide therapeutics, leveraging deep learning to enhance structure prediction and design. Our exploration encompasses various facets of peptide research, ranging from dataset curation to model development. As deep learning technologies become more refined, we focus our efforts on peptide structure prediction and design, aligning with the fundamental principles of structure-activity relationships in drug development. To guide researchers in harnessing the potential of deep learning to advance peptide drug development, we comprehensively explore the current challenges and future directions of peptide therapeutics.


Subjects
Deep Learning , Peptides/pharmacology , Drug Development , Structure-Activity Relationship , Technology
7.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38305428

ABSTRACT

MOTIVATION: 5-Methylcytosine (5mC), a fundamental element of DNA methylation in eukaryotes, plays a vital role in gene expression regulation, embryonic development, and other biological processes. Although several computational methods have been proposed for detecting DNA base modifications such as 5mC sites from Nanopore sequencing data, they face challenges including sensitivity to noise and neglect of the imbalanced distribution of methylation sites in real-world scenarios. RESULTS: Here, we develop NanoCon, a deep hybrid network coupled with a contrastive learning strategy to detect 5mC methylation sites from Nanopore reads. In particular, we adopted a contrastive learning module to alleviate the issues caused by imbalanced data distribution in nanopore sequencing, offering more accurate and robust detection of 5mC sites. Evaluation results demonstrate that NanoCon outperforms existing methods, highlighting its potential as a valuable tool in genomic sequencing and methylation prediction. In addition, we verified the effectiveness of NanoCon's representation learning on two datasets by visualizing dimension-reduced features of methylation and nonmethylation sites. Furthermore, cross-species and cross-5mC methylation motif experiments indicated the robustness of our model and its ability to perform transfer learning. We hope this work can contribute to the community by providing a powerful and reliable solution for 5mC site detection in genomic studies. AVAILABILITY AND IMPLEMENTATION: The project code is available at https://github.com/Challis-yin/NanoCon.
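The NanoCon abstract does not state its exact contrastive objective. Purely as an illustration, the sketch below implements a generic supervised contrastive loss (in the style of SupCon), swapped in as a representative technique: embeddings that share a label, such as methylated versus nonmethylated sites, are pulled together relative to every other sample in the batch. The function names, the toy 2-D embeddings and the temperature of 0.1 are assumptions, not details of NanoCon.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss, averaged over anchors (illustrative)."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Temperature-scaled similarities from anchor i to every other sample.
        logits = {a: cosine(embeddings[i], embeddings[a]) / temperature
                  for a in range(n) if a != i}
        log_denominator = math.log(sum(math.exp(z) for z in logits.values()))
        # Mean negative log-probability assigned to the anchor's positives.
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Hypothetical toy embeddings forming two tight clusters.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(points, [1, 1, 0, 0]))  # labels follow clusters: low loss
print(supervised_contrastive_loss(points, [1, 0, 1, 0]))  # labels cut across clusters: high loss
```

Because every anchor with at least one positive contributes equally, a loss of this kind gives minority-class samples a direct training signal, which is one common motivation for using contrastive objectives on imbalanced data.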


Subjects
Nanopores , DNA Methylation , Genomics , Genome , DNA
8.
J Med Chem ; 67(3): 1888-1899, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38270541

ABSTRACT

Cyclic peptides are gaining attention for their strong binding affinity, low toxicity, and ability to target "undruggable" proteins; however, their therapeutic potential against intracellular targets is constrained by their limited membrane permeability, and testing this property in the laboratory costs researchers considerable time and money. Herein, we propose an innovative multimodal model called Multi_CycGT, which combines a graph convolutional network (GCN) and a transformer to extract one- and two-dimensional features for predicting cyclic peptide permeability. Extensive benchmarking experiments show that our Multi_CycGT model attains state-of-the-art performance, with an average accuracy of 0.8206 and an area under the curve of 0.8650, and demonstrates satisfactory generalization ability on several external data sets. To the best of our knowledge, this is the first deep learning-based attempt to predict the membrane permeability of cyclic peptides, which should help accelerate the design of active cyclic peptide drugs in medicinal chemistry and chemical biology applications.


Subjects
Deep Learning , Cell Membrane Permeability , Chemistry, Pharmaceutical , Peptides, Cyclic/pharmacology , Permeability
9.
J Fluoresc ; 34(1): 179-190, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37166611

ABSTRACT

Simple and sensitive detection of cardiac biomarkers is of great significance for early diagnosis and prevention of acute myocardial infarction (AMI). Here, a ratiometric fluorescent nanohybrid probe (AuNCs-QDs) was synthesized by coupling bovine serum albumin-functionalized gold nanoclusters (AuNCs) with CdSe/ZnS quantum dots (QDs) to achieve simple and sensitive detection of the cardiac biomarker myoglobin (Mb). The AuNCs-QDs probe shows pink fluorescence under UV light, with two emission peaks at 468 nm and 630 nm belonging to the QDs and AuNCs, respectively. Importantly, the presence of Mb caused fluorescence quenching of the blue-emitting QDs, thereby inhibiting the fluorescence resonance energy transfer (FRET) process between the QDs and AuNCs and effectively reducing the fluorescence intensity ratio (F468/F630) of the AuNCs-QDs probe. As the concentration of Mb increases, the ratiometric fluorescent probe also exhibits a visible fluorescence color change. The detection limit was as low as 4.99 µg/mL, and the response of the probe to Mb showed a good linear relationship up to 0.52 mg/mL. Moreover, the probe has excellent specificity for Mb. The AuNCs-QDs probe was also applied to detect Mb in urine samples. More importantly, we developed a smartphone-aided paper-based strip modified with the AuNCs-QDs probe for on-site monitoring of Mb. To our knowledge, this is the first report of a smartphone-aided paper-based strip for quick on-site monitoring of Mb, which provides a useful approach for AMI biomarker monitoring and may be extended to other medical diagnostics.


Subjects
Metal Nanoparticles , Quantum Dots , Myoglobin , Smartphone , Spectrometry, Fluorescence , Fluorescent Dyes , Gold , Biomarkers
10.
J Chem Inf Model ; 63(24): 7655-7668, 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38049371

ABSTRACT

The development of potentially active peptides for specific targets is critical for the growth of the modern pharmaceutical industry. In this study, we present an efficient computational framework for the discovery of active peptides targeting a specific pharmacological target, which combines a conditional variational autoencoder (CVAE) and a classifier named TCPP based on the Transformer and a convolutional neural network. In our example scenario, we constructed an active cyclic peptide library targeting interleukin-17C (IL-17C) through a library-based in vitro selection strategy. The CVAE model is trained on the preprocessed peptide data sets to generate potentially active peptides, and the TCPP classifier further screens the generated peptides. Ultimately, six candidate peptides predicted by the model were synthesized and assayed for their activity, and four of them exhibited promising binding affinity to IL-17C. Our study provides a one-stop shop for target-specific active peptide discovery, which is expected to accelerate peptide drug development.


Subjects
Interleukin-17 , Peptides, Cyclic , Peptides, Cyclic/pharmacology , Interleukin-17/metabolism , Peptides
11.
Nanomedicine (Lond) ; 18(19): 1281-1303, 2023 08.
Article in English | MEDLINE | ID: mdl-37753724

ABSTRACT

Nanotechnology has significant potential for cancer management at all stages, including prevention, diagnosis and treatment. In therapeutic applications, nanoparticles (NPs) have biological stability, targeting and body-clearance issues. To overcome these difficulties, biomimetic or cell membrane-coating methods using immune cell membranes are advised. Macrophage or neutrophil cell membrane-coated NPs may impede cancer progression in malignant tissue. Immune cell surface proteins and their capacity to maintain activity after membrane extraction and NP coating determine NP functioning. Immune cell surface proteins may offer NPs higher cellular interactions, blood circulation, antigen recognition for targeting, progressive drug release and reduced in vivo toxicity. This article examines nano-based systems with immune cell membranes, their surface modification potential, and their application in cancer treatment.


Nanoparticles (NPs) are small particles, ranging from 1 to 100 nanometres in size, that are used to deliver substances that aid in the prevention, diagnosis and treatment of cancer. NPs are promising for therapeutic use but face challenges such as stability, cancer targeting and clearance from the body. This article suggests that these challenges can be overcome using biomimetic methods, which involve coating NPs in cell membranes from immune cells. This has been demonstrated using two types of white blood cells, called macrophages and neutrophils. NPs coated in membranes derived from these cells have been shown to hinder cancer progression. How effective these coated NPs are depends on which proteins from the surface of the immune cells are included and whether they remain active. These immune cell surface proteins allow coated NPs to have improved interactions with cells, circulate in the blood for longer, target proteins overexpressed on cancer cells and release drugs gradually. Biomimetic cell membrane coating also decreases cell membrane toxicity. The article examines NP-based systems using immune cell membranes, their potential for surface modification and their application in cancer treatment.


Subjects
Nanoparticles , Neoplasms , Humans , Cell Membrane , Neoplasms/drug therapy , Membrane Proteins
12.
Biomed Pharmacother ; 165: 115276, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37542852

ABSTRACT

Short-chain fatty acids (SCFAs) derived from the fermentation of carbohydrates by gut microbiota play a crucial role in regulating host physiology. Among them, acetate, propionate, and butyrate are key players in various biological processes. Recent research has revealed their significant functions in immune and inflammatory responses. For instance, butyrate reduces the development of interferon-gamma (IFN-γ)-generating cells while promoting the development of regulatory T (Treg) cells. Propionate inhibits the initiation of a Th2 immune response by dendritic cells (DCs). Notably, SCFAs have an inhibitory impact on the polarization of M2 macrophages, emphasizing their immunomodulatory properties and therapeutic potential. In animal models of asthma, both butyrate and propionate suppress the M2 polarization pathway, thus reducing allergic airway inflammation. Moreover, dysbiosis of gut microbiota leading to altered SCFA production has been implicated in prostate cancer progression. SCFAs trigger autophagy in cancer cells and promote M2 polarization in macrophages, accelerating tumor advancement. Manipulating SCFA-producing microbiota holds promise for cancer treatment. Additionally, SCFAs enhance the expression of hypoxia-inducible factor 1 (HIF-1) by blocking histone deacetylase, resulting in increased production of antibacterial effectors and improved macrophage-mediated elimination of microorganisms. This highlights the antimicrobial potential of SCFAs and their role in host defense mechanisms. This comprehensive review provides an in-depth analysis of the latest research on the functional aspects and underlying mechanisms of SCFAs in relation to macrophage activities in a wide range of diseases, including infectious diseases and cancers.
By elucidating the intricate interplay between SCFAs and macrophage functions, this review aims to contribute to the understanding of their therapeutic potential and pave the way for future interventions targeting SCFAs in disease management.


Subjects
Gastrointestinal Microbiome , Propionates , Male , Animals , Propionates/therapeutic use , Fatty Acids, Volatile/metabolism , Butyrates/pharmacology , Butyrates/therapeutic use , Inflammation/drug therapy , Gastrointestinal Microbiome/physiology , Macrophages/metabolism
13.
J Med Chem ; 66(16): 11187-11200, 2023 08 24.
Article in English | MEDLINE | ID: mdl-37480587

ABSTRACT

The combination of library-based screening and artificial intelligence (AI) has been accelerating the discovery and optimization of hit ligands. However, the potential of AI to assist in de novo macrocyclic peptide ligand discovery has yet to be fully explored. In this study, an integrated AI framework called PepScaf was developed to extract the critical scaffold relative to bioactivity based on a vast dataset from an initial in vitro selection campaign against a model protein target, interleukin-17C (IL-17C). Taking the generated scaffold, a focused macrocyclic peptide library was rationally constructed to target IL-17C, yielding over 20 potent peptides that effectively inhibited IL-17C/IL-17RE interaction. Notably, the top two peptides displayed exceptional potency with IC50 values of 1.4 nM. This approach presents a viable methodology for more efficient macrocyclic peptide discovery, offering potential time and cost savings. Additionally, this is also the first report regarding the discovery of macrocyclic peptides against IL-17C/IL-17RE interaction.


Subjects
Artificial Intelligence , Interleukin-17 , Machine Learning , Peptides , Peptide Library
14.
Am J Transl Res ; 15(4): 2783-2792, 2023.
Article in English | MEDLINE | ID: mdl-37193137

ABSTRACT

OBJECTIVE: To construct a nomogram for predicting 3-year survival of patients after curative resection of colon cancer. METHOD: A retrospective analysis was conducted on the clinicopathologic data of 102 patients who underwent radical resection of colon cancer in Baoji Central Hospital from April 2015 to April 2017. The optimal cutoff values of preoperative CEA, CA125, and NLR for predicting overall survival were determined using receiver operating characteristic (ROC) curves. We examined the relationship between NLR, CEA and CA125 and the clinicopathologic data, performed multivariate Cox regression to identify independent factors affecting patient prognosis, and used Kaplan-Meier analysis to assess the relationship between NLR, CEA and CA125 and patient survival. A nomogram prediction model was drawn for patients' 1-, 2-, and 3-year survival after radical resection of colon cancer, and the efficacy of the prediction model was evaluated. RESULTS: The areas under the curve (AUC) of NLR, CEA and CA125 in predicting patient death were 0.784, 0.790 and 0.771, respectively. NLR was correlated with clinical stage, tumor diameter and differentiation (all P < 0.05); CEA was associated with clinical stage, tumor diameter, differentiation and lymph node metastasis (all P < 0.05); CA125 was associated only with tumor diameter (P < 0.05). Differentiation, NLR, CEA and CA125 were independent risk factors affecting patient prognosis (all P < 0.05). The nomogram model had a C-index of 0.918 (95% CI 0.885-0.952), and the risk model score showed high clinical value for predicting 3-year survival in the existing patients. CONCLUSION: Preoperative NLR, CEA, CA125 and clinical stage are correlated with the prognosis of patients with colon cancer. The nomogram model constructed based on NLR, CEA, CA125 and clinical stage has good accuracy.
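Concordance indices such as the nomogram's C-index quoted above are commonly computed as Harrell's C-index. The minimal Python sketch below computes it for right-censored survival data. It simplifies tie handling (pairs with equal follow-up times are skipped), and the follow-up times, event flags and risk scores in the usage lines are invented for illustration, not drawn from the study cohort.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified ties).

    A pair is comparable when the subject with the shorter follow-up time
    had the event. It is concordant when that subject also has the higher
    predicted risk; equal risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or earlier subject censored: not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")

# Hypothetical follow-up (months), event flags (1 = death) and risk scores.
times = [12, 24, 36, 48]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.2]))  # 1.0
print(harrell_c_index(times, events, [0.9, 0.2, 0.4, 0.7]))  # 0.6
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable pairs, which is the scale on which a C-index of 0.918 is read.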

15.
RSC Adv ; 12(52): 33801-33807, 2022 Nov 22.
Article in English | MEDLINE | ID: mdl-36505715

ABSTRACT

Deep learning has enormous potential in the chemical and pharmaceutical fields, and generative adversarial networks (GANs) in particular have exhibited remarkable performance in the field of molecular generation as generative models. However, their application in the field of organic chemistry has been limited; thus, in this study, we attempt to utilize a GAN as a generative model for the generation of Diels-Alder reactions. A MaskGAN model was trained with 14 092 Diels-Alder reactions, and 1441 novel Diels-Alder reactions were generated. Analysis of the generated reactions indicated that the model learned several reaction rules in-depth. Thus, the MaskGAN model can be used to generate organic reactions and aid chemists in the exploration of novel reactions.

16.
RSC Adv ; 12(49): 32020-32026, 2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36380947

ABSTRACT

Recently, effective and rapid deep-learning methods for predicting chemical reactions have significantly aided the research and development of organic chemistry and drug discovery. Owing to the insufficiency of related chemical reaction data, computer-assisted predictions based on low-resource chemical datasets generally have low accuracy despite the exceptional ability of deep learning in retrosynthesis and synthesis. To address this issue, we introduce two types of multitask models: retro-forward reaction prediction transformer (RFRPT) and multiforward reaction prediction transformer (MFRPT). These models integrate multitask learning with the transformer model to predict low-resource reactions in forward reaction prediction and retrosynthesis. Our results demonstrate that introducing multitask learning significantly improves the average top-1 accuracy, and the RFRPT (76.9%) and MFRPT (79.8%) outperform the transformer baseline model (69.9%). These results also demonstrate that a multitask framework can capture sufficient chemical knowledge and effectively mitigate the impact of the deficiency of low-resource data in processing reaction prediction tasks. Both RFRPT and MFRPT methods significantly improve the predictive performance of transformer models, which are powerful methods for eliminating the restriction of limited training data.
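Top-1 accuracy, the metric compared above, is the fraction of test reactions whose highest-ranked prediction matches the recorded answer. A minimal Python sketch follows. It assumes predictions and targets are already canonical SMILES strings (real evaluations typically canonicalize first, for example with RDKit), and the function name and molecules in the usage lines are hypothetical.

```python
def top_k_accuracy(ranked_predictions, targets, k=1):
    """Fraction of examples whose target appears among the first k predictions.

    Assumes every string is already canonical SMILES, so exact string
    equality stands in for chemical identity.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("one ranked prediction list is needed per target")
    hits = sum(1 for preds, target in zip(ranked_predictions, targets)
               if target in preds[:k])
    return hits / len(targets)

# Hypothetical beam-search outputs, best candidate first.
predictions = [["CCO", "CC=O"], ["c1ccccc1", "C1CCCCC1"], ["CC(=O)O", "CCO"]]
targets = ["CCO", "C1CCCCC1", "CCOC"]
print(top_k_accuracy(predictions, targets, k=1))  # 1 of 3 correct at rank 1
print(top_k_accuracy(predictions, targets, k=2))  # 2 of 3 within the top 2
```

On this scale, the reported top-1 gains over the transformer baseline are 7.0 percentage points for RFRPT (76.9% vs 69.9%) and 9.9 points for MFRPT (79.8% vs 69.9%).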

17.
Sci Rep ; 12(1): 17098, 2022 10 12.
Article in English | MEDLINE | ID: mdl-36224331

ABSTRACT

To improve the performance of data-driven reaction prediction models, we propose an intelligent strategy for predicting reaction products using available data and increasing the sample size through fake data augmentation. In this research, fake data sets were created and combined with the raw data to construct virtual training models. Fake reaction datasets were created by replacing some functional groups; that is, in this strategy the fake data consist of compounds with modified functional groups, which increases the amount of data available for reaction prediction. This approach was tested on five different reactions, and the results show improved model predictivity compared with other relevant techniques. Furthermore, we evaluated this method with different models, confirming the generality of virtual data augmentation. In summary, virtual data augmentation can be used as an effective measure to address insufficient data and significantly improve the performance of reaction prediction.


Subjects
Research Design
18.
J Cheminform ; 14(1): 60, 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36056425

ABSTRACT

Deep learning methods for tasks such as reaction prediction and retrosynthesis analysis have demonstrated their significance in the chemical field. However, the de novo generation of novel reactions using artificial intelligence requires further exploration. Inspired by molecular generation, we proposed a novel task of reaction generation. Herein, Heck reactions were used to train the Transformer model, a state-of-the-art natural language processing model, which generated 4717 reactions after sampling and processing. Then, 2253 novel Heck reactions were confirmed by having chemists evaluate the generated reactions. More importantly, further organic synthesis experiments were performed to verify the accuracy and feasibility of representative reactions. The entire process, from Heck reaction generation to experimental verification, required only 15 days, demonstrating that our model has learned reaction rules in depth and can contribute to novel reaction discovery and chemical space exploration.

19.
J Chem Inf Model ; 62(19): 4579-4590, 2022 10 10.
Article in English | MEDLINE | ID: mdl-36129104

ABSTRACT

To handle low-resource reaction training samples, we construct a chemical platform for addressing small-scale reaction prediction problems. Using a self-supervised pretraining strategy called MAsked Sequence to Sequence (MASS), the Transformer model can absorb the chemical information of about 1 billion molecules and then be fine-tuned on small-scale reaction prediction. To further strengthen the predictive performance of our model, we combine MASS with a reaction transfer learning strategy. Here, we show that the average accuracy improvements of the Transformer model reach 14.07, 24.26, 40.31, and 57.69% in predicting the Baeyer-Villiger, Heck, C-C bond formation, and functional group interconversion reaction data sets, respectively, marking an important step toward low-resource reaction prediction.
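MASS pretraining, as summarized above, masks a contiguous fragment of the encoder input and trains the decoder to regenerate that fragment. The Python sketch below shows only how one such training pair could be built from a tokenized SMILES string. The character-level tokenization, the `[MASK]` symbol, the 50% span ratio and the function name `mass_mask` are illustrative assumptions, not details taken from the study.

```python
import random

MASK = "[MASK]"

def mass_mask(tokens, mask_ratio=0.5, rng=None):
    """Build one MASS-style (encoder_input, decoder_target, start) example.

    A contiguous span covering about `mask_ratio` of the sequence is replaced
    by mask tokens on the encoder side; the decoder target is that span.
    (Full MASS also feeds the decoder a right-shifted copy of the span;
    that detail is omitted here.)
    """
    rng = rng or random.Random()
    n = len(tokens)
    span = max(1, int(n * mask_ratio))
    start = rng.randint(0, n - span)  # randint is inclusive, so the span fits
    encoder_input = tokens[:start] + [MASK] * span + tokens[start + span:]
    decoder_target = tokens[start:start + span]
    return encoder_input, decoder_target, start

# Character-level tokens for aspirin's SMILES (a simplification; practical
# SMILES tokenizers keep multi-character atoms such as "Cl" or "[nH]" whole).
smiles_tokens = list("CC(=O)Oc1ccccc1C(=O)O")
enc, dec, start = mass_mask(smiles_tokens, rng=random.Random(0))
print(" ".join(enc))
print("".join(dec))
```

Because the decoder must reconstruct the hidden fragment from the unmasked context, this objective lets a sequence-to-sequence model learn from unlabeled molecules before fine-tuning on the scarce reaction data.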

20.
Phys Chem Chem Phys ; 24(17): 10280-10291, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35437562

ABSTRACT

While state-of-the-art models can predict reactions through transfer learning on thousands of samples of the same reaction types as the reactions to be predicted, how to prepare such models to predict "unseen" reactions remains an open question. We aimed to study the Transformer model's ability to predict "unseen" reactions through "zero-shot reaction prediction (ZSRP)", a concept derived from zero-shot learning and zero-shot translation. We reproduced the human invention of the Chan-Lam coupling reaction, whose inventor was inspired by the Suzuki reaction when improving Barton's bismuth arylation reaction. After being fine-tuned with samples from these two "existing" reactions, the USPTO-trained Transformer could predict "unseen" Chan-Lam coupling reactions with 55.7% top-1 accuracy. Our model could also mimic the later stage of the history of this reaction, in which the initial case of this reaction was generalized to more reactants and reagents, via "one-shot/few-shot reaction prediction (OSRP/FSRP)" approaches.


Subjects
Inventions , Machine Learning , Humans