Results 1 - 14 of 14
1.
ACS Omega ; 9(24): 26213-26221, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38911735

ABSTRACT

Accurate and rapid estimation of density is crucial for assessing the packing and combustion characteristics of high-energy-density fuels (HEDFs), and density is a pivotal criterion in selecting high-performance HEDFs. Our study leveraged a polycyclic compound density data set and quantum chemical (QC) descriptors to correlate the descriptors with the target property using the XGBoost algorithm. We used recursive feature elimination to simplify the model and developed a concise, interpretable density prediction model incorporating only six QC descriptors. The model performed robustly, achieving coefficients of determination (R²) of 0.967 and 0.971 for the internal and external test sets, respectively, and root-mean-square errors (RMSE) of 0.031 and 0.027 g/cm³, respectively. Compared with two other mainstream methods, the small discrepancy between the predicted and actual molecular densities underscores the model's superior predictive ability and its greater usefulness for energy density calculation. Furthermore, we developed a web server (SesquiterPre, https://sespre.cmdrg.com/#/) that simultaneously calculates the density, enthalpy of combustion, and energy density of sesquiterpenoid HEDFs; it makes the model readily accessible to researchers and should help accelerate the design and screening of novel sesquiterpenoid HEDFs.
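The two metrics reported above can be computed as below. This is a minimal plain-Python sketch, not the authors' code, and the density values are made up purely for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as y (here g/cm3)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted densities (g/cm3), illustration only.
measured = [0.90, 0.95, 1.00, 1.05]
predicted = [0.92, 0.94, 1.01, 1.03]
print(f"RMSE = {rmse(measured, predicted):.4f} g/cm3, R2 = {r2(measured, predicted):.3f}")
```

For these toy values RMSE is about 0.016 g/cm3 and R² is 0.92. R² depends on the spread of the test-set densities while RMSE does not, which is why the two are usually reported together.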

2.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385872

ABSTRACT

Drug discovery and development is a laborious and costly undertaking. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. Because ADMET optimization is a multi-parameter objective, it is extremely challenging given the vast chemical space and the limits of human expert knowledge. In this study, a freely available platform called Chemical Molecular Optimization, Representation and Translation (ChemMORT) was developed to optimize multiple ADMET endpoints without loss of potency (https://cadd.nscc-tj.cn/deploy/chemmort/). ChemMORT contains three modules: the Simplified Molecular Input Line Entry System (SMILES) Encoder, the Descriptor Decoder and the Molecular Optimizer. The SMILES Encoder generates a 512-dimensional vector representation of a molecule, and the Descriptor Decoder translates this representation back into the corresponding molecular structure with high accuracy. Building on this reversible molecular representation and a particle swarm optimization strategy, the Molecular Optimizer can effectively improve undesirable ADMET properties without loss of bioactivity, which in essence accomplishes inverse QSAR design. The constrained multi-objective optimization of a poly (ADP-ribose) polymerase-1 inhibitor is provided as a case study to demonstrate the utility of ChemMORT.


Subjects
Deep Learning , Humans , Drug Development , Drug Discovery , Poly(ADP-ribose) Polymerase Inhibitors
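ChemMORT's optimizer is described as particle swarm optimization over the 512-dimensional SMILES-encoder vector. The sketch below shows only a generic global-best PSO loop on a toy objective. The objective, the 8-dimensional search space and the coefficient values (common constriction-style defaults) are assumptions; decoding vectors back to molecules and scoring ADMET endpoints are not reproduced.

```python
import random


def pso_minimize(objective, dim, n_particles=30, n_iter=200,
                 w=0.7298, c1=1.49618, c2=1.49618, bound=1.0, seed=0):
    """Minimize `objective` over R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position in the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for "ADMET penalty of the decoded molecule":
# squared distance to an arbitrary target vector.
TARGET = [0.5] * 8


def toy_penalty(z):
    return sum((a - b) ** 2 for a, b in zip(z, TARGET))


best_z, best_val = pso_minimize(toy_penalty, dim=8)
```

In a real setting the penalty would combine several predicted endpoints with a potency or similarity constraint; how ChemMORT weights those terms is not specified here.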
3.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34427296

ABSTRACT

Computational methods have become indispensable tools for accelerating drug discovery and reducing the heavy dependence on time-consuming, labor-intensive experiments. Traditional feature-engineering approaches rely heavily on expert knowledge to devise useful features, which can be costly and sometimes biased. Emerging deep learning (DL) methods offer a data-driven way to learn expressive representations automatically from complex raw data. Inspired by this, researchers have applied various deep neural network models to simplified molecular input line entry specification (SMILES) strings, which encode the full composition and structure of molecules. However, current models usually suffer from a scarcity of labeled data. This limits the generalization ability of SMILES-based DL models and prevents them from competing with state-of-the-art computational methods. In this study, we used a BiLSTM (bidirectional long short-term memory) attention network (BAN) with a novel multi-step attention mechanism to facilitate the extraction of key features from SMILES strings. Meanwhile, SMILES enumeration was used as a data augmentation method during training to substantially increase the amount of labeled data and the chance of mining more patterns from complex SMILES. We again took advantage of SMILES enumeration in the prediction phase to correct model prediction bias and provide more accurate predictions. Combined with the BAN model, these strategies greatly improve the quality of the latent features learned from SMILES strings. On 11 canonical absorption, distribution, metabolism, excretion and toxicity-related tasks, our method outperformed state-of-the-art approaches.


Subjects
Cheminformatics/methods , Deep Learning , Drug Discovery/methods , Software , Algorithms , Drug Development , Research Design
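Using SMILES enumeration in the prediction phase amounts to test-time augmentation: score several equivalent SMILES strings of one molecule and combine the outputs. The sketch below assumes the variants are already available (in practice a toolkit generates them, for example RDKit's MolToSmiles with doRandom=True), uses plain averaging as an assumed combination rule, and replaces the trained BAN network with a hypothetical toy_model.

```python
from statistics import mean
from typing import Callable, Sequence


def predict_with_enumeration(model: Callable[[str], float],
                             smiles_variants: Sequence[str]) -> float:
    """Average a SMILES-based model's outputs over equivalent SMILES strings."""
    if not smiles_variants:
        raise ValueError("need at least one SMILES string")
    return mean(model(s) for s in smiles_variants)


def toy_model(smiles: str) -> float:
    # Hypothetical model whose output depends on the string form, i.e. the
    # kind of representation bias that averaging over variants dampens.
    return len(smiles) / 10 + (0.1 if smiles.startswith("O") else 0.0)


# Three valid SMILES strings for ethanol.
ethanol_variants = ["CCO", "OCC", "C(O)C"]
print(predict_with_enumeration(toy_model, ethanol_variants))
```

During training the same enumeration serves as data augmentation: each labeled molecule contributes several SMILES strings that share one label.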
4.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33940596

ABSTRACT

Poly (ADP-ribose) polymerase-1 (PARP1) has been regarded as a vital target in recent years, and PARP1 inhibitors can be used in ovarian and breast cancer therapy. However, most PARP1 inhibitors are known to suffer from low solubility and permeability. Discovering more molecules with novel scaffolds would therefore create greater opportunities to extend PARP1 inhibitors into broader clinical fields. In the present study, multiple virtual screening (VS) methods were employed to evaluate the screening efficiency of ligand-based, structure-based and data fusion methods on the PARP1 target. The VS methods included 2D similarity screening, structure-activity relationship (SAR) models, docking and complex-based pharmacophore screening. In addition, sum rank, sum score and reciprocal rank were adopted as data fusion methods. The evaluation results show that similarity searching based on the Torsion fingerprint, six SAR models, Glide docking and pharmacophore screening using Phase have excellent screening performance. The best data fusion method is reciprocal rank, although sum score also performs well in framework enrichment. In general, the ligand-based VS methods performed better in PARP1 inhibitor screening. These findings confirm that adding ligand-based methods to the early screening stage will greatly improve screening efficiency and enrich more highly active PARP1 inhibitors with diverse structures.


Subjects
Databases, Chemical , Molecular Docking Simulation , Poly (ADP-Ribose) Polymerase-1/antagonists & inhibitors , Poly(ADP-ribose) Polymerase Inhibitors/chemistry , Drug Evaluation, Preclinical , Humans , Poly (ADP-Ribose) Polymerase-1/chemistry , Structure-Activity Relationship
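The three data fusion rules named in the abstract can be sketched as follows, under assumptions the abstract does not spell out: every method's scores are oriented so that higher is better (docking energies would be negated first), sum score uses min-max normalization, and reciprocal rank is the plain sum of 1/rank. Compound names and scores are hypothetical.

```python
def ranks_from_scores(scores):
    """Rank compounds within one method: 1 = best (highest score)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {c: i + 1 for i, c in enumerate(order)}


def min_max(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0          # avoid division by zero for constant scores
    return {c: (s - lo) / span for c, s in scores.items()}


def fuse(method_scores, rule):
    """Return compounds ordered best-first under the chosen fusion rule."""
    compounds = list(method_scores[0])
    if rule == "sum_score":
        normed = [min_max(m) for m in method_scores]
        fused = {c: sum(n[c] for n in normed) for c in compounds}
        return sorted(compounds, key=fused.get, reverse=True)
    ranks = [ranks_from_scores(m) for m in method_scores]
    if rule == "sum_rank":           # lower total rank is better
        fused = {c: sum(r[c] for r in ranks) for c in compounds}
        return sorted(compounds, key=fused.get)
    if rule == "reciprocal_rank":    # higher sum of 1/rank is better
        fused = {c: sum(1.0 / r[c] for r in ranks) for c in compounds}
        return sorted(compounds, key=fused.get, reverse=True)
    raise ValueError(f"unknown rule: {rule}")


similarity = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2}
docking = {"A": 5.0, "B": 8.0, "C": 6.0, "D": 1.0}   # negated docking scores
pharmacophore = {"A": 0.9, "B": 0.6, "C": 0.8, "D": 0.1}
methods = [similarity, docking, pharmacophore]
for rule in ("sum_score", "sum_rank", "reciprocal_rank"):
    print(rule, fuse(methods, rule))
```

Some implementations add a constant k to the rank, using 1/(k + rank), to soften the weight of the top positions; whether the study did so is not stated.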
5.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33951729

ABSTRACT

MOTIVATION: Accurate and efficient prediction of molecular properties is one of the fundamental issues in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in feature design and selection. With the development of artificial intelligence (AI) technologies, data-driven methods have shown marked advantages over feature engineering-based methods in various domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from a scarcity of labeled data and show poor generalization ability. RESULTS: In this study, we proposed molecular graph BERT (MG-BERT), which integrates the local message-passing mechanism of graph neural networks (GNNs) into the powerful BERT model to facilitate learning from molecular graphs. Furthermore, an effective self-supervised learning strategy named masked atoms prediction was proposed to pretrain MG-BERT on a large amount of unlabeled data and mine context information in molecules. We found that MG-BERT generates context-sensitive atomic representations after pretraining and transfers the learned knowledge to the prediction of a variety of molecular properties. The experimental results show that the pretrained MG-BERT model, with little additional fine-tuning, consistently outperforms state-of-the-art methods on all 11 ADMET datasets. Moreover, MG-BERT uses attention mechanisms to focus on the atomic features essential to the target property, which makes the trained model highly interpretable. MG-BERT requires no hand-crafted features as input and is more reliable owing to its interpretability, providing a novel framework for developing state-of-the-art models for a wide range of drug discovery tasks.


Subjects
Models, Theoretical , Neural Networks, Computer
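Masked atoms prediction follows the BERT recipe of hiding part of the input and asking the model to recover it. The sketch below shows only the masking step on a list of atom symbols; the 15% default rate is borrowed from BERT as an assumption, the [MASK] token name is hypothetical, and the graph model that predicts the hidden atoms is not shown.

```python
import random

MASK = "[MASK]"  # hypothetical mask token


def mask_atoms(atoms, mask_rate=0.15, seed=None):
    """Hide a random subset of atoms.

    Returns (masked_atoms, targets), where targets maps each masked position
    to the original atom symbol the model must predict during pretraining.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(atoms)))   # mask at least one atom
    positions = rng.sample(range(len(atoms)), n_mask)
    masked = list(atoms)
    targets = {}
    for i in positions:
        targets[i] = atoms[i]
        masked[i] = MASK
    return masked, targets


# Heavy atoms of a hypothetical small molecule, in graph-node order.
atoms = ["C", "C", "O", "O", "N", "C", "C", "S", "Cl", "C"]
masked, targets = mask_atoms(atoms, mask_rate=0.2, seed=1)
print(masked, targets)
```

Because the bonds stay in place, the model has to use each masked atom's neighbours to infer it, which is what pushes it toward context-sensitive atomic representations. BERT itself sometimes substitutes a random or unchanged token instead of the mask; that refinement is omitted here.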
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33709154

ABSTRACT

BACKGROUND: Substructure screening is widely applied to evaluate the potency and ADMET properties of compounds in drug discovery pipelines, and it can also be used to interpret QSAR models when designing new compounds with desirable physicochemical and biological properties. As experimental data continue to accumulate, data-driven computational systems that can derive representative substructures from large chemical libraries are attracting increasing attention. An integrated and convenient tool for generating and applying representative substructures is therefore urgently needed. RESULTS: In this study, we developed PySmash, a user-friendly and powerful tool for generating different types of representative substructures. The current version of PySmash provides both a Python package and a standalone executable program, which makes it easy to operate and to integrate into pipelines. Three types of substructure generation algorithms are provided: circular, path-based and functional group-based. Users can customize substructure size, accuracy and coverage, statistical significance and parallel computation during execution. PySmash also supports screening of external data. CONCLUSION: PySmash, a user-friendly and integrated tool for the automatic generation and application of representative substructures, is presented. Three screening examples, namely toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at https://github.com/kotori-y/pySmash.


Subjects
Computational Biology/methods , Drug Discovery/methods , Machine Learning , Software , Carcinogenicity Tests/methods , Carcinogens , Drug Screening Assays, Antitumor/methods , Humans
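For a candidate substructure, accuracy, coverage and statistical significance can be computed from four counts. The definitions below are assumptions for illustration (accuracy as the fraction of substructure-containing compounds that are positive, coverage as the fraction of positives that contain it, significance as a one-sided hypergeometric test) and are not taken from PySmash's source; the counts are hypothetical.

```python
from math import comb


def substructure_stats(n_total, n_pos, n_hit, n_hit_pos):
    """Score one substructure.

    n_total: compounds in the data set; n_pos: positives (e.g. toxic);
    n_hit: compounds containing the substructure; n_hit_pos: of those, positives.
    """
    accuracy = n_hit_pos / n_hit      # how often a match is a positive
    coverage = n_hit_pos / n_pos      # share of all positives that match
    # One-sided hypergeometric p-value: P(X >= n_hit_pos) when n_hit compounds
    # are drawn at random without replacement.
    numerator = sum(comb(n_pos, k) * comb(n_total - n_pos, n_hit - k)
                    for k in range(n_hit_pos, min(n_pos, n_hit) + 1))
    p_value = numerator / comb(n_total, n_hit)
    return accuracy, coverage, p_value


acc, cov, p = substructure_stats(n_total=1000, n_pos=100, n_hit=40, n_hit_pos=30)
print(f"accuracy={acc:.2f} coverage={cov:.2f} p={p:.2e}")
```

A screening tool would typically keep only substructures whose p-value, accuracy and coverage pass user-set thresholds, the kind of customization the abstract describes.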
7.
Drug Discov Today ; 26(6): 1353-1358, 2021 06.
Article in English | MEDLINE | ID: mdl-33581116

ABSTRACT

In 2010, the pan-assay interference compounds (PAINS) rule was proposed to identify false-positive compounds, especially frequent hitters (FHs), in biological screening campaigns, and it rapidly became an essential component of drug design. However, the specific interference mechanisms remain unknown, and schemes for validating results and following them up are still unclear. In this review, a large benchmark collection of >600,000 compounds sourced from databases and the literature, covering six common false-positive mechanisms, was used to evaluate the detection ability of PAINS. In addition, 400 million purchasable molecules from the ZINC database were screened with PAINS. The results indicate that the PAINS rule is not suitable for screening all types of false positives and needs further improvement.


Subjects
Databases, Factual , Drug Design , High-Throughput Screening Assays/methods , Benchmarking , Drug Discovery/methods , Humans
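Evaluating a filter's detection ability against a benchmark of known false positives reduces to a per-mechanism recall: the fraction of compounds of each mechanism that the filter flags. In the sketch below the PAINS match results are assumed to be precomputed booleans (for instance from RDKit's FilterCatalog with its PAINS catalog), and the mechanism labels and records are hypothetical, not the six mechanisms or the data used in the review.

```python
from collections import defaultdict


def detection_rate_by_mechanism(records):
    """records: iterable of (mechanism, flagged) pairs for known false positives.

    Returns {mechanism: fraction of its compounds flagged by the filter}.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for mechanism, flagged in records:
        totals[mechanism] += 1
        hits[mechanism] += int(bool(flagged))
    return {m: hits[m] / totals[m] for m in totals}


# Hypothetical benchmark records: (false-positive mechanism, PAINS match?)
records = [
    ("aggregation", True), ("aggregation", False),
    ("fluorescence", False), ("fluorescence", False), ("fluorescence", True),
    ("reactivity", True),
]
for mechanism, rate in detection_rate_by_mechanism(records).items():
    print(f"{mechanism}: {rate:.0%} flagged")
```

A low rate for a mechanism means the filter misses most false positives of that kind.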
8.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33418563

ABSTRACT

Matched molecular pair analysis (MMPA) has become a powerful tool for automatically and systematically identifying medicinal chemistry transformations from compound/property datasets. However, accurate determination of matched molecular pair (MMP) transformations depends largely on the size and quality of the available experimental data, and the lack of high-quality experimental data severely hampers the extraction of more effective medicinal chemistry knowledge. Here, we developed a new strategy called quantitative structure-activity relationship (QSAR)-assisted MMPA to expand the number of chemical transformations, and took the logD7.4 endpoint as an example to demonstrate the reliability of the new method. A reliable logD7.4 consensus prediction model was first established, and its applicability domain was strictly assessed. By applying this model to screen two chemical databases with a strict applicability domain threshold, we obtained additional high-quality logD7.4 data. MMPA was then performed on the predicted and experimental data to derive more chemical rules. To validate these rules, we compared the magnitude and directionality of the property changes of the predicted rules with those of the measured rules. We also compared the novel rules generated by our approach with published chemical rules and found that the magnitude and directionality of the property changes were consistent, indicating that QSAR-assisted MMPA has the potential to enrich the collection of rule types or even identify completely novel rules. Finally, we found that the predicted data could amplify the number of MMP rules derived from the experimental data, which helps in analyzing medicinal chemistry rules in a local chemical environment.
In summary, the proposed QSAR-assisted MMPA approach could be regarded as a very promising strategy for expanding the chemical transformation space for lead optimization, especially when experimental data are insufficient to support MMPA.


Subjects
Chemistry Techniques, Synthetic/methods , Chemistry, Pharmaceutical/methods , Drug Discovery/methods , Drugs, Investigational/chemical synthesis , Models, Statistical , Biotransformation , Databases, Chemical , Datasets as Topic , Drug Discovery/statistics & numerical data , Drugs, Investigational/metabolism , Humans , Molecular Structure , Quantitative Structure-Activity Relationship , Reproducibility of Results
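Once matched pairs have been found, each transformation's rule is summarized by the size and direction of the property change, and rules derived from predicted data can be checked against rules derived from measured data. The sketch below assumes the pairs are already extracted (fragmentation is not shown), writes transformations in a hypothetical shorthand such as "H>>Cl", and uses an arbitrary 0.3 log-unit tolerance for consistent magnitude; none of these choices come from the study.

```python
from collections import defaultdict
from statistics import mean


def summarize_rules(pairs):
    """pairs: iterable of (transformation, delta), delta = logD7.4(B) - logD7.4(A)."""
    deltas = defaultdict(list)
    for transform, delta in pairs:
        deltas[transform].append(delta)
    return {
        t: {"n": len(d),
            "mean_delta": mean(d),
            "frac_increase": sum(x > 0 for x in d) / len(d)}
        for t, d in deltas.items()
    }


def consistent(rule_a, rule_b, tol=0.3):
    """Same direction of change and mean magnitudes within `tol` log units."""
    same_direction = (rule_a["mean_delta"] > 0) == (rule_b["mean_delta"] > 0)
    return same_direction and abs(rule_a["mean_delta"] - rule_b["mean_delta"]) <= tol


# Hypothetical matched pairs from measured and from predicted logD7.4 values.
measured = [("H>>Cl", 0.6), ("H>>Cl", 0.8), ("H>>OH", -0.7), ("H>>F", 0.2)]
predicted = [("H>>Cl", 0.5), ("H>>Cl", 0.7), ("H>>Cl", 0.9),
             ("H>>OH", -0.5), ("H>>F", -0.1)]
m_rules, p_rules = summarize_rules(measured), summarize_rules(predicted)
for t in m_rules:
    print(t, consistent(m_rules[t], p_rules[t]))
```

Pooling measured and predicted pairs before summarizing raises n for each transformation, which is how predicted data can amplify the number of usable rules.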
9.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32892221

ABSTRACT

BACKGROUND: High-throughput screening (HTS) and virtual screening (VS) have been widely used to identify potential hits from large chemical libraries. However, the frequent occurrence of 'noisy compounds' in the screened libraries, such as compounds with poor drug-likeness, poor selectivity or potential toxicity, has greatly weakened the enrichment capability of HTS and VS campaigns. Comprehensive and credible tools for detecting noisy compounds in chemical libraries are therefore urgently needed in the early stages of drug discovery. RESULTS: In this study, we developed Scopy, a freely available, integrated Python library for negative design that supports data preparation, calculation of descriptors, scaffolds and screening filters, and data visualization. The current version of Scopy can calculate 39 basic molecular properties, 3 comprehensive molecular evaluation scores, 2 types of molecular scaffolds, 6 types of substructure descriptors and 2 types of fingerprints. Scopy also provides a number of important screening rules, including 15 drug-likeness rules (13 drug-likeness rules and 2 building block rules), 8 frequent hitter rules (four assay interference substructure filters and four promiscuous compound substructure filters), and 11 toxicophore filters (five human-related toxicity substructure filters, three environment-related toxicity substructure filters and three comprehensive toxicity substructure filters). Moreover, the library supports four visualization functions to help users better understand the screened data: a basic feature radar chart, a feature-feature scatter diagram, a functional group marker gram and a cloud gram. CONCLUSION: Scopy provides a comprehensive Python package for filtering out compounds with undesirable properties or substructures, which will benefit the design of high-quality chemical libraries for drug design and discovery.
It is freely available at https://github.com/kotori-y/Scopy.


Subjects
Databases, Pharmaceutical/statistics & numerical data , Drug Design , Drug Development/methods , High-Throughput Screening Assays/methods , Small Molecule Libraries , Biological Products/chemistry , Computational Biology/methods , Drug Discovery/methods , Drug Stability , Humans , Molecular Structure , Pharmaceutical Preparations/chemistry , Reproducibility of Results , Research Design
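Drug-likeness rules of the kind Scopy bundles are threshold checks on computed properties. As an example, Lipinski's rule of five is sketched below on precomputed property values. This is not Scopy's API, and the property values of the example compounds are approximate (aspirin) or hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Props:
    mw: float    # molecular weight (Da)
    logp: float  # calculated octanol/water partition coefficient
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def lipinski_violations(p: Props) -> int:
    """Count violations of MW <= 500, logP <= 5, HBD <= 5, HBA <= 10."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])


def passes_lipinski(p: Props, max_violations: int = 1) -> bool:
    # The rule is commonly applied allowing at most one violation.
    return lipinski_violations(p) <= max_violations


aspirin = Props(mw=180.16, logp=1.2, hbd=1, hba=4)   # approximate values
bulky = Props(mw=650.0, logp=6.2, hbd=2, hba=8)      # hypothetical compound
print(passes_lipinski(aspirin), passes_lipinski(bulky))
```

Frequent-hitter and toxicophore filters work differently: they are substructure matches rather than property thresholds.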
10.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33201188

ABSTRACT

BACKGROUND: Fluorescent detection methods are indispensable tools for chemical biology. However, the frequent appearance of potentially fluorescent compounds greatly interferes with the recognition of compounds with genuine activity. Such fluorescence interference is especially difficult to identify because it is reproducible and concentration-dependent. A credible screening tool for detecting fluorescent compounds in chemical libraries is therefore urgently needed in the early stages of drug discovery. RESULTS: In this study, we developed ChemFLuo, a webserver for fluorescent compound detection, based on two large, high-quality training datasets containing 4906 blue and 8632 green fluorescent compounds. These molecules were used to construct a group of prediction models combining three machine learning algorithms and seven types of molecular representations. The best blue fluorescence prediction model achieved a balanced accuracy (BA) of 0.858 and an area under the receiver operating characteristic curve (AUC) of 0.931 for the validation set, and BA = 0.823 and AUC = 0.903 for the test set. The best green fluorescence prediction model achieved BA = 0.810 and AUC = 0.887 for the validation set, and BA = 0.771 and AUC = 0.852 for the test set. Besides the prediction models, 22 blue and 16 green representative fluorescent substructures were summarized for screening potential fluorescent compounds. Comparison with other fluorescence detection tools and application to external validation sets and large molecule libraries demonstrated the reliability of the prediction models for fluorescent compound detection. CONCLUSION: ChemFLuo is a public webserver for filtering out compounds with undesirable fluorescent properties, which will benefit the design of high-quality chemical libraries for drug discovery. It is freely available at http://admet.scbdd.com/chemfluo/index/.


Subjects
Drug Discovery , Fluorescent Dyes/chemistry , Machine Learning , Models, Chemical , Small Molecule Libraries , Fluorescence
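Balanced accuracy and ROC AUC, the two metrics reported for ChemFLuo, can be computed as below. This is a minimal sketch with hypothetical labels and scores; the 0.5 decision threshold is an assumption, and the pairwise AUC is quadratic in the number of compounds, so a library implementation would be preferable for large sets.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fluorescent) and model scores.
y = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
y_hat = [int(s >= 0.5) for s in scores]
print(balanced_accuracy(y, y_hat), roc_auc(y, scores))
```

BA averages sensitivity and specificity, so it is not inflated when one class dominates a library.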
11.
J Chem Inf Model ; 60(4): 2031-2043, 2020 04 27.
Article in English | MEDLINE | ID: mdl-32202787

ABSTRACT

Luciferase-based bioluminescence detection techniques are highly favored in high-throughput screening (HTS), and firefly luciferase (FLuc) is the most commonly used variant. However, FLuc inhibitors can interfere with luciferase activity, which may produce false-positive signals in HTS assays. To reduce unnecessary costs in time and money, an in silico prediction model for FLuc inhibitors is highly desirable. In this study, we built an extensive data set of 20 888 FLuc inhibitors and 198 608 noninhibitors and then developed a group of classification models combining three machine learning (ML) algorithms and four types of molecular representations. The best prediction model, based on XGBoost with ECFP4 and MOE2d descriptors, yielded a balanced accuracy (BA) of 0.878 and an area under the receiver operating characteristic curve (AUC) of 0.958 for the validation set, and a BA of 0.886 and an AUC of 0.947 for the test set. Three external validation sets, namely set 1 (3231 FLuc inhibitors and 69 783 noninhibitors), set 2 (695 FLuc inhibitors and 75 913 noninhibitors), and set 3 (1138 FLuc inhibitors and 8155 noninhibitors), were used to verify the predictive ability of our models. The BA values of the best model on the three external validation sets are 0.864, 0.845, and 0.791, respectively. In addition, the important features or structural fragments related to FLuc inhibition, along with their influence on predictions, were identified with the Shapley additive explanations (SHAP) method, which may provide valuable clues for detecting undesirable luciferase inhibitors. Based on these important and explanatory features, 16 rules were proposed for detecting FLuc inhibitors, which achieve a correct-classification rate of 70% for FLuc inhibitors.
Furthermore, a comparison with existing prediction rules and models for FLuc inhibitors used in virtual screening verified the high reliability of the models and rules proposed in this study. We also used the model to screen three curated chemical databases, and almost 10% of the molecules in the evaluated databases were predicted to be inhibitors, highlighting the potential risk of false positives in luciferase-based assays. Finally, a public web server called ChemFLuc was developed (http://admet.scbdd.com/chemfluc/index/), which offers a freely available service for predicting potential FLuc inhibitors.


Subjects
Databases, Chemical , High-Throughput Screening Assays , Algorithms , Luciferases , Reproducibility of Results
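The data set above is heavily imbalanced, which is why balanced accuracy is the headline metric. A quick worked check with the reported counts (20 888 inhibitors, 198 608 noninhibitors) shows what a trivial classifier that never predicts "inhibitor" would score; this baseline is a hypothetical reference point, not a model from the study.

```python
def majority_baseline(n_pos: int, n_neg: int):
    """Accuracy and balanced accuracy of a classifier that predicts 'negative' for everything."""
    accuracy = n_neg / (n_pos + n_neg)
    sensitivity = 0.0   # no inhibitor is ever flagged
    specificity = 1.0   # every noninhibitor is labelled correctly
    balanced_accuracy = (sensitivity + specificity) / 2
    return accuracy, balanced_accuracy


acc, ba = majority_baseline(n_pos=20888, n_neg=198608)
print(f"accuracy = {acc:.3f}, balanced accuracy = {ba:.3f}")
```

Plain accuracy comes out at about 0.905 while BA stays at 0.5, so a BA well above 0.5 on such data reflects real discrimination of both classes rather than the class ratio.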
12.
J Chem Inf Model ; 60(1): 63-76, 2020 01 27.
Article in English | MEDLINE | ID: mdl-31869226

ABSTRACT

Lipophilicity, as evaluated by the n-octanol/buffer distribution coefficient at pH 7.4 (log D7.4), is a major determinant of various absorption, distribution, metabolism, elimination, and toxicity (ADMET) parameters of drug candidates. In this study, we developed several quantitative structure-property relationship (QSPR) models to predict log D7.4 from a large and structurally diverse data set. Eight popular machine learning algorithms were used to build prediction models with 43 molecular descriptors selected by a wrapper feature selection method. XGBoost yielded better prediction performance than any other single model (R²_T = 0.906, RMSE_T = 0.395), and a consensus model of the top three models improved performance further (R²_T = 0.922, RMSE_T = 0.359). The robustness, reliability, and generalization ability of the models were strictly evaluated with the Y-randomization test and applicability domain analysis. In addition, a group contribution model based on 110 atom types and local models for different ionization states were established and compared with the global models. The descriptor-based consensus model proved superior to the group contribution method, and the local models had no advantage over the global models. Finally, matched molecular pair (MMP) analysis and descriptor importance analysis were performed to extract transformation rules and offer explanations related to log D7.4. In conclusion, we believe that the consensus model developed in this study can serve as a reliable and promising tool for evaluating log D7.4 in drug discovery.


Subjects
Machine Learning , Models, Molecular , Algorithms , Drug Discovery/methods , Lipids/chemistry , Quantitative Structure-Activity Relationship
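The Y-randomization test refits the model after shuffling the response values; if the shuffled fits score about as well as the real one, the original correlation is likely to be chance. The sketch below uses a one-descriptor least-squares fit and synthetic data as stand-ins for the study's multi-descriptor machine learning models and log D7.4 data.

```python
import random


def fit_r2(x, y):
    """Fit y = a + b*x by ordinary least squares and return the training R²."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=20, seed=0):
    """Return the real R² and the R² values obtained after shuffling y."""
    rng = random.Random(seed)
    shuffled_scores = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)
        shuffled_scores.append(fit_r2(x, y_perm))
    return fit_r2(x, y), shuffled_scores


# Synthetic descriptor/response pairs with a genuine linear trend.
x = list(range(20))
y = [0.3 * v - 1.0 + (0.2 if v % 2 else -0.2) for v in x]
real_r2, random_r2 = y_randomization(x, y)
print(f"real R2 = {real_r2:.3f}, best shuffled R2 = {max(random_r2):.3f}")
```

A large gap between the real R² and the shuffled R² values is the signal that the model captures a real structure-property relationship.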
13.
J Chem Inf Model ; 59(9): 3714-3726, 2019 09 23.
Article in English | MEDLINE | ID: mdl-31430151

ABSTRACT

Aggregation poses a major challenge in drug discovery. Current computational approaches that filter out aggregating molecules based on their similarity to known aggregators, such as Aggregator Advisor, have low prediction accuracy, so reliable in silico models for detecting aggregators are highly desirable. In this study, we built a data set of 12 119 aggregators and 24 172 drugs or drug candidates and then developed a group of classification models combining two ensemble learning approaches and five types of molecular representations. The best model yielded an accuracy of 0.950 and an area under the curve (AUC) of 0.987 for the training set, and an accuracy of 0.937 and an AUC of 0.976 for the test set. The best model also gave reliable predictions for the external validation set of 5681 aggregators: 80% of these molecules were predicted to be aggregators with a prediction probability higher than 0.9. More importantly, we explored the relationship between colloidal aggregation and molecular features and generalized a set of simple rules for detecting aggregators. Molecular features such as log D, the number of hydroxyl groups, the number of aromatic carbons attached to a hydrogen atom, and the number of sulfur atoms in aromatic heterocycles help distinguish aggregators from nonaggregators. A comparison with numerous existing druglikeness and aggregation filtering rules and models used in virtual screening verified the high reliability of the model and rules proposed in this study. We also used the model to screen several curated chemical databases, and almost 20% of the molecules in the evaluated databases were predicted to be aggregators, highlighting the potentially high risk of aggregation in screening. Finally, we developed ChemAGG, an online web server (http://admet.scbdd.com/ChemAGG/index), which offers a freely available tool for detecting aggregators.


Subjects
Drug Discovery/methods , Pharmaceutical Preparations/chemistry , Computer Simulation , Databases, Pharmaceutical , Drug Design , Humans , Molecular Structure , Software , Structure-Activity Relationship
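The similarity-based baseline criticized above, flagging a compound when it resembles a known aggregator, can be sketched with Tanimoto similarity on fingerprint bit sets. This illustrates the baseline approach, not ChemAGG's model; the bit sets are hypothetical (in practice they might be ECFP on-bits), and the 0.85 cutoff is an assumed value.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(query: Set[int], references: Iterable[Set[int]]) -> float:
    return max((tanimoto(query, ref) for ref in references), default=0.0)


def flag_as_likely_aggregator(query, known_aggregators, cutoff=0.85):
    # Hypothetical cutoff; real tools choose their own threshold.
    return max_similarity(query, known_aggregators) >= cutoff


known = [{1, 2, 3, 4}, {10, 11, 12}]
print(flag_as_likely_aggregator({1, 2, 3, 4, 5}, known))  # Tanimoto 0.8
print(flag_as_likely_aggregator({1, 2, 3, 4}, known))     # Tanimoto 1.0
```

Such a filter only catches close analogues of known aggregators, one plausible reason a learned model over several molecular representations generalizes better.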
14.
Ying Yong Sheng Tai Xue Bao ; 29(6): 1893-1901, 2018 Jun.
Article in Chinese | MEDLINE | ID: mdl-29974699

ABSTRACT

Based on equidistant grid sampling (25 m × 25 m), the spatial variability of pH, organic matter, total nitrogen, available phosphorus, cation exchange capacity (CEC) and three typical heavy metal elements (Cd, As and Pb) in the soil tillage layer (0-20 cm) of a 3.56 hm² paddy field in Beishan Town, Changsha County, Hunan Province, was analyzed using GIS and geostatistics. Soil pH and Pb content showed weak variation, and the other indexes showed moderate variation. The order of variation was: available phosphorus > Cd > total nitrogen > organic matter > CEC > As > Pb > pH. Semivariance analysis showed that the best-fitting semivariogram model was exponential for organic matter, available phosphorus and As, and spherical for pH, total nitrogen, CEC, Cd and Pb. All indicators showed strong spatial correlation except CEC, which showed moderate spatial correlation. Kriging interpolation showed that pH, total nitrogen, CEC and Pb had patchy distributions, while organic matter, available phosphorus, Cd and As had block and banded distributions. Vegetation, topography and human activities were the main factors driving the variation of soil nutrients and heavy metals in the study area. Soil nutrient and heavy metal contents were significantly correlated, and the correlations involving pH, organic matter, Cd and Pb reached a highly significant level.


Subjects
Environmental Monitoring , Geographic Information Systems , Metals, Heavy/analysis , Oryza , China , Phosphorus , Soil , Soil Pollutants
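The semivariance analysis above rests on the empirical semivariogram, γ(h) = 1/(2N(h)) Σ [z(x_i) - z(x_i + h)]², which is then fitted with a model such as the spherical or exponential one. The sketch below computes γ(h) for lag bins, evaluates both models, and classifies spatial dependence by the nugget-to-sill ratio. The 25%/75% cut-offs are a commonly used convention assumed here (the abstract does not state its criteria), the exponential model uses the practical-range parameterization, and the sample transect is hypothetical.

```python
import math


def empirical_semivariogram(coords, values, lag, n_lags):
    """gamma(h) = sum of (z_i - z_j)^2 / (2 N(h)) over point pairs in lag bin h.

    Bin k (k = 1..n_lags) collects pairs with (k - 0.5)*lag < distance <= (k + 0.5)*lag.
    """
    result = []
    for k in range(1, n_lags + 1):
        lo, hi = (k - 0.5) * lag, (k + 0.5) * lag
        sq_sum, n_pairs = 0.0, 0
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                if lo < math.dist(coords[i], coords[j]) <= hi:
                    sq_sum += (values[i] - values[j]) ** 2
                    n_pairs += 1
        if n_pairs:
            result.append((k * lag, sq_sum / (2 * n_pairs)))
    return result


def spherical(h, nugget, sill, a):
    """Spherical model; `sill` is the total sill (nugget + partial sill), `a` the range."""
    if h == 0:
        return 0.0
    if h >= a:
        return sill
    r = h / a
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, a):
    """Exponential model with practical range `a` (about 95% of the sill at h = a)."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / a))


def spatial_dependence(nugget, sill):
    ratio = nugget / sill
    return "strong" if ratio < 0.25 else "moderate" if ratio <= 0.75 else "weak"


# Hypothetical transect sampled every 25 m (cf. the 25 m x 25 m grid).
coords = [(0, 0), (25, 0), (50, 0), (75, 0)]
values = [1.0, 2.0, 1.0, 2.0]
print(empirical_semivariogram(coords, values, lag=25, n_lags=3))
```

In practice the model parameters (nugget, sill, range) are fitted to the empirical points, and the fitted model then supplies the covariances used in kriging interpolation.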