ABSTRACT
Peptides have emerged as promising therapeutic agents. However, their potential is hindered by hemotoxicity. Understanding the hemotoxicity of peptides is crucial for developing safe and effective peptide-based therapeutics. Here, we employed chemical space complex networks (CSNs) to unravel the hemotoxicity tapestry of peptides. CSNs are powerful tools for visualizing and analyzing the relationships between peptides based on their physicochemical properties and structural features. We constructed CSNs from the StarPepDB database, encompassing 2004 hemolytic peptides, and explored the impact of seven different (dis)similarity measures on network topology and cluster (communities) distribution. Our findings revealed that each CSN extracts orthogonal information, enhancing the motif discovery and enrichment process. We identified 12 consensus hemolytic motifs, whose amino acid composition unveiled a high abundance of lysine, leucine, and valine residues, while aspartic acid, methionine, histidine, asparagine and glutamine were depleted. Additionally, physicochemical properties were used to characterize clusters/communities of hemolytic peptides. To predict hemolytic activity directly from peptide sequences, we constructed multi-query similarity searching models (MQSSMs), which outperformed cutting-edge machine learning (ML)-based models, demonstrating robust hemotoxicity prediction capabilities. Overall, this novel in silico approach uses complex network science as its central strategy to develop robust model classifiers, to characterize the chemical space and to discover new motifs from hemolytic peptides. This will help to enhance the design/selection of peptides with potential therapeutic activity and low toxicity.
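As a rough illustration of how a chemical space network can be assembled from pairwise similarities, the sketch below links peptides whose similarity reaches a cutoff and then groups them. It is a toy under stated assumptions, not the StarPepDB workflow: the dipeptide (2-mer) Jaccard similarity, the cutoff value, the function names, and the use of connected components in place of a real community-detection algorithm are all hypothetical choices made for this example.

```python
from itertools import combinations


def kmers(seq, k=2):
    """Set of overlapping k-mers in a peptide sequence (hypothetical featurization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_csn(peptides, threshold):
    """Adjacency map: an edge joins two peptides whose similarity >= threshold."""
    feats = {p: kmers(p) for p in peptides}
    graph = {p: set() for p in peptides}
    for p, q in combinations(peptides, 2):
        if jaccard(feats[p], feats[q]) >= threshold:
            graph[p].add(q)
            graph[q].add(p)
    return graph


def components(graph):
    """Connected components, used here as a simple stand-in for communities."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy sequences, invented for illustration only.
toy = ["KLLKLL", "KLLKLA", "DDEEDD"]
network = build_csn(toy, threshold=0.5)
clusters = components(network)
```

Changing the similarity function or the cutoff changes which edges exist, which is why different (dis)similarity measures can yield different network topologies and cluster assignments.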
ABSTRACT
Compound databases of natural products play a crucial role in drug discovery and development projects and have implications in other areas, such as food chemical research, ecology and metabolomics. Recently, we put together the first version of the Latin American Natural Product database (LANaPDB) as a collective effort of researchers from six countries to assemble a public and representative library of natural products in a geographical region with a large biodiversity. The present work aims to conduct a comparative and extensive profiling of the natural product-likeness of an updated version of LANaPDB and the ten individual compound databases that form part of LANaPDB. The natural product-likeness profile of the Latin American compound databases is contrasted with the profile of other major natural product databases in the public domain and a set of small-molecule drugs approved for clinical use. As part of the extensive characterization, we employed several chemoinformatics metrics of natural product likeness. The results of this study will capture the attention of the global community engaged in natural product databases, not only in Latin America but across the world.
Subjects
Biological Products; Biological Products/chemistry; Biological Products/pharmacology; Latin America; Small Molecule Libraries/pharmacology; Small Molecule Libraries/chemistry; Drug Discovery; Cheminformatics; Databases, Chemical
ABSTRACT
This work examines the current landscape of drug discovery and development, with a particular focus on the chemical and pharmacological spaces. It emphasizes the importance of understanding these spaces to anticipate future trends in drug discovery. The use of cheminformatics and data analysis enabled in silico exploration of these spaces, allowing a perspective of drugs, approved drugs after 2020, and clinical candidates, which were extracted from the newly released ChEMBL34 (March 2024). This perspective on chemical and pharmacological spaces enables the identification of trends and areas to be occupied, thereby creating opportunities for more effective and targeted drug discovery and development strategies in the future.
ABSTRACT
This work aimed to discover protein tyrosine phosphatase 1B (PTP1B) inhibitors from a small molecule library of natural products (NPs) derived from selected Mexican medicinal plants and fungi to find new hits for developing antidiabetic drugs. The products showing similar IC50 values to ursolic acid (UA) (positive control, IC50 = 26.5) were considered hits. These compounds were canophyllol (1), 5-O-(β-D-glucopyranosyl)-7-methoxy-3',4'-dihydroxy-4-phenylcoumarin (2), 3,4-dimethoxy-2,5-phenanthrenediol (3), masticadienonic acid (4), 4',5,6-trihydroxy-3',7-dimethoxyflavone (5), E/Z vermelhotin (6), tajixanthone hydrate (7), quercetin-3-O-(6″-benzoyl)-β-D-galactoside (8), lichexanthone (9), melianodiol (10), and confusarin (11). According to the double-reciprocal plots, 1 was a non-competitive inhibitor, 3 a mixed-type inhibitor, and 6 a competitive inhibitor. The chemical space analysis of the hits (IC50 < 100 µM) and compounds possessing activity (IC50 in the range of 100-1,000 µM) with the BIOFACQUIM library indicated that the active molecules are chemically diverse, covering most of the known Mexican NPs' chemical space. Finally, a structure-activity similarity (SAS) map was built using the Tanimoto similarity index and PTP1B absolute inhibitory activity, which allowed the identification of seven scaffold hops, namely, compounds 3, 5, 6, 7, 8, 9, and 11. Canophyllol (1), on the other hand, is a true analog of UA since it lies in an SAR continuous zone of the SAS map.
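The idea behind an SAS map can be sketched in a few lines: every pair of compounds gets a structural similarity and an activity difference, and the pair is assigned to a region of the map. The example below is a minimal sketch under stated assumptions. The fingerprints (sets of on-bit indices), the activity values, the cutoffs `sim_cut` and `act_cut`, and the region labels are hypothetical and do not come from the study.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sas_pairs(compounds, sim_cut=0.5, act_cut=1.0):
    """Classify each compound pair by similarity and absolute activity difference.

    compounds maps a name to (fingerprint_set, activity). The cutoffs are
    illustrative defaults, not values taken from the abstract.
    """
    rows = []
    for (n1, (fp1, act1)), (n2, (fp2, act2)) in combinations(compounds.items(), 2):
        sim = tanimoto(fp1, fp2)
        diff = abs(act1 - act2)
        if sim >= sim_cut:
            region = "smooth SAR" if diff < act_cut else "activity cliff"
        else:
            region = "scaffold hop" if diff < act_cut else "nondescript"
        rows.append((n1, n2, sim, diff, region))
    return rows


# Invented fingerprints and activities, for illustration only.
toy = {
    "A": ({1, 2, 3, 4}, 4.6),
    "B": ({1, 2, 3, 5}, 4.5),
    "C": ({7, 8, 9}, 4.4),
}
pairs = sas_pairs(toy)
regions = {(a, b): region for a, b, _, _, region in pairs}
```

In this toy, A and B are structurally similar with close activities (a continuous SAR pair), while A and C share no bits yet have close activities, which is the pattern read as a scaffold hop.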
ABSTRACT
Science and art have been connected for centuries. With the development of new computational methods, new scientific disciplines have emerged, such as computational chemistry, and related fields, such as cheminformatics. Chemoinformatics is grounded on the chemical space concept: a multi-descriptor space in which chemical structures are described. In several practical applications, visual representations of the chemical space of compound datasets are low-dimensional plots helpful in identifying patterns. However, the authors propose that the plots can also be used as artistic expressions. This manuscript introduces an approach to merging art with chemoinformatics through visual and artistic representations of chemical space. As case studies, we portray the chemical space of food chemicals and other compounds to generate visually appealing graphs with twofold benefits: sharing chemical knowledge and developing pieces of art driven by chemoinformatics. The art driven by chemical space visualization will help increase the application of chemistry and art and contribute to general education and dissemination of chemoinformatics and chemistry through artistic expressions. All the code and data sets to reproduce the visual representation of the chemical space presented in the manuscript are freely available at https://github.com/DIFACQUIM/Art-Driven-by-Visual-Representations-of-Chemical-Space- . Scientific contribution: Chemical space as a concept to create digital art and as a tool to train and introduce students to cheminformatics.
ABSTRACT
The number of databases of natural products (NPs) has increased substantially. Latin America is extraordinarily rich in biodiversity, enabling the identification of novel NPs, which has encouraged both the development of databases and the implementation of those that are being created or are under development. In a collective effort from several Latin American countries, herein we introduce the first version of the Latin American Natural Products Database (LANaPDB), a public compound collection that gathers the chemical information of NPs contained in diverse databases from this geographical region. The current version of LANaPDB unifies the information from six countries and contains 12,959 chemical structures. The structural classification showed that the most abundant compounds are the terpenoids (63.2%), phenylpropanoids (18%) and alkaloids (11.8%). From the analysis of the distribution of properties of pharmaceutical interest, it was observed that many LANaPDB compounds satisfy some drug-like rules of thumb for physicochemical properties. The concept of the chemical multiverse was employed to generate multiple chemical spaces from two different fingerprints and two dimensionality reduction techniques. Comparing LANaPDB with FDA-approved drugs and the major open-access repository of NPs, COCONUT, it was concluded that the chemical space covered by LANaPDB completely overlaps with COCONUT and, in some regions, with FDA-approved drugs. LANaPDB will be updated, adding more compounds from each database, plus the addition of databases from other Latin American countries.
ABSTRACT
This study examines how two popular drug-likeness concepts used in early development, Lipinski's Rule of Five (Ro5) and Veber's Rules, may have affected the profiles of FDA-approved drugs since 1997. Our findings suggest that when all criteria are applied, relevant compounds may be excluded, which highlights the harm of employing these rules blindly. Of all oral drugs in the period analyzed, around 66% conform to the Ro5 and 85% to Veber's Rules. Molecular weight and calculated LogP showed low consistency over time, apart from being the two least followed rules, challenging their relevance. On the other hand, hydrogen bond related rules and the number of rotatable bonds are amongst the most followed criteria and show exceptional consistency over time. Furthermore, our analysis indicates that topological polar surface area and total count of hydrogen bonds cannot be used as interchangeable parameters, contrary to the original proposal. This research enhances the comprehension of the profiles of drugs that were FDA approved in the post-Lipinski period. Medicinal chemists could utilize these heuristics as a limited guide to direct their exploration of the oral bioavailability chemical space, but they must also steer the wheel to break these rules and explore different regions when necessary.
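For readers who want to see the two rule sets side by side, the sketch below checks them against precomputed descriptors. The thresholds are the commonly cited ones (Ro5: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen bond donors, at most 10 acceptors; Veber: at most 10 rotatable bonds plus either polar surface area ≤ 140 Å² or at most 12 hydrogen bond donors and acceptors in total). The `Descriptors` class, the function names, and the example values are hypothetical, and the descriptors are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this example)."""
    mol_weight: float
    logp: float
    hbd: int        # hydrogen bond donors
    hba: int        # hydrogen bond acceptors
    rot_bonds: int  # rotatable bonds
    tpsa: float     # topological polar surface area, in square angstroms


def ro5_violations(d):
    """Names of the Rule of Five criteria that the compound fails."""
    checks = {
        "MW <= 500": d.mol_weight <= 500,
        "logP <= 5": d.logp <= 5,
        "HBD <= 5": d.hbd <= 5,
        "HBA <= 10": d.hba <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


def veber_flags(d):
    """Each Veber criterion separately, keeping TPSA and H-bond count distinct."""
    return {
        "rot_bonds <= 10": d.rot_bonds <= 10,
        "TPSA <= 140": d.tpsa <= 140,
        "HBD + HBA <= 12": d.hbd + d.hba <= 12,
    }


def passes_veber(d):
    """Rotatable-bond limit plus either the TPSA or the H-bond-count criterion."""
    flags = veber_flags(d)
    return flags["rot_bonds <= 10"] and (flags["TPSA <= 140"] or flags["HBD + HBA <= 12"])


# Invented descriptor values, for illustration only.
small = Descriptors(mol_weight=350.0, logp=2.5, hbd=2, hba=5, rot_bonds=4, tpsa=90.0)
large = Descriptors(mol_weight=700.0, logp=3.0, hbd=2, hba=11, rot_bonds=8, tpsa=150.0)
```

Reporting the TPSA and hydrogen bond count flags separately, rather than collapsing them into one pass/fail value, makes it easy to see when the two criteria disagree for a given compound.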
Subjects
Drug Approval; Biological Availability; Hydrogen Bonding; Molecular Weight
ABSTRACT
BACKGROUND: The idea of scoring function space established a systems-level approach to address the development of models to predict the affinity of drug molecules by those interested in drug discovery. OBJECTIVE: Our goal here is to review the concept of scoring function space and how to explore it to develop machine learning models to address protein-ligand binding affinity. METHOD: We searched the articles available in PubMed related to the scoring function space. We also utilized crystallographic structures found in the protein data bank (PDB) to represent the protein space. RESULTS: The application of systems-level approaches to address receptor-drug interactions allows us to have a holistic view of the process of drug discovery. The scoring function space adds flexibility to the process since it makes it possible to see drug discovery as a relationship involving mathematical spaces. CONCLUSION: The application of the concept of scoring function space has provided us with an integrated view of drug discovery methods. This concept is useful during drug discovery, where we see the process as a computational search of the scoring function space to find an adequate model to predict receptor-drug binding affinity.
ABSTRACT
Drug-induced liver injury (DILI) is the principal reason for failure in developing drug candidates. It is also the most common reason for withdrawing a drug from the market after it has been approved for clinical use. In this context, data from animal models, liver function tests, and chemical properties could complement each other to understand DILI events better and prevent them. Since the chemical space concept improves decision-making in drug design related to the prediction of structure-property relationships, side effects, and polypharmacology drug activity (to mention only the most recent advances), it is an attractive approach to combining different phenomena influencing DILI events (e.g., individual "chemical spaces") and exploring all events simultaneously in an integrated analysis of the DILI-relevant chemical space. However, currently, no systematic methods allow the fusion of a collection of different chemical spaces to gather different types of data in a single chemical space representation, namely a "consensus chemical space." This study is the first report that implements data fusion to consider different criteria simultaneously to facilitate the analysis of DILI-related events. In particular, the study highlights the importance of analyzing in vitro and chemical data together (e.g., topology, bond order, atom types, presence of rings, ring sizes, and aromaticity of compounds encoded on RDKit fingerprints). These properties could help improve the understanding of DILI events.
Subjects
Chemical and Drug Induced Liver Injury; Drug-Related Side Effects and Adverse Reactions; Animals; Consensus; Models, Animal; Chemical Phenomena
ABSTRACT
Chagas disease is a neglected tropical disease caused by the protozoan Trypanosoma cruzi. Cruzain, its main cysteine protease, is commonly targeted in drug discovery efforts to find new treatments for this disease. Even though the essentiality of this enzyme for the parasite has been established, many cruzain inhibitors fail as trypanocidal agents. This lack of translation from biochemical to biological assays can involve several factors, including suboptimal physicochemical properties. In this work, we aim to rationalize this phenomenon through chemical space analyses of calculated molecular descriptors. These include statistical tests, visualization of projections, scaffold analysis, and creation of machine learning models coupled with interpretability methods. Our results demonstrate a significant difference between the chemical spaces of cruzain and T. cruzi inhibitors, with compounds with more hydrogen bond donors and rotatable bonds being more likely to be good cruzain inhibitors, but less likely to be active on T. cruzi. In addition, cruzain inhibitors seem to occupy specific regions of the chemical space that cannot be easily correlated with T. cruzi activity, which means that using predictive modeling to determine whether cruzain inhibitors will be trypanocidal is not a straightforward task. We believe that the conclusions from this work might be of interest for future projects that aim to develop novel trypanocidal compounds.
Subjects
Chagas Disease; Trypanocidal Agents; Trypanosoma cruzi; Humans; Cysteine Endopeptidases/chemistry; Chagas Disease/drug therapy; Protozoan Proteins; Trypanocidal Agents/chemistry; Cysteine Proteinase Inhibitors/pharmacology; Cysteine Proteinase Inhibitors/chemistry
ABSTRACT
Natural products (NPs) are a rich source of structurally novel molecules, and the chemical space they encompass is far from being fully explored. Over history, NPs have represented a significant source of bioactive molecules and have served as a source of inspiration for developing many drugs on the market. On the other hand, computer-aided drug design (CADD) has contributed to drug discovery research, mitigating costs and time. In this sense, compound databases represent a fundamental element of CADD. This work reviews the progress toward developing compound databases of natural origin, and it surveys computational methods, emphasizing chemoinformatic approaches to profile natural product databases. Furthermore, it reviews the present state of the art in developing Latin American NP databases and their practical applications to the drug discovery area.
Subjects
Biological Products; Biological Products/chemistry; Databases, Factual; Drug Design; Drug Discovery; Latin America
ABSTRACT
Technological advances and practical applications of the chemical space concept in drug discovery, natural product research, and other research areas have attracted the scientific community's attention. Large and ultra-large chemical spaces are associated with the significant increase in the number of compounds that can potentially be made and exist, and with the increasing number of emerging experimental and calculated descriptors that encode the molecular structure and/or property aspects of the molecules. Due to the importance and continued evolution of compound libraries, herein we discuss definitions proposed in the literature for chemical space and emphasize the convenience, also discussed in the literature, of using complementary descriptors to obtain a comprehensive view of the chemical space of compound data sets. In this regard, we introduce the term chemical multiverse to refer to the comprehensive analysis of compound data sets through several chemical spaces, each defined by a different set of chemical representations. The chemical multiverse is contrasted with a related idea: consensus chemical space.
Subjects
Biological Products; Small Molecule Libraries; Small Molecule Libraries/chemistry; Structure-Activity Relationship; Molecular Structure; Drug Discovery
ABSTRACT
INTRODUCTION: Chemical space is a general conceptual framework that addresses the diversity of molecules and it has various applications. Moreover, chemical space is a cornerstone of chemoinformatics. In response to the increase in the set of chemical compounds in databases, generators of chemical structures, and tools to calculate molecular descriptors, novel approaches to generate visual representations of chemical space are emerging and evolving. AREAS COVERED: The current state of chemical space in drug design and discovery is reviewed. The topics discussed herein include advances for efficient navigation in chemical space, the use of this concept in assessing the diversity of different data sets, exploring structure-property/activity relationships for one or multiple endpoints, and compound library design. Recent advances in methodologies for generating visual representations of chemical space have been highlighted, thereby emphasizing open-source methods. EXPERT OPINION: Quantitative and qualitative generation and analysis of chemical space require novel approaches for handling the increasing number of molecules and their information available in chemical databases (including emerging ultra-large libraries). Chemical space is a conceptual framework that goes beyond visual representation in low dimensions. However, the graphical representation of chemical space has several practical applications in drug discovery and beyond.
Subjects
Databases, Chemical; Drug Discovery; Databases, Factual; Drug Design; Drug Discovery/methods; Humans; Structure-Activity Relationship
ABSTRACT
The purpose of this work is the creation of a chemical database named ChemTastesDB that includes both organic and inorganic tastants. The creation, curation pipeline and the main features of the database are described in detail. The database includes 2944 verified and curated compounds divided into nine classes, which comprise the five basic tastes (sweet, bitter, umami, sour and salty) along with four additional categories: tasteless, non-sweet, multitaste and miscellaneous. ChemTastesDB provides the following information for each tastant: name, PubChem CID, CAS registry number, canonical SMILES, taste class and references to the scientific sources from which data were retrieved. The molecular structure in the HyperChem (.hin) format of each chemical is also made available. In addition, molecular fingerprints were used for characterizing and analyzing the chemical space of tastants by means of unsupervised machine learning. ChemTastesDB constitutes a useful tool for the scientific community to expand the information on taste molecules and to assist in silico studies for the taste prediction of unevaluated and as yet unsynthesized compounds, as well as the analysis of the relationships between molecular structure and taste. The database is freely accessible at https://doi.org/10.5281/zenodo.5747393.
ABSTRACT
Peptide-based drugs are promising anticancer candidates due to their biocompatibility and low toxicity. In particular, tumor-homing peptides (THPs) have the ability to bind specifically to cancer cell receptors and tumor vasculature. Despite their potential for developing antitumor drugs, there are few available prediction tools to assist the discovery of new THPs. Two webservers based on machine learning models are currently active, the TumorHPD and the THPep, and more recently the SCMTHP. Herein, a novel method based on network science and similarity searching implemented in the starPep toolbox is presented for THP discovery. The approach relies on exploring the structural space of THPs with Chemical Space Networks (CSNs) and on applying centrality measures to identify the most relevant and non-redundant THP sequences within the CSN. Such THPs were considered as queries (Qs) for multi-query similarity searches that apply a group fusion (MAX-SIM rule) model. The resulting multi-query similarity searching models (SSMs) were validated with three benchmarking datasets of THPs/non-THPs. The predictions achieved accuracies that ranged from 92.64 to 99.18% and Matthews Correlation Coefficients between 0.894 and 0.98, outperforming state-of-the-art predictors. The best model was applied to repurpose AMPs from the starPep database as THPs, which were subsequently optimized for the TH activity. Finally, 54 promising THP leads were discovered, and their sequences were analyzed to find novel motifs. These results demonstrate the potential of CSNs and multi-query similarity searching for the rapid and accurate identification of THPs.
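The group fusion (MAX-SIM) rule mentioned above can be illustrated with a short sketch: a candidate is scored by its highest similarity to any query, and it is predicted active when that score reaches a cutoff. This is a minimal sketch under stated assumptions, not the starPep implementation. The similarity function is a stand-in (`difflib.SequenceMatcher.ratio` from the Python standard library), and the query sequences, the cutoff, and the function names are hypothetical. The Matthews correlation coefficient used for validation is included for completeness.

```python
from difflib import SequenceMatcher
from math import sqrt


def similarity(a, b):
    """Stand-in sequence similarity in [0, 1]; not the measure used by starPep."""
    return SequenceMatcher(None, a, b).ratio()


def max_sim(candidate, queries):
    """Group fusion by the MAX-SIM rule: best similarity to any query."""
    return max(similarity(candidate, q) for q in queries)


def predict(candidate, queries, cutoff=0.7):
    """Predict the candidate as active when its fused score reaches the cutoff."""
    return max_sim(candidate, queries) >= cutoff


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented query sequences, for illustration only.
queries = ["AAAA", "CCCC"]
score_close = max_sim("AAAC", queries)
score_far = max_sim("GGGG", queries)
```

Taking the maximum rather than the mean means a candidate only needs to resemble one query closely, which suits a query set chosen to be diverse and non-redundant.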
ABSTRACT
γ-Secretase (GS) is one of the most attractive molecular targets for the treatment of Alzheimer's disease (AD). Its key role in the final step of amyloid-β peptide generation and its relationship to the cascade of events in disease development have caught the attention of many pharmaceutical groups. Over the past years, different inhibitors and modulators have been evaluated as promising therapeutics against AD. However, despite the great chemical diversity of the reported compounds, a global classification and visual representation of the chemical space for GS inhibitors and modulators remain unavailable. In the present work, we carried out a two-dimensional (2D) chemical space analysis from different classes and subclasses of GS inhibitors and modulators based on their structural similarity. Along with the novel structural information available for GS complexes, our analysis opens the possibility to identify compounds with high molecular similarity, critical to finding new chemical structures through the optimization of existing compounds and relating them with a potential binding site.
Subjects
Alzheimer Disease; Amyloid Precursor Protein Secretases; Alzheimer Disease/drug therapy; Amyloid Precursor Protein Secretases/metabolism; Amyloid beta-Peptides; Binding Sites; Enzyme Inhibitors/pharmacology; Humans
ABSTRACT
Inhibiting the tubulin-microtubules (Tub-Mts) system is a classic and rational approach for treating different types of cancers. A large amount of data on inhibitors in the clinic supports Tub-Mts as a validated target. However, most of the inhibitors reported thus far have been developed around common chemical scaffolds covering a narrow region of the chemical space with limited innovation. This manuscript aims to discuss the first activity landscape and scaffold content analysis of an assembled and curated cell-based database of 851 Tub-Mts inhibitors with reported activity against five cancer cell lines and the Tub-Mts system. The structure-bioactivity relationships of the Tub-Mts system inhibitors were further explored using constellations plots. This recently developed methodology enables the rapid but quantitative assessment of analog series enriched with active compounds. The constellations plots identified promising analog series with high average biological activity that could be the starting points of new and more potent Tub-Mts inhibitors.
Subjects
Cheminformatics; Neoplasms/drug therapy; Tubulin Modulators/chemistry; Tubulin/chemistry; Cell Line, Tumor; Humans; Neoplasms/genetics; Tubulin/drug effects; Tubulin/genetics; Tubulin Modulators/pharmacology
ABSTRACT
Bacterial genome sequencing has revealed a vast number of novel biosynthetic gene clusters (BGC) with potential to produce bioactive natural products. However, the biosynthesis of secondary metabolites by bacteria is often silenced under laboratory conditions, limiting the controlled expression of natural products. Here we describe an integrated methodology for the construction and screening of an elicited and pre-fractionated library of marine bacteria. In this pilot study, chemical elicitors were evaluated to mimic the natural environment and to induce the expression of cryptic BGCs in deep-sea bacteria. By integrating high-resolution untargeted metabolomics with cheminformatics analyses, it was possible to visualize, mine, identify and map the chemical and biological space of the elicited bacterial metabolites. The results show that elicited bacterial metabolites correspond to ~45% of the compounds produced under laboratory conditions. In addition, the elicited chemical space is novel (~70% of the elicited compounds) or concentrated in the chemical space of drugs. Fractionation of the crude extracts further evidenced minor compounds (~90% of the collection) and the detection of biological activity. This pilot work pinpoints strategies for constructing and evaluating chemically diverse bacterial natural product libraries towards the identification of novel bacterial metabolites in natural product-based drug discovery pipelines.
ABSTRACT
Natural products have a significant role in drug discovery. Natural products have distinctive chemical structures that have contributed to identifying and developing drugs for different therapeutic areas. Moreover, natural products are significant sources of inspiration or starting points to develop new therapeutic agents. Natural products such as peptides and macrocycles, and other compounds with unique features represent attractive sources to address complex diseases. Computational approaches that use chemoinformatics and molecular modeling methods contribute to speed up natural product-based drug discovery. Several research groups have recently used computational methodologies to organize data, interpret results, generate and test hypotheses, filter large chemical databases before the experimental screening, and design experiments. This review discusses a broad range of chemoinformatics applications to support natural product-based drug discovery. We emphasize profiling natural product data sets in terms of diversity; complexity; acid/base; absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties; and fragment analysis. Novel techniques for the visual representation of the chemical space are also discussed.
Subjects
Biological Products/chemistry; Biological Products/pharmacology; Cheminformatics/methods; Databases, Factual; Drug Discovery/methods; Animals; Biological Products/analysis; Humans; Models, Molecular
ABSTRACT
Around the world, the number of compound databases of natural products in the public domain is rising. This is in line with the increasing synergistic combination of natural product research and chemoinformatics. Toward this global endeavor, countries in Latin America are assembling, curating, and analyzing the contents and diversity of natural products available in their geographical regions. In this manuscript we collect and analyze the efforts that countries in Latin America have made so far to build natural product databases. We further encourage the scientific community, in particular in Latin America, to continue their efforts to build quality natural product databases and, whenever possible, to make them publicly accessible. It is proposed that all compound collections could be assembled into a unified resource called LANaPD: Latin American Natural Products Database. Opportunities and challenges to build, distribute and maintain LANaPD are also discussed.