ABSTRACT
Urease, a pivotal enzyme in nitrogen metabolism, plays a crucial role in various microorganisms, including the pathogenic Helicobacter pylori. Inhibiting urease activity offers a promising approach to combating infections and associated ailments, such as chronic kidney diseases and gastric cancer. However, identifying potent urease inhibitors remains challenging due to resistance issues that hinder traditional approaches. Recently, machine learning (ML)-based models have demonstrated the ability to predict the bioactivity of molecules rapidly and effectively. In this study, we present ML models designed to predict urease inhibitors by leveraging essential physicochemical properties. The methodological approach involved constructing a dataset of urease inhibitors through an extensive literature search. Subsequently, these inhibitors were characterized based on physicochemical properties calculations. An exploratory data analysis was then conducted to identify and analyze critical features. Ultimately, 252 classification models were trained, utilizing a combination of seven ML algorithms, three attribute selection methods, and six different strategies for categorizing inhibitory activity. The investigation unveiled discernible trends distinguishing urease inhibitors from non-inhibitors. This differentiation enabled the identification of essential features that are crucial for precise classification. Through a comprehensive comparison of ML algorithms, tree-based methods like random forest, decision tree, and XGBoost exhibited superior performance. Additionally, incorporating the "chemical family type" attribute significantly enhanced model accuracy. Strategies involving a gray-zone categorization demonstrated marked improvements in predictive precision. This research underscores the transformative potential of ML in predicting urease inhibitors. 
The meticulous methodology outlined herein offers actionable insights for developing robust predictive models within biochemical systems.
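The gray-zone categorization mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual thresholds or activity units, so the IC50 cut-offs (10 µM and 50 µM), the function name, and the compound names below are hypothetical placeholders: compounds at or below the lower cut-off are labeled inhibitors, those at or above the upper cut-off are labeled non-inhibitors, and those in between are left unlabeled so they can be excluded from training.

```python
from typing import Optional


def label_with_gray_zone(
    ic50_um: float,
    active_max: float = 10.0,    # hypothetical cut-off, not taken from the abstract
    inactive_min: float = 50.0,  # hypothetical cut-off, not taken from the abstract
) -> Optional[str]:
    """Return a class label, or None when the value falls in the gray zone."""
    if ic50_um <= active_max:
        return "inhibitor"
    if ic50_um >= inactive_min:
        return "non-inhibitor"
    return None  # ambiguous activity: excluded from the training set


# Hypothetical compounds and IC50 values (µM), for illustration only.
ic50_values = {"cmpd_a": 2.5, "cmpd_b": 27.0, "cmpd_c": 120.0}
labels = {name: label_with_gray_zone(value) for name, value in ic50_values.items()}

# Keep only the unambiguously labeled compounds.
training_set = {name: label for name, label in labels.items() if label is not None}
```

Dropping the intermediate band trades dataset size for cleaner class boundaries, which is one plausible reason such strategies could improve predictive precision.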
Subject(s)
Enzyme Inhibitors , Machine Learning , Urease , Urease/antagonists & inhibitors , Urease/chemistry , Urease/metabolism , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Helicobacter pylori/enzymology , Helicobacter pylori/drug effects , Algorithms , Humans
ABSTRACT
Antimicrobial peptides (AMPs) are promising cationic and amphipathic molecules to fight antibiotic resistance. To search for novel AMPs, we applied a computational strategy to identify peptide sequences within organisms' proteomes, using in-house developed software and artificial intelligence tools. After analyzing 150,450 proteins from eight proteomes of bacteria, plants, a protist, and a nematode, nine peptides were selected and modified to increase their antimicrobial potential. The 18 resulting peptides were validated by bioassays with four pathogenic bacterial species, one yeast species, and two cancer cell lines. Fourteen of the 18 tested peptides were antimicrobial, with minimum inhibitory concentration (MIC) values under 10 µM against at least three bacterial species; seven were active against Candida albicans with MIC values under 10 µM; six had a therapeutic index above 20; two peptides were active against A549 cells, and eight were active against MCF-7 cells under 30 µM. This study's most active antimicrobial peptides damage the bacterial cell membrane, producing grooves, dents, membrane wrinkling, cell destruction, and leakage of cytoplasmic material. The results confirm that the proposed approach, which uses bioinformatic tools and rational modifications, is highly efficient and allows the discovery, with high accuracy, of potent AMPs encrypted in proteins.
Subject(s)
Anti-Infective Agents , Proteome , Antimicrobial Cationic Peptides/pharmacology , Antimicrobial Cationic Peptides/chemistry , Antimicrobial Peptides , Artificial Intelligence , Anti-Infective Agents/pharmacology , Anti-Infective Agents/chemistry , Bacteria , Microbial Sensitivity Tests , Anti-Bacterial Agents/pharmacology
ABSTRACT
Molecules sourced from marine environments hold immense promise for the development of novel therapeutic drugs, owing to their distinctive chemical compositions and valuable medicinal attributes. Notably, Talarolide A and Talaropeptides A-D have gained recent attention as potential candidates for pharmaceutical applications. This study aims to explore the chemical reactivity of Talarolide A and Talaropeptides A-D through the application of molecular modeling and computational chemistry techniques, specifically employing Conceptual Density Functional Theory (CDFT). By investigating their chemical behaviors, the study seeks to contribute to the understanding of the potential pharmacological uses of these marine-derived compounds. The molecular geometry optimizations and frequency calculations were conducted using the Density Functional Tight Binding (DFTBA) method. This was followed by a second round of geometry optimization, frequency analysis, and computation of electronic properties and chemical reactivity descriptors, for which we employed the MN12SX/Def2TZVP/H2O model chemistry, utilizing the Gaussian 16 program and the SMD solvation model. The global reactivity descriptors arising from CDFT were analyzed, and a graphical comparison of the dual descriptor (DD) revealed the regions of the molecules most prone to nucleophilic or electrophilic attack. Additionally, Molinspiration and SwissTargetPrediction were used to calculate molecular characteristics and to predict biological targets. These include enzymes, nuclear receptors, kinase inhibitors, GPCR ligands, and ion channel modulators. The graphical results show that Talarolide A and the Talaropeptides A-D are likely to behave as protease inhibitors.
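As a rough illustration of the global reactivity descriptors referred to in this abstract, the sketch below computes the textbook CDFT quantities from a vertical ionization energy I and electron affinity A. The abstract does not say which conventions the authors used, so the definitions chosen here (electronegativity χ = (I + A)/2, chemical potential μ = -χ, hardness η = I - A without the factor 1/2, electrophilicity ω = μ²/(2η)) are an assumption, and the numerical inputs are made up.

```python
def global_reactivity_descriptors(ionization_energy: float, electron_affinity: float) -> dict:
    """Compute common CDFT global descriptors (same energy unit in and out).

    Assumed conventions (not confirmed by the abstract):
      chi = (I + A) / 2, mu = -chi, eta = I - A, omega = mu**2 / (2 * eta)
    """
    hardness = ionization_energy - electron_affinity
    if hardness <= 0:
        raise ValueError("Expected I > A so that the global hardness is positive.")
    electronegativity = (ionization_energy + electron_affinity) / 2
    chemical_potential = -electronegativity
    electrophilicity = chemical_potential ** 2 / (2 * hardness)
    return {
        "electronegativity": electronegativity,
        "chemical_potential": chemical_potential,
        "hardness": hardness,
        "electrophilicity": electrophilicity,
    }


# Made-up values in eV, for illustration only.
descriptors = global_reactivity_descriptors(ionization_energy=6.0, electron_affinity=2.0)
```

In practice I and A would come from the electronic structure calculation (for example from total-energy differences or frontier orbital energies); that step is outside this sketch.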
Subject(s)
Receptors, G-Protein-Coupled , Signal Transduction , Density Functional Theory , Ligands , Peptides/pharmacology
ABSTRACT
We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24-25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for the meeting. As part of the meeting, advances in enumeration and visualization of chemical space, applications in natural product-based drug discovery, drug discovery for neglected diseases, toxicity prediction, and general guidelines for data analysis were discussed. Experts from ChEMBL presented a workshop on how to use the resources of this major compounds database used in cheminformatics. The school also included a round table with editors of cheminformatics journals. The full program of the meeting and the recordings of the sessions are publicly available at https://www.youtube.com/@SchoolChemInfLA/featured .
ABSTRACT
Salirasib, or farnesylthiosalicylic acid (FTS), is a salicylic acid derivative with demonstrated antineoplastic activity. While designed as a competitor of the substrate S-farnesyl cysteine on Ras, it is a potent competitive inhibitor of isoprenylcysteine carboxymethyl transferase. In this study, the antiproliferative activity of a series of lipophilic thioether-modified salirasib analogues, with or without a 1,2,3-triazole linker, was evaluated on six different solid tumor cell lines. A combination of bioassay, cheminformatics, docking, and in silico ADME-Tox analyses was also performed. SAR analysis showed that analogues with three or more isoprene units or a long aliphatic chain exhibited the most potent activity. Furthermore, three compounds displayed antiproliferative activity superior to that of salirasib and potency similar to that of control anticancer drugs across all tested solid tumor cell lines. In addition, the effect of the collection on migration and invasion, key processes in tumor metastasis, was also studied. Three analogues with specific antimigratory activity and distinct structural features were identified, making them interesting starting points for the development of new antimetastatic agents. The antiproliferative and antimigratory effects observed suggest that modifying the thiol aliphatic/prenyl substituents can modulate the activity.
Subject(s)
Antineoplastic Agents , Antineoplastic Agents/pharmacology , Salicylates/pharmacology , Farnesol/pharmacology , Cell Line, Tumor , Cell Proliferation
ABSTRACT
The importance of epigenetic drug and probe discovery is on the rise. This is not only paramount to identify and develop therapeutic treatments associated with epigenetic processes but also to understand the underlying epigenetic mechanisms involved in biological processes. To this end, chemical vendors have been developing synthetic compound libraries focused on epigenetic targets to increase the probabilities of identifying promising starting points for drug or probe candidates. However, the chemical contents of these data sets, the distribution of their physicochemical properties, and diversity remain unknown. To fill this gap and make this information available to the scientific community, we report a comprehensive analysis of eleven libraries focused on epigenetic targets containing more than 50,000 compounds. We used well-validated chemoinformatics approaches to characterize these sets, including novel methods such as automated detection of analog series and visual representations of the chemical space based on Constellation Plots and Chemical Library Networks. This work will guide the efforts of experimental groups working on high-throughput and medium-throughput screening of epigenetic-focused libraries. The outcome of this work can also be used as a reference to design and describe novel focused epigenetic libraries.
Subject(s)
Cheminformatics , Small Molecule Libraries , Epigenesis, Genetic , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology
ABSTRACT
With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinders the process of choosing a suitable representation for Quantitative Structure-Activity Relationship (QSAR) modeling. A reason behind this issue is the difficulty of conducting a fair and thorough comparison of the different existing embedding approaches, which requires numerous experiments on various datasets and training scenarios. To close this gap, we reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques recently proposed in the literature. We compared these five methods concerning their performance in QSAR scenarios using different classification and regression datasets. We also compared these representations to traditional molecular representations, namely molecular descriptors and fingerprints. As opposed to the expected outcome, our experimental setup consisting of over 25,000 trained models and statistical tests revealed that the predictive performance using molecular embeddings did not significantly surpass that of traditional representations. Although supervised embeddings yielded competitive results compared with those using traditional molecular representations, unsupervised embeddings tended to perform worse than traditional representations. Our results highlight the need for conducting a careful comparison and analysis of the different embedding techniques prior to using them in drug design tasks and motivate a discussion about the potential of molecular embeddings in computer-aided drug design.
Subject(s)
Algorithms , Quantitative Structure-Activity Relationship
ABSTRACT
Homophymines A-E and A1-E1 are bioactive natural cyclodepsipeptides with a complex molecular architecture. These molecules could have a potential use as antimicrobial, antiviral, and anticancer substances. We have carried out a computational study of the properties of this family of marine peptides using a CDFT-based Computational Peptidology (CDFT-CP) methodology that results from the combination of the chemical reactivity descriptors that arise from conceptual Density Functional Theory (CDFT) together with cheminformatics tools. The latter can be used to estimate the associated physicochemical parameters and to improve the process of virtual screening through a similarity search. Using this approach, the ability of the peptides to behave as potentially useful drugs can be investigated. An analysis of their bioactivity and pharmacokinetics indices related to the ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) features has also been carried out.
Subject(s)
Depsipeptides , Peptides, Cyclic , Chemical Phenomena , Cheminformatics , Density Functional Theory
ABSTRACT
Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
ABSTRACT
Organic chemicals identified in raw landfill leachate (LL) and their transformation products (TPs), formed during Fenton treatment, were analyzed for chemical safety following REACH guidelines. The raw LL was located in the metropolitan region of Campina Grande, in northeast Brazil. We elucidated 197 unique chemical structures, including 154 compounds that were present in raw LL and 82 compounds that were detected in the treated LL, totaling 39 persistent compounds and 43 TPs. In silico models were developed to identify and prioritize the potential level of hazard/risk these compounds pose to the environment and society. The models revealed that the Fenton process improved the biodegradability of TPs. Still, a slight increase in ecotoxicological effects was observed among the compounds in treated LL compared with those present in raw LL. No differences were observed for aryl hydrocarbon receptor (AhR) and antioxidant response element (ARE) mutagenicity. Similar behavior among both raw and treated LL samples was observed for biodegradability; Tetrahymena pyriformis, Daphnia magna, Pimephales promelas and ARE, AhR, and Ames mutagenicity. Overall, our results suggest that raw and treated LL samples have similar activity profiles for all endpoints other than biodegradability.
Subject(s)
Chemical Safety , Water Pollutants, Chemical , Hydrogen Peroxide , Organic Chemicals , Oxidation-Reduction , Water Pollutants, Chemical/analysis , Water Pollutants, Chemical/toxicity
ABSTRACT
Natural products are continually explored in the development of new bioactive compounds with industrial applications, attracting the attention of scientific research efforts due to their pharmacophore-like structures, pharmacokinetic properties, and unique chemical space. The systematic search for natural sources to obtain valuable molecules to develop products with commercial value and industrial purposes remains the most challenging task in bioprospecting. Virtual screening strategies have innovated the discovery of novel bioactive molecules assessing in silico large compound libraries, favoring the analysis of their chemical space, pharmacodynamics, and their pharmacokinetic properties, thus leading to the reduction of financial efforts, infrastructure, and time involved in the process of discovering new chemical entities. Herein, we discuss the computational approaches and methods developed to explore the chemo-structural diversity of natural products, focusing on the main paradigms involved in the discovery and screening of bioactive compounds from natural sources, placing particular emphasis on artificial intelligence, cheminformatics methods, and big data analyses.
ABSTRACT
Peptide research has increased in recent years owing to the applications of peptides as biomarkers, therapeutic alternatives, or antigenic subunits in vaccines. The implementation of computational resources has facilitated the identification of novel sequences, the prediction of properties, and the modelling of structures. However, there is still a lack of open source protocols that enable their straightforward analysis. Here, we present PepFun, a compilation of bioinformatics and cheminformatics functionalities that are easy to implement and customize for studying peptides at different levels: sequence, structure and their interactions with proteins. PepFun enables calculating multiple characteristics for massive sets of peptide sequences, and obtaining different structural observables derived from protein-peptide complexes. In addition, random or guided library design of peptide sequences can be customized for screening campaigns. The package was written in Python, based on built-in functions and methods available in the open source projects BioPython and RDKit. We present two tutorials where we tested peptide binders of the MHC class II and the Granzyme B protease.
Subject(s)
Cheminformatics/methods , Computational Biology/methods , Peptides/metabolism , Genes, MHC Class II/genetics , Granzymes/metabolism , Proteins/metabolism
ABSTRACT
Isoxazoline is a 5-membered heterocycle present in the active compounds of many commercial veterinary anti-ectoparasitic products. Isoxazolines act by inhibiting GABA-gated chloride channels in insects. These facts have inspired the use of the isoxazoline scaffold in the design of novel insecticide compounds. The main strategies used for isoxazoline synthesis are either the 1,3-dipolar cycloaddition between a nitrile oxide and an alkene or the reaction between hydroxylamine and an α,β-unsaturated carbonyl compound. This review highlights the use of isoxazoline as an insecticide: its mode of action, its commercial preparations, and its consideration in the design of novel insecticides. Similarity analyses were performed with 235 isoxazoline derivatives using three different cheminformatic approaches: chemical property correlations, similarity networks, and compound clustering. These cheminformatic methodologies are useful tools for evaluating the similarity between commercial isoxazolines and for clarifying the main features explored within their derivatives.
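A similarity network of the kind mentioned in this abstract can be sketched in a few lines. The abstract does not state which fingerprints, similarity coefficient, or threshold were used, so everything below is an assumption: fingerprints are represented as sets of "on" bit indices, similarity is the Tanimoto coefficient, and two compounds are connected when their similarity reaches a hypothetical threshold of 0.5. The compound names and bit sets are invented.

```python
from itertools import combinations


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def similarity_edges(fingerprints: dict, threshold: float = 0.5) -> list:
    """Return (name_1, name_2, similarity) for every pair at or above the threshold."""
    edges = []
    for name_1, name_2 in combinations(sorted(fingerprints), 2):
        similarity = tanimoto(fingerprints[name_1], fingerprints[name_2])
        if similarity >= threshold:
            edges.append((name_1, name_2, similarity))
    return edges


# Invented fingerprints, for illustration only.
fingerprints = {
    "iso_1": {1, 2, 3, 4},
    "iso_2": {2, 3, 4, 5},
    "iso_3": {7, 8, 9},
}
edges = similarity_edges(fingerprints)
```

Here only `iso_1` and `iso_2` are linked (3 shared bits out of 5 in the union); the resulting edge list could then be passed to a graph library for layout or clustering.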
Subject(s)
Drug Development , Insecta/drug effects , Insecticides/pharmacology , Isoxazoles/pharmacology , Animals , Insecticides/chemical synthesis , Insecticides/chemistry , Isoxazoles/chemical synthesis , Isoxazoles/chemistry , Molecular Structure , Receptors, GABA/metabolism
ABSTRACT
Introduction: The timely identification of biologically active chemicals in disease-relevant screening assays is a major endeavor in drug discovery. The existence of frequent hitters (FHs) in non-related assays poses a formidable challenge in terms of whether to consider these molecules as chemical gold or promiscuous non-selective reactive trash (also known as PAINS - pan assay interference compounds). Areas covered: In this review, the authors bring together expertise in synthetic chemistry, cheminformatics and biochemistry, three key areas for dealing with FHs. They discuss synthetic methods facilitating preparation of chemically diverse molecular libraries, while favoring activity in the biological space. They also survey and discuss recent computational advances in the prediction of PAINS from chemical structures. Finally, they review experimental approaches for the validation of the biological activity of screening hits and discuss alternatives for exploiting promiscuity and chemical reactivity. Expert opinion: It is essential to develop more efficient computational methods to reliably recognize PAINS in distinct molecular environments. Accordingly, advances in synthetic chemistry hold the promise to provide a better quality of chemical matter for drug discovery. Medicinal chemists should be more open to screening for hits showing biologically complex mechanisms of action rather than discarding molecules that may prove valuable as innovative disease treatments.
Subject(s)
Chemistry Techniques, Synthetic/methods , Drug Discovery/methods , Small Molecule Libraries , Animals , Cheminformatics , Humans
ABSTRACT
Virtual screening (VS) has emerged in drug discovery as a powerful computational approach to screen large libraries of small molecules for new hits with desired properties that can then be tested experimentally. As with other computational approaches, the intention of VS is not to replace in vitro or in vivo assays, but to speed up the discovery process, to reduce the number of candidates to be tested experimentally, and to rationalize their choice. Moreover, VS has become very popular in pharmaceutical companies and academic organizations because of the time, cost, resources, and labor it saves. Among the VS approaches, quantitative structure-activity relationship (QSAR) analysis is the most powerful method due to its high and fast throughput and good hit rate. As the first step of QSAR model development, relevant chemogenomics data are collected from databases and the literature. Then, chemical descriptors are calculated on different levels of representation of molecular structure, ranging from 1D to nD, and then correlated with the biological property using machine learning techniques. Once developed and validated, QSAR models are applied to predict the biological property of novel compounds. Although the experimental testing of computational hits is not an inherent part of QSAR methodology, it is highly desired and should be performed as an ultimate validation of developed models. In this mini-review, we summarize and critically analyze the recent trends of QSAR-based VS in drug discovery and demonstrate successful applications in identifying promising compounds with desired properties. Moreover, we provide some recommendations about the best practices for QSAR-based VS along with the future perspectives of this approach.
ABSTRACT
Peptide and peptide-like structures are regaining attention in drug discovery. Previous studies suggest that bioactive peptides have diverse structures and may have physicochemical properties attractive to become hit and lead compounds. However, chemoinformatic studies that characterize such diversity are limited. Herein, we report the physicochemical property profile and chemical space of four synthetic linear and cyclic combinatorial peptide libraries. As a case study, the analysis was focused on penta-peptides. The chemical space of the peptide and N-methylated peptides libraries was compared to compound data sets of pharmaceutical relevance. Results indicated that there is a major overlap in the chemical space of N-methylated cyclic peptides with inhibitors of protein-protein interactions and macrocyclic natural products available for screening. Also, there is an overlap between the chemical space of the synthetic peptides with peptides approved for clinical use (or in clinical trials), and to other approved drugs that are outside the traditional chemical space. Results further support that synthetic penta-peptides are suitable compounds to be used in drug discovery projects.
Subject(s)
Drug Discovery , Peptides, Cyclic/chemistry , Chemical Phenomena , Peptides, Cyclic/pharmacology
ABSTRACT
The development of new chemical entities against the major diseases caused by parasites is highly desired. A library of thirty diamine analogs, designed following a minimalist approach and supported by chemoinformatics tools, was prepared and evaluated against apicomplexan parasites. Different members of the series of N,N'-disubstituted aliphatic diamines showed in vitro activity at submicromolar concentrations and high levels of selectivity against Toxoplasma gondii and against chloroquine-sensitive and chloroquine-resistant strains of Plasmodium falciparum. To demonstrate the importance of the secondary amines, ten N,N,N',N'-tetrasubstituted aliphatic diamine derivatives were synthesized; they were considerably less active than their disubstituted counterparts. Theoretical studies were performed to establish the electronic factors that govern the activity of the compounds.
Subject(s)
Antiparasitic Agents/pharmacology , Apicomplexa/drug effects , Polyamines/pharmacology , Antiparasitic Agents/chemical synthesis , Antiparasitic Agents/chemistry , Dose-Response Relationship, Drug , Molecular Structure , Parasitic Sensitivity Tests , Plasmodium falciparum/drug effects , Polyamines/chemical synthesis , Polyamines/chemistry , Structure-Activity Relationship , Toxoplasma/drug effects
ABSTRACT
In this work, we discuss the characterization and diversity analysis of 354 natural products (NPs) from Panama, systematically analyzed for the first time. The in-house database was compared to NPs from Brazil, compounds from Traditional Chinese Medicine, natural and semisynthetic collections used in high-throughput screening, and compounds from ChEMBL. An analysis of the "global diversity" was conducted using molecular properties of pharmaceutical interest, three molecular fingerprints of different design, molecular scaffolds, and molecular complexity. The global diversity was visualized using consensus diversity plots that revealed that the secondary metabolites in the Panamanian flora have a large scaffold diversity as compared to other composite databases and also have several unique scaffolds. The large scaffold diversity is in agreement with the broad range of biological activities that this collection of NPs from Panama has shown. This study also provided further quantitative evidence of the large structural complexity of NPs. The results obtained in this study support that NPs from Panama are promising candidates to identify selective molecules and are suitable sources of compounds for virtual screening campaigns.
Subject(s)
Biological Products/chemistry , Drug Discovery , Informatics , Biodiversity , Panama , Plants/chemistry , Plants/classification
ABSTRACT
Many drug discovery projects rely on commercial compounds to discover active leads. However, current commercial libraries, with mostly synthetic compounds, access a small fraction of the possible chemical diversity. Natural products, in contrast, possess a vast structural diversity and have proven to be an outstanding source of new drugs. Several chemoinformatic analyses of natural products have demonstrated their diversity and structural complexity. However, to our knowledge, the scaffold content and structural diversity of fungal secondary metabolites have never been studied. Herein, the scaffold diversity of 223 fungal metabolites was measured and compared to the diversity of approved drugs and commercial libraries for HTS containing natural, synthetic, and semi-synthetic compounds. In addition, the global diversity of the fungal isolates was assessed and compared to other reference data sets using Consensus Diversity Plots, a chemoinformatic tool recently developed. It was concluded that fungal secondary metabolites are cyclic systems with few ramifications and more diverse than the commercial libraries with natural products and semi-synthetic compounds. The fungal metabolites data set was one of the most structurally diverse, containing a large proportion of different and unique scaffolds not found in the other compound data sets including ChEMBL. Therefore, fungal metabolites offer a rich source of molecules suited for identifying diverse candidates for drug discovery.
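To make the notion of scaffold diversity in this abstract more concrete, the sketch below computes a few simple summary statistics from a list of scaffold labels, one per compound. The abstract does not specify which metrics were used, so the choice here (fraction of distinct scaffolds per compound, fraction of singleton scaffolds, and Shannon entropy of the scaffold distribution scaled to the range 0 to 1) is an assumption, and the scaffold labels are placeholder strings rather than real data.

```python
import math
from collections import Counter


def scaffold_diversity(scaffolds: list) -> dict:
    """Summarize scaffold diversity for a data set (one scaffold label per compound)."""
    if not scaffolds:
        raise ValueError("At least one scaffold label is required.")
    counts = Counter(scaffolds)
    n_compounds = len(scaffolds)
    n_scaffolds = len(counts)
    n_singletons = sum(1 for count in counts.values() if count == 1)
    # Shannon entropy (in bits) of the distribution of compounds over scaffolds.
    entropy = -sum(
        (count / n_compounds) * math.log2(count / n_compounds)
        for count in counts.values()
    )
    # Scale by the maximum entropy so data sets of different size are comparable.
    scaled_entropy = entropy / math.log2(n_scaffolds) if n_scaffolds > 1 else 0.0
    return {
        "scaffolds_per_compound": n_scaffolds / n_compounds,
        "singleton_fraction": n_singletons / n_scaffolds,
        "shannon_entropy": entropy,
        "scaled_shannon_entropy": scaled_entropy,
    }


# Placeholder scaffold labels, for illustration only.
summary = scaffold_diversity(["benzene", "benzene", "indole", "pyridine"])
```

Values closer to 1 for each ratio indicate that compounds are spread more evenly over many scaffolds; in a real analysis the labels would come from a scaffold extraction step applied to each structure.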
ABSTRACT
Prostate cancer is one of the most prevalent types of cancer in the male population. It is a hormone-driven disease, especially in its initial phase. Hence, androgen deprivation therapy (ADT) is the major chemotherapeutic effort, and novel androgen receptor (AR) inhibitors with improved pharmacological profiles are needed. In this report, a novel bioactive compound was selected and investigated using in silico and cell-based assays. The compound Neq0502 was selective for the testosterone-stimulated AR-dependent prostate cancer cell line (LNCaP, GI50 = 22.4 µM) when compared with unstimulated LNCaP or AR-insensitive (DU145 and PC-3) cell lines. A cell cycle arrest study gave the same profile for Neq0502 and the reference drug enzalutamide. Moreover, this compound was not cytotoxic to fibroblast Balb/C 3T3 clone A31 cells up to 250 µM, giving a good selectivity index (SI > 11), so it could be used in compound optimization efforts toward a novel therapeutic alternative.