Results 1 - 20 of 1,777
1.
Database (Oxford) ; 2024, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38776380

ABSTRACT

Natural products play a pivotal role in drug discovery, and the richness of natural products, albeit significantly influenced by various environmental factors, is predominantly determined by the intrinsic genetics of the series of enzymatic reactions that produce them as secondary metabolites of organisms. Heretofore, few natural product-related databases have taken chemical content into consideration as a prominent property. To gain unique insights into the quantitative diversity of natural products, we have developed the first TerPenoids database embedded with Content information (TPCN), with features such as compound browsing, structural search, scaffold analysis, similarity analysis and data download. This database can be accessed through a web-based computational toolkit available at http://www.tpcn.pro/. Through meticulous manual searches and the analysis of over 10 000 reference papers, the TPCN database has integrated 6383 terpenoids obtained from 1254 distinct plant species. The database includes isolation parts, complete molecular structures, Chemical Abstracts Service registry numbers (CAS numbers) and 7508 content descriptions. The TPCN database emphasizes both the qualitative and quantitative dimensions as invaluable phenotypic characteristics of natural products that have undergone genetic evolution. By acting as an indispensable criterion, the TPCN database facilitates the discovery of drug alternatives with high content and the selection of high-yield medicinal plant species or phylogenetic alternatives, thereby fostering sustainable, cost-effective and environmentally friendly drug discovery in pharmaceutical farming. Database URL: http://www.tpcn.pro/.


Subject(s)
Terpenes , Terpenes/metabolism , Terpenes/chemistry , Databases, Chemical , Databases, Factual
2.
Toxicol In Vitro ; 98: 105851, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38789065

ABSTRACT

After the EU ban on animal testing for cosmetics in 2013, there has been increasing global interest in alternative test methods. To develop alternative test methods, in vitro and in vivo toxicity data on chemicals are needed. However, databases sometimes provide limited in vivo and in vitro data on chemicals. Further, the data generated using OECD TG439 (in vitro skin irritation) are scattered across different databases, and it is not easy to navigate through them. Therefore, we compiled the 'Reference Chemical Database System for Skin Irritation Alternative Test (RCDS-Skin Irritation)' to allow easy, one-stop access to test chemical information. We established the systematic RCDS-Skin Irritation by collecting physicochemical properties, CAS numbers, human data, and in vivo (OECD TG404) data from overseas chemical databases, including that of the European Chemicals Agency (ECHA), together with in vitro data obtained using Reconstructed human Epidermis (RhE) (OECD TG439). As a result, we developed the RCDS-Skin Irritation, which contains information on 149 chemicals, including the data we generated by performing tests using EpiDerm™ SIT, SkinEthic™ RHE and KeraSkin™ SIT. The RCDS-Skin Irritation established in our study will therefore provide insight for the safety assessment of chemicals and for the development of alternative test methods.


Subject(s)
Animal Testing Alternatives , Irritants , Skin Irritancy Tests , Humans , Irritants/toxicity , Skin Irritancy Tests/methods , Databases, Factual , Epidermis/drug effects , Databases, Chemical , Skin/drug effects
3.
J Chem Inf Model ; 64(9): 3790-3798, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38648077

ABSTRACT

Machine learning has the potential to provide tremendous value to life sciences by providing models that aid in the discovery of new molecules and reduce the time for new products to come to market. Chemical reactions play a significant role in these fields, but there is a lack of high-quality open-source chemical reaction data sets for training machine learning models. Herein, we present ORDerly, an open-source Python package for the customizable and reproducible preparation of reaction data stored in accordance with the increasingly popular Open Reaction Database (ORD) schema. We use ORDerly to clean United States patent data stored in ORD and generate data sets for forward prediction, retrosynthesis, as well as the first benchmark for reaction condition prediction. We train neural networks on data sets generated with ORDerly for condition prediction and show that data sets missing key cleaning steps can lead to silently overinflated performance metrics. Additionally, we train transformers for forward and retrosynthesis prediction and demonstrate how non-patent data can be used to evaluate model generalization. By providing a customizable open-source solution for cleaning and preparing large chemical reaction data, ORDerly is poised to push forward the boundaries of machine learning applications in chemistry.


Subject(s)
Benchmarking , Machine Learning , Neural Networks, Computer , Databases, Chemical
4.
Comput Biol Chem ; 110: 108057, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38581840

ABSTRACT

Molecular similarity and fingerprint-based virtual screening are crucial in drug design, target prediction, and ADMET prediction, aiding in identifying potential hits and optimizing lead compounds. However, challenges remain, such as the lack of comprehensive open-source molecular fingerprint databases and of efficient search methods for virtual screening. To address these issues, we introduce FaissMolLib, an open-source virtual screening tool that integrates 2.8 million compounds from the ChEMBL and ZINC databases. Notably, FaissMolLib employs the highly efficient Faiss search algorithm, which outperforms the Tanimoto algorithm in identifying similar molecules, as shown by tighter clustering in scatter plots and lower mean, standard deviation, and variance in key molecular properties. This feature enables FaissMolLib to screen 2.8 million compounds in just 0.05 seconds, offering researchers an efficient, easily deployable solution for virtual screening on laptops and for building unique compound databases. This significant advancement holds great potential for accelerating drug discovery efforts and enhancing chemical data analysis. FaissMolLib is freely available at http://liuhaihan.gnway.cc:80. The code and dataset of FaissMolLib are freely available at https://github.com/Superhaihan/FiassMolLib.


Subject(s)
Algorithms , Ligands , Drug Evaluation, Preclinical/methods , Software , Databases, Chemical , Drug Discovery , Molecular Structure
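
The Tanimoto comparison that the FaissMolLib abstract (entry 4) uses as its baseline can be illustrated with a minimal Python sketch. It assumes each fingerprint is represented as the set of its on-bit indices; the molecule names, the toy fingerprints and the helper names `tanimoto` and `rank_by_similarity` are hypothetical and are not taken from FaissMolLib.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: frozenset, library: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints: each set holds the indices of the bits that are on.
library = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 9}),
    "mol_c": frozenset({7, 8}),
}
query = frozenset({1, 2, 3})
ranking = rank_by_similarity(query, library)
print(ranking)
```

This brute-force ranking compares the query with every library entry, which is the cost that an approximate nearest-neighbour index is meant to avoid on libraries of millions of compounds.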
5.
Talanta ; 275: 126116, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38640518

ABSTRACT

Fragmentation characteristics are crucial for nontargeted screening to discover and identify unknown exogenous chemical residues in animal-derived foods. In this study, first, fragmentation characteristics of 51 classes of exogenous chemical residues were summarized based on experimental mass spectra of standards in reversed-phase and hydrophilic interaction liquid chromatography-high-resolution mass spectrometry (MS) and mass spectra from the MassBank of North America (MoNA) library. According to the proportion of fragmentation characteristics to the total number of chemical residues in each class, four screening levels were defined to classify 51 classes of chemical residues. Then, a nontargeted screening method was developed based on the fragmentation characteristics. The evaluation results of 82 standards indicated that more than 90 % of the chemical residues with MS/MS spectra can be identified at concentrations of 100 and 500 µg/kg, and about 80 % can be identified at 10 µg/kg. Finally, the nontargeted screening method was applied to 16 meat samples and 21 egg samples as examples. As a result, eight chemical residues and transformation products (TPs) of 5 classes in the exemplary samples were found and identified, in which 3 TPs of azithromycin were identified by fragmentation characteristics-assisted structure interpretation. The results demonstrated the practicability of the nontargeted screening method for routine risk screening of food safety.


Subject(s)
Food Analysis , Hazardous Substances , Liquid Chromatography-Mass Spectrometry , Food Analysis/instrumentation , Food Analysis/methods , Food Analysis/standards , Eggs/analysis , Meat/analysis , Food Safety , Databases, Chemical , Hazardous Substances/analysis , Agrochemicals/analysis , Molecular Structure , Animals
6.
Cell ; 187(7): 1801-1818.e20, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38471500

ABSTRACT

The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns. Thousands of modifications are distributed throughout animal and human bodies as well as microbial cultures. We employed this MS/MS library to identify polyamine bile amidates, prevalent in carnivores. They are present in humans, and their levels alter with a diet change from a Mediterranean to a typical American diet. This work highlights the existence of many more bile acid modifications than previously recognized and the value of leveraging public large-scale untargeted metabolomics data to discover metabolites. The availability of a modification-centric bile acid MS/MS library will inform future studies investigating bile acid roles in health and disease.


Subject(s)
Bile Acids and Salts , Gastrointestinal Microbiome , Metabolomics , Tandem Mass Spectrometry , Animals , Humans , Bile Acids and Salts/chemistry , Metabolomics/methods , Polyamines , Tandem Mass Spectrometry/methods , Databases, Chemical
7.
J Chem Inf Model ; 64(6): 1975-1983, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38483315

ABSTRACT

Most online chemical reaction databases are not publicly accessible or fully downloadable. These databases tend to contain reactions in noncanonicalized formats and often lack comprehensive information regarding reaction pathways, intermediates, and byproducts. Within the few publicly available databases, reactions are typically stored in the form of unbalanced, overall transformations with minimal interpretability of the underlying chemistry. These limitations present significant obstacles to data-driven applications, including the development of machine learning models. In an effort to overcome these challenges, we introduce PMechDB, a publicly accessible platform designed to curate, aggregate, and share polar chemical reaction data in the form of elementary reaction steps. Our initial version of PMechDB consists of over 100,000 such steps. In PMechDB, all reactions are stored as canonicalized and balanced elementary steps, featuring accurate atom mapping and arrow-pushing mechanisms. As an online interactive database, PMechDB provides multiple interfaces that enable users to search, download, and upload chemical reactions. We anticipate that the public availability of PMechDB and its standardized data representation will prove beneficial for chemoinformatics research and education and for the development of data-driven, interpretable models for predicting reactions and pathways. The PMechDB platform is accessible online at https://deeprxn.ics.uci.edu/pmechdb.


Subject(s)
Databases, Chemical , Databases, Factual
8.
J Chem Inf Model ; 64(8): 2948-2954, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38488634

ABSTRACT

SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.


Subject(s)
Algorithms , Software , Programming Languages , Cheminformatics/methods , Databases, Chemical
9.
Methods ; 225: 44-51, 2024 May.
Article in English | MEDLINE | ID: mdl-38518843

ABSTRACT

The process of virtual screening relies heavily on databases, but it is disadvantageous to conduct virtual screening based on commercial databases, which contain patent-protected compounds and compounds with high toxicity and side effects. Therefore, this paper utilizes generative recurrent neural networks (RNN) containing long short-term memory (LSTM) cells to learn the properties of drug compounds in DrugBank, aiming to obtain a new virtual screening compound database with drug-like properties. Ultimately, a compound database consisting of 26,316 compounds is obtained by this method. To evaluate the potential of this compound database, a series of tests are performed, including chemical space, ADME properties, compound fragmentation, and synthesizability analysis. The results show that the database has good drug-like properties and relatively new backbones, and its potential in virtual screening is further tested. Finally, a series of seedling compounds with completely new backbones are obtained through docking and binding free energy calculations.


Subject(s)
Deep Learning , Molecular Docking Simulation , Molecular Docking Simulation/methods , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Drug Evaluation, Preclinical/methods , Humans , Databases, Pharmaceutical , Neural Networks, Computer , Databases, Chemical
11.
J Chem Inf Model ; 64(4): 1158-1171, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38316125

ABSTRACT

Over the last five years, virtual screening of ultralarge synthesis on-demand libraries has emerged as a powerful tool for hit identification in drug discovery programs. As these libraries have grown to tens of billions of molecules, we have reached a point where it is no longer cost-effective to screen every molecule virtually. To address these challenges, several groups have developed heuristic search methods to rapidly identify the best molecules on a virtual screen. This article describes the application of Thompson sampling (TS), an active learning approach that streamlines the virtual screening of large combinatorial libraries by performing a probabilistic search in the reagent space, thereby never requiring the full enumeration of the library. TS is a general technique that can be applied to various virtual screening modalities, including 2D and 3D similarity search, docking, and application of machine-learning models. In an illustrative example, we show that TS can identify more than half of the top 100 molecules from a docking-based virtual screen of 335 million molecules by evaluating 1% of the data set.


Subject(s)
Databases, Chemical , Drug Discovery , Drug Discovery/methods
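
The reagent-space search described in the Thompson sampling abstract (entry 11) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each reagent carries a Gaussian belief about its mean score (prior N(0, 1), unit observation noise), higher scores are better, and the names `ReagentBelief`, `thompson_search` and `toy_score` are hypothetical. The toy score is an arbitrary additive function that stands in for docking or similarity scoring.

```python
import random


class ReagentBelief:
    """Gaussian posterior over one reagent's mean score (prior N(0, 1), unit noise)."""

    def __init__(self):
        self.n = 0        # number of scores observed for this reagent
        self.total = 0.0  # sum of those scores

    def sample(self, rng):
        # Posterior is N(total / (n + 1), 1 / (n + 1)) under the stated assumptions.
        mean = self.total / (self.n + 1)
        std = (1.0 / (self.n + 1)) ** 0.5
        return rng.gauss(mean, std)

    def update(self, score):
        self.n += 1
        self.total += score


def thompson_search(reagent_lists, score_fn, n_iter, seed=0):
    """Sample one reagent per component each round; score only the chosen product."""
    rng = random.Random(seed)
    beliefs = [[ReagentBelief() for _ in reagents] for reagents in reagent_lists]
    evaluated = {}
    for _ in range(n_iter):
        # For each reaction component, draw from every belief and keep the best draw.
        choice = tuple(
            max(range(len(component)), key=lambda i: component[i].sample(rng))
            for component in beliefs
        )
        if choice not in evaluated:
            evaluated[choice] = score_fn(choice)  # the only expensive call
        score = evaluated[choice]
        for component, idx in zip(beliefs, choice):
            component[idx].update(score)
    return evaluated


# Hypothetical two-component library of 20 x 20 = 400 products.
weights_a = [0.1 * i for i in range(20)]
weights_b = [0.1 * j for j in range(20)]


def toy_score(choice):
    i, j = choice
    return weights_a[i] + weights_b[j]


results = thompson_search([weights_a, weights_b], toy_score, n_iter=100)
print(len(results), "of 400 products scored")
```

Only the products that are actually sampled are scored, so the full library is never enumerated; the number of scoring calls is bounded by `n_iter` rather than by the library size.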
12.
SLAS Discov ; 29(2): 100144, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38316342

ABSTRACT

The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, one hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus based on 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those based on individual approaches or on consensus models developed using each individual approach. A combination of diverse models allowed us to decrease both the bias and the variance of individual models and to calculate the highest score. The model based on Transformer CNN contributed the best individual score, thus highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would help contestants better understand and use the challenge data.


Subject(s)
Algorithms , Neural Networks, Computer , Solubility , Consensus , Databases, Chemical
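
The consensus idea in the EUOS/SLAS solubility abstract (entry 12) can be illustrated with a small Python sketch. It assumes the simplest form of consensus, an unweighted average of per-class probabilities across models; the abstract does not state how the 28 models were actually combined, and the three model outputs below are made-up numbers for illustration only.

```python
def consensus(predictions):
    """Average per-class probabilities over models; return (best class index, averages)."""
    n_models = len(predictions)
    n_classes = len(predictions[0])
    avg = [sum(p[k] for p in predictions) / n_models for k in range(n_classes)]
    best = max(range(n_classes), key=lambda k: avg[k])
    return best, avg


CLASSES = ["low", "medium", "high"]

# Hypothetical class probabilities from three models for a single compound.
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
label_idx, avg = consensus(model_outputs)
print(CLASSES[label_idx], avg)
```

In this toy case the second model disagrees with the other two, and averaging lets the majority view prevail while still reflecting the dissenting model in the final probabilities.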
13.
BMC Complement Med Ther ; 24(1): 40, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38229051

ABSTRACT

BACKGROUND: As chromatographic techniques have advanced, many articles that analyze the constituting compounds of medicinal materials have been published in relation to Northeast Asian traditional medicine, including traditional Chinese medicine (TCM). TM-MC was launched in 2015, providing information about the chemical compounds in medicinal materials from chromatographic articles in PubMed. Since 2015, through continuous curation efforts, we have now released TM-MC 2.0 with significant improvements to the quantity and quality of the data ( https://tm-mc.kr ). DESCRIPTION: TM-MC 2.0 contains 635 medicinal materials, 34,107 chemical compounds (21,306 identified and de-duplicated), 13,992 targets, 27,997 diseases, and 5,075 prescriptions (2,393 de-duplicated by name). The database provides the largest number of identified compounds for medicinal materials listed in the pharmacopoeia compared to all TCM databases. In particular, marker compounds of medicinal materials and many newly discovered compounds were added through the manual curation of recent chromatographic articles. CONCLUSION: TM-MC 2.0 provides the largest collection of information about the chemical compounds of the medicinal materials listed in the Korean, Chinese, and Japanese pharmacopoeias. Our database can be utilized for network pharmacology in traditional medicine and for the compound screening of medicinal materials for modern drug discovery.


Subject(s)
Databases, Chemical , Medicine, Chinese Traditional , Medicine, Traditional , Databases, Factual
14.
J Biol Chem ; 300(2): 105624, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38176651

ABSTRACT

The glycosylation of proteins and lipids is known to be closely related to the mechanisms of various diseases such as influenza, cancer, and muscular dystrophy. Therefore, it has become clear that the analysis of post-translational modifications of proteins, including glycosylation, is important to accurately understand the functions of each protein molecule and the interactions among them. In order to conduct large-scale analyses more efficiently, it is essential to promote the accumulation, sharing, and reuse of experimental and analytical data in accordance with the FAIR (Findability, Accessibility, Interoperability, and Re-usability) data principles. However, a FAIR data repository for storing and sharing glycoconjugate information, including glycopeptides and glycoproteins, in a standardized format did not exist. Therefore, we have developed GlyComb (https://glycomb.glycosmos.org) as a new standardized data repository for glycoconjugate data. Currently, GlyComb can assign a unique identifier to a set of glycosylation information associated with a specific peptide sequence or UniProt ID. By standardizing glycoconjugate data via GlyComb identifiers and coordinating with existing web resources such as GlyTouCan and GlycoPOST, a comprehensive system for data submission and data sharing among researchers can be established. Here we introduce how GlyComb is able to integrate the variety of glycoconjugate data already registered in existing data repositories to obtain a better understanding of the available glycopeptides and glycoproteins, and their glycosylation patterns. We also explain how this system can serve as a foundation for a better understanding of glycan function.


Subject(s)
Databases, Chemical , Glycomics , Proteomics , Glycopeptides/metabolism , Glycoproteins/metabolism , Glycosylation , Polysaccharides/metabolism , Databases, Genetic
15.
Nucleic Acids Res ; 52(D1): D1355-D1364, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37930837

ABSTRACT

The metabolic roadmap of drugs (MRD) is a comprehensive atlas for understanding the stepwise and sequential metabolism of a given drug in living organisms. It plays a vital role in lead optimization, personalized medication, and ADMET research. The MRD consists of three main components: (i) the sequential catalyses of a drug and its metabolites by different drug-metabolizing enzymes (DMEs), (ii) a comprehensive collection of metabolic reactions along the entire MRD and (iii) a systematic description of the efficacy and toxicity of all metabolites of a studied drug. However, there is no database available for describing the comprehensive metabolic roadmaps of drugs. Therefore, in this study, a major update of INTEDE was conducted, which provides the stepwise and sequential metabolic roadmaps for a total of 4701 drugs, and a total of 22 165 metabolic reactions involving 1088 DMEs and 18 882 drug metabolites. Additionally, INTEDE 2.0 labels the pharmacological properties (pharmacological activity or toxicity) of metabolites and provides their structural information. Furthermore, 3717 drug metabolism relationships were added (from 7338 to 11 055). All in all, INTEDE 2.0 is expected to attract broad interest from the related research community and to serve as an essential supplement to existing pharmaceutical/biological/chemical databases. INTEDE 2.0 can now be accessed freely without any login requirement at: http://idrblab.org/intede/.


Subject(s)
Databases, Chemical , Databases, Factual , Inactivation, Metabolic , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism
20.
J Chromatogr A ; 1710: 464417, 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37778098

ABSTRACT

Liquid chromatography coupled with high-resolution mass spectrometry (LCHRMS) has proven challenging for annotating multiple small molecules within complex matrices due to the complexity of the chemical structures and of the raw LCHRMS data, as well as limitations in the previous literature and reference spectra related to those molecules. In this study, we developed a molecular networking assisted automatic database screening (MN/auto-DBS) strategy to examine the combined effect of MS1 exact mass screening and MS2 similarity analysis. We compiled all previously reported compounds from the relevant literature. Using Python software that we developed, the in-house database (DB) was created by automatically calculating the m/z values, and the experimental MS1 data were rapidly screened against the DB. We then performed a feature-based molecular network analysis on the auto-MS2 data for supplementary identification of unreported compounds, including clustered FBMN and annotated GNPS compounds. Finally, the results from both strategies were merged and manually curated for correct structural assignment. To demonstrate the applicability of MN/auto-DBS, we selected the Huangqi-Danshen herb pair (HD), commonly used in prescriptions or patent medicines to treat diabetic nephropathy and cerebrovascular disease. A total of 223 compounds were annotated, including 65 molecules not previously reported in HD, such as aromatic polyketides, coumarins, and diarylheptanoids. Using MN/auto-DBS, we can profile and mine a wide range of complex matrices for potentially new compounds.


Subject(s)
Software , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Chromatography, Liquid , Databases, Chemical , Databases, Factual , Chromatography, High Pressure Liquid/methods
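
The MS1 exact-mass screening step mentioned in the MN/auto-DBS abstract (entry 20) can be sketched in Python as below. The sketch rests on assumptions the abstract does not state: only the singly protonated [M+H]+ adduct is considered, matches are accepted within a 5 ppm window, and the database entries `compound_x` and `compound_y` with their masses are hypothetical placeholders rather than real compounds.

```python
PROTON_MASS = 1.007276  # Da, mass added for a singly protonated ion


def mz_protonated(monoisotopic_mass: float) -> float:
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass + PROTON_MASS


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def screen(observed_mz: float, database: dict, tol_ppm: float = 5.0) -> list:
    """Return names of database entries whose [M+H]+ m/z lies within tol_ppm."""
    hits = []
    for name, mass in database.items():
        theoretical = mz_protonated(mass)
        if abs(ppm_error(observed_mz, theoretical)) <= tol_ppm:
            hits.append(name)
    return hits


# Hypothetical in-house database: name -> monoisotopic mass in Da.
database = {"compound_x": 300.0, "compound_y": 300.01}
hits = screen(301.0075, database)
print(hits)
```

With a narrow ppm window, two entries that differ by only 0.01 Da are still separated, which is why exact-mass matching is useful as a first filter before MS2 similarity analysis.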