Results 1 - 20 of 127
1.
Metabolites ; 14(2), 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38393009

ABSTRACT

Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the Metabolome Annotation Workflow (MAW) as a case study. MAW is specified using the Common Workflow Language (CWL), allowing the workflow to be executed on different workflow engines. MAW is registered on WorkflowHub using its CWL description. During the submission process on WorkflowHub, the CWL description is used to package MAW according to the Workflow RO-Crate profile, which includes metadata in Bioschemas. Researchers can use this narrative discussion as a guideline to start applying FAIR practices to their bioinformatics or cheminformatics workflows, with the amendments their research area requires.
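The packaging step described above can be pictured with a minimal sketch of an RO-Crate metadata document for a single CWL workflow. The file name `maw.cwl`, the `#cwl` identifier, and the exact set of properties are illustrative assumptions, not MAW's actual crate; the RO-Crate 1.1 context and the `ComputationalWorkflow` type follow RO-Crate conventions as we understand them, and the authoritative requirements are those of the Workflow RO-Crate profile itself.

```python
import json

# Hypothetical file name for the CWL workflow description (assumption).
WORKFLOW_FILE = "maw.cwl"


def build_workflow_crate(workflow_file: str, name: str) -> dict:
    """Return a minimal RO-Crate JSON-LD document describing one CWL workflow."""
    cwl_language = {
        "@id": "#cwl",  # local identifier for the language entity (assumption)
        "@type": "ComputerLanguage",
        "name": "Common Workflow Language",
        "url": {"@id": "https://www.commonwl.org/"},
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # metadata descriptor: states the RO-Crate version and points at the root
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root dataset: the workflow is its main entity
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {   # the workflow file itself, typed as a computational workflow
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": cwl_language["@id"]},
            },
            cwl_language,
        ],
    }


crate = build_workflow_crate(WORKFLOW_FILE, "Metabolome Annotation Workflow")
print(json.dumps(crate, indent=2))
```

In practice a registry such as WorkflowHub generates this file during submission; the sketch only shows how the descriptor, root dataset, and workflow entity reference each other by `@id`.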

2.
Crit Rev Microbiol ; : 1-40, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38270170

ABSTRACT

Microbial communities thrive through interactions and communication, which are challenging to study as most microorganisms are not cultivable. To address this challenge, researchers focus on the extracellular space where communication events occur. Exometabolomics and interactome analysis provide insights into the molecules involved in communication and the dynamics of their interactions. Advances in sequencing technologies and computational methods enable the reconstruction of taxonomic and functional profiles of microbial communities using high-throughput multi-omics data. Network-based approaches, including community flux balance analysis, aim to model molecular interactions within and between communities. Despite these advances, challenges remain in computer-assisted biosynthetic capacities elucidation, requiring continued innovation and collaboration among diverse scientists. This review provides insights into the current state and future directions of computer-assisted biosynthetic capacities elucidation in studying microbial communities.


Computer-assisted biosynthetic capacities elucidation accelerates our ability to interpret microbial interactions, allowing us to understand better and establish a balance within ecosystems.

3.
Magn Reson Chem ; 62(2): 74-83, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38112483

ABSTRACT

In October 2003, 20 years ago, the open-source and open-content database NMRshiftDB was announced. Since then, the database, later renamed nmrshiftdb2, has been continuously available and is one of the longest-running projects in the field of open data in chemistry. After 20 years, we evaluate the success of the project and present lessons learnt for similar projects.

4.
Front Microbiol ; 14: 1295994, 2023.
Article in English | MEDLINE | ID: mdl-38116530

ABSTRACT

Diatoms (Bacillariophyceae) are aquatic photosynthetic microalgae with an ecological role as primary producers in the aquatic food web. They account for a substantial share of global carbon, nitrogen, and silicon cycling. Elucidating the chemical space of diatoms is crucial to understanding their physiology and ecology. To expand the known chemical space of a cosmopolitan marine diatom, Skeletonema marinoi, we performed High-Resolution Liquid Chromatography-Tandem Mass Spectrometry (LC-MS2) for untargeted metabolomics data acquisition. The spectral data from LC-MS2 were used as input for the Metabolome Annotation Workflow (MAW) to obtain putative annotations for all measured features. A suspect list of metabolites previously identified in Skeletonema spp. was generated to verify the results. These known metabolites were then added to the putative candidate list from the LC-MS2 data to form an expanded catalog of 1970 metabolites estimated to be produced by S. marinoi. The most prevalent chemical superclasses in this expanded dataset, based on the ChemOnt ontology, were organic acids and derivatives, organoheterocyclic compounds, lipids and lipid-like molecules, and organic oxygen compounds. The metabolic profile from this study can aid the bioprospecting of marine microalgae for medicine, biofuel production, agriculture, and environmental conservation. The proposed analysis could also be applied to assess the chemical space of other microalgae, which can provide molecular insights into the interactions between marine organisms and their role in the functioning of ecosystems.

5.
J Cheminform ; 15(1): 98, 2023 Oct 16.
Article in English | MEDLINE | ID: mdl-37845745

ABSTRACT

In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the Cheminformatics Microservice. This open-source solution provides a unified interface for accessing commonly used functionalities of multiple cheminformatics toolkits, namely RDKit, Chemistry Development Kit (CDK), and Open Babel. In addition, more advanced functionalities like structure generation and Optical Chemical Structure Recognition (OCSR) are made available through the Cheminformatics Microservice based on pre-existing tools. The software service also enables developers to extend the functionalities easily and to seamlessly integrate them with existing workflows and applications. It is built on FastAPI and containerized using Docker, making it highly scalable. An instance of the microservice is publicly available at https://api.naturalproducts.net . The source code is publicly accessible on GitHub, accompanied by comprehensive documentation, version control, and continuous integration and deployment workflows. All resources can be found at the following link: https://github.com/Steinbeck-Lab/cheminformatics-microservice .

7.
Nat Commun ; 14(1): 5045, 2023 Aug 19.
Article in English | MEDLINE | ID: mdl-37598180

ABSTRACT

The number of publications describing chemical structures has increased steadily over the last decades. However, the majority of published chemical information is currently not available in machine-readable form in public databases. It remains a challenge to automate the process of information extraction in a way that requires less manual intervention - especially the mining of chemical structure depictions. As an open-source platform that leverages recent advancements in deep learning, computer vision, and natural language processing, DECIMER.ai (Deep lEarning for Chemical IMagE Recognition) strives to automatically segment, classify, and translate chemical structure depictions from the printed literature. The segmentation and classification tools are the only openly available packages of their kind, and the optical chemical structure recognition (OCSR) core application yields outstanding performance on all benchmark datasets. The source code, the trained models and the datasets developed in this work have been published under permissive licences. An instance of the DECIMER web application is available at https://decimer.ai .

8.
J Cheminform ; 15(1): 32, 2023 Mar 04.
Article in English | MEDLINE | ID: mdl-36871033

ABSTRACT

Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC-MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence. Many novel computational methods and tools, such as in silico generated spectra and molecular networking, have been developed to enable chemical structure annotation of known and unknown compounds. Here, we present an automated and reproducible Metabolome Annotation Workflow (MAW) for untargeted metabolomics data to further facilitate and automate the complex annotation by combining tandem mass spectrometry (MS2) input data pre-processing, spectral and compound database matching with computational classification, and in silico annotation. MAW takes the LC-MS2 spectra as input and generates a list of putative candidates from spectral and compound databases. The databases are integrated via the R package Spectra and the metabolite annotation tool SIRIUS as part of the R segment of the workflow (MAW-R). The final candidate selection is performed using the cheminformatics tool RDKit in the Python segment (MAW-Py). Furthermore, each feature is assigned a chemical structure and can be imported into a chemical structure similarity network. MAW follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles and has been made available as the Docker images maw-r and maw-py. The source code and documentation are available on GitHub ( https://github.com/zmahnoor14/MAW ). The performance of MAW is evaluated on two case studies. MAW can improve candidate ranking by integrating spectral databases with annotation tools like SIRIUS, which contributes to an efficient candidate selection procedure. The results from MAW are also reproducible and traceable, compliant with the FAIR guidelines.
Taken together, MAW could greatly facilitate automated metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery.
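Spectral database matching of the kind MAW performs is commonly scored with a cosine (normalized dot product) similarity between MS2 peak lists. The sketch below is a generic, simplified version of that idea, not the scoring implemented in MAW or in the Spectra package: peaks are paired greedily, strongest intensity products first, when their m/z values differ by at most an assumed tolerance of 0.01, and each peak is used at most once.

```python
import math

Peak = tuple[float, float]  # (m/z, intensity)


def cosine_similarity(query: list[Peak], reference: list[Peak], tol: float = 0.01) -> float:
    """Cosine score between two centroided MS2 spectra with greedy peak matching."""
    # All peak pairs within the m/z tolerance, ordered by intensity product (largest first).
    pairs = sorted(
        ((qi * ri, qa, ra)
         for qa, (qm, qi) in enumerate(query)
         for ra, (rm, ri) in enumerate(reference)
         if abs(qm - rm) <= tol),
        reverse=True,
    )
    used_q, used_r, dot = set(), set(), 0.0
    for product, qa, ra in pairs:
        if qa not in used_q and ra not in used_r:  # each peak matched at most once
            used_q.add(qa)
            used_r.add(ra)
            dot += product
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


spectrum = [(91.054, 100.0), (119.049, 40.0), (147.044, 10.0)]
print(cosine_similarity(spectrum, spectrum))
```

A score near 1 means the two peak patterns agree closely; candidates from a spectral library would then be ranked by this score before further filtering.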

9.
Molecules ; 28(3), 2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36771127

ABSTRACT

The structure elucidation of small organic molecules (<1500 Dalton) through 1D and 2D nuclear magnetic resonance (NMR) data analysis is a potentially challenging, combinatorial problem. This publication presents Sherlock, a free and open-source Computer-Assisted Structure Elucidation (CASE) software where the user controls the chain of elementary operations through a versatile graphical user interface, including spectral peak picking, addition of automatically or user-defined structure constraints, structure generation, ranking and display of the solutions. A set of forty-five compounds was selected in order to illustrate the new possibilities offered to organic chemists by Sherlock for improving the reliability and traceability of structure elucidation results.

10.
Curr Opin Struct Biol ; 79: 102542, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36805192

ABSTRACT

Recent years have seen a sharp increase in the development of deep learning and artificial intelligence-based molecular informatics. There has been a growing interest in applying deep learning to several subfields, including the digital transformation of synthetic chemistry, extraction of chemical information from the scientific literature, and AI in natural product-based drug discovery. The application of AI to molecular informatics is still constrained by the fact that most of the data used for training and testing deep learning models are not available as FAIR and open data. As open science practices continue to grow in popularity, initiatives which support FAIR and open data as well as open-source software have emerged. It is becoming increasingly important for researchers in the field of molecular informatics to embrace open science and to submit data and software to open repositories. With the advent of open-source deep learning frameworks and cloud computing platforms, academic researchers are now able to deploy and test their own deep learning models with ease. With the development of new and faster hardware for deep learning and the increasing number of initiatives towards digital research data management infrastructures, as well as a culture promoting open data, open source, and open science, AI-driven molecular informatics will continue to grow. This review examines the current state of open data and open algorithms in molecular informatics, as well as ways in which they could be improved in the future.


Subjects
Artificial Intelligence, Machine Learning, Algorithms, Software, Informatics

11.
J Cheminform ; 15(1): 23, 2023 Feb 19.
Article in English | MEDLINE | ID: mdl-36803857

ABSTRACT

The influence of molecular fragmentation and parameter settings on a mesoscopic dissipative particle dynamics (DPD) simulation of lamellar bilayer formation for a C10E4/water mixture is studied. A "bottom-up" decomposition of C10E4 into the smallest fragment molecules (particles) that satisfy chemical intuition leads to convincing simulation results which agree with experimental findings for bilayer formation and thickness. For integration of the equations of motion, Shardlow's S1 scheme proves to be a favorable choice with the best overall performance. Increasing the integration time step above the common setting of 0.04 DPD units leads to increasingly unphysical temperature drifts, but also to increasingly rapid formation of bilayer superstructures without significantly distorted particle distributions up to an integration time step of 0.12. Scaling the mutual particle-particle repulsions that guide the dynamics has negligible influence within a considerable range of values but exhibits apparent lower thresholds beyond which a simulation fails. Repulsion parameter scaling and molecular particle decomposition are mutually dependent. When mapping concentrations to molecule numbers in the simulation box, particle volume scaling should be taken into account. A repulsion parameter morphing investigation suggests not overstretching repulsion parameter accuracy considerations.
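For readers less familiar with DPD, the particle-particle repulsions mentioned above are usually the soft conservative pair forces of the Groot-Warren formulation, F_ij = a_ij (1 - r_ij / r_c) r̂_ij for r_ij < r_c and zero beyond the cutoff. The sketch computes that force for one pair and multiplies a_ij by a uniform factor, which is one plausible reading of "repulsion parameter scaling"; the values a_ij = 25 and r_c = 1 are common textbook defaults, not parameters from this study, and the dissipative and random forces as well as the Shardlow S1 integrator are omitted.

```python
import math


def conservative_force(ri, rj, a_ij, r_c=1.0, scale=1.0):
    """Soft DPD repulsion on particle i due to particle j (3D positions as tuples).

    `scale` multiplies the repulsion parameter a_ij uniformly (illustrative assumption).
    """
    d = [xi - xj for xi, xj in zip(ri, rj)]
    r = math.sqrt(sum(c * c for c in d))
    if r >= r_c or r == 0.0:
        # No interaction beyond the cutoff; coincident particles have no defined direction.
        return (0.0, 0.0, 0.0)
    magnitude = scale * a_ij * (1.0 - r / r_c)  # decays linearly to zero at r_c
    return tuple(magnitude * c / r for c in d)  # directed along the unit vector from j to i


print(conservative_force((0.5, 0.0, 0.0), (0.0, 0.0, 0.0), a_ij=25.0))
```

Because the force stays finite as r approaches zero, DPD particles can overlap, which is what allows the large integration time steps discussed in the abstract.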

12.
J Cheminform ; 15(1): 1, 2023 Jan 02.
Article in English | MEDLINE | ID: mdl-36593523

ABSTRACT

Developing and implementing computational algorithms for the extraction of specific substructures from molecular graphs (in silico molecule fragmentation) is an iterative process. It involves repeated sequences of implementing a rule set, applying it to relevant structural data, checking the results, and adjusting the rules. This requires a computational workflow with data import, fragmentation algorithm integration, and result visualisation. The described workflow is normally unavailable for a new algorithm and must be set up individually. This work presents an open Java rich client Graphical User Interface (GUI) application to support the development of new in silico molecule fragmentation algorithms and make them readily available upon release. The MORTAR (MOlecule fRagmenTAtion fRamework) application visualises fragmentation results of a set of molecules in various ways and provides basic analysis features. Fragmentation algorithms can be integrated and developed within MORTAR by using a specific wrapper class. In addition, fragmentation pipelines with any combination of the available fragmentation methods can be executed. Upon release, three fragmentation algorithms are already integrated: ErtlFunctionalGroupsFinder, Sugar Removal Utility, and Scaffold Generator. These algorithms, as well as all cheminformatics functionalities in MORTAR, are implemented based on the Chemistry Development Kit (CDK).

13.
J Cheminform ; 14(1): 85, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36510332

ABSTRACT

Homologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in compound datasets, which poses obstacles to understanding chemical diversity and to their analytical identification via database matching. To overcome these challenges, an algorithm to detect homologous series within compound datasets was developed and implemented using the RDKit. The algorithm takes a list of molecules as SMILES strings and a monomer (i.e., repeating unit) encoded as SMARTS as its main inputs. In an iterative process, substructure matching of repeating units, molecule fragmentation, and core detection lead to homologous series classification through grouping of identical cores. Three open compound datasets from environmental chemistry (NORMAN Suspect List Exchange, NORMAN-SLE), exposomics (PubChemLite for Exposomics), and natural products (the COlleCtion of Open NatUral producTs, COCONUT) were subjected to homologous series classification using the algorithm. Over 2000, 12,000, and 5000 series with CH2 repeating units were classified in NORMAN-SLE, PubChemLite, and COCONUT, respectively. Validation of classified series was performed using published homologous series and structure categories, including a comparison with a similar existing method for categorising PFAS compounds. The OngLai algorithm and its implementation for classifying homologues are openly available at: https://github.com/adelenelai/onglai-classify-homologues .
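The last step described above, grouping molecules by identical cores, can be illustrated with a deliberately simplified toy that only handles SMILES written as an unbranched leading carbon chain followed by a functional group (for example `CCCO`). Here the "core" is just the SMILES text left after stripping the leading carbon run; this string trick is a hypothetical stand-in for the RDKit-based substructure matching, fragmentation, and core detection of the OngLai algorithm, and it will mis-handle branched, cyclic, or differently written SMILES.

```python
import re
from collections import defaultdict

# Leading run of aliphatic carbons; the lookahead stops before the 'C' of 'Cl'.
LEADING_CHAIN = re.compile(r"C+(?![a-z])")


def group_by_core(smiles_list):
    """Toy homologous-series grouping for SMILES of the form <carbon chain><group>."""
    series = defaultdict(list)
    for smi in smiles_list:
        m = LEADING_CHAIN.match(smi)
        if m is None or m.end() == len(smi):
            continue  # no chain prefix, or a pure alkane with no distinct core
        core = smi[m.end():]
        series[core].append((len(m.group()), smi))  # (chain length, SMILES)
    # A series needs at least two members with different chain lengths.
    return {core: sorted(members) for core, members in series.items()
            if len({n for n, _ in members}) >= 2}


print(group_by_core(["CO", "CCO", "CCCO", "CC(=O)O", "CCC(=O)O", "CCCl", "c1ccccc1"]))
```

Here the alcohols and the carboxylic acids each form a series, while the lone chloroalkane and the aromatic ring are left unclassified.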

14.
Angew Chem Int Ed Engl ; 61(51): e202203038, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36347644

ABSTRACT

Research data management (RDM) is needed to assist experimental advances and data collection in the chemical sciences. Many funders require RDM because experiments are often paid for by taxpayers and the resulting data should be deposited sustainably for posterity. However, paper notebooks are still common in laboratories and research data is often stored in proprietary and/or dead-end file formats without experimental context. Data must mature beyond a mere supplement to a research paper. Electronic lab notebooks (ELN) and laboratory information management systems (LIMS) allow researchers to manage data better and they simplify research and publication. Thus, an agreement is needed on minimum information standards for data handling to support structured approaches to data reporting. As digitalization becomes part of curricular teaching, future generations of digital native chemists will embrace RDM and ELN as an organic part of their research.


Subjects
Data Management, Laboratories

15.
J Cheminform ; 14(1): 79, 2022 Nov 10.
Article in English | MEDLINE | ID: mdl-36357931

ABSTRACT

The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of molecular scaffolds, scaffold trees and networks. The new library is based on the Chemistry Development Kit (CDK) and highly customisable through multiple settings, e.g. five different structural framework definitions are available. For display of scaffold hierarchies, the open GraphStream Java library is utilised. Performance snapshots with natural products (NP) from the COCONUT (COlleCtion of Open Natural prodUcTs) database and drug molecules from DrugBank are reported. The generation of a scaffold network from more than 450,000 NP can be achieved within a single day.
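One classic framework definition is the Bemis-Murcko scaffold: the ring systems plus the linkers between them, with all terminal side chains removed. A minimal graph-only sketch of that idea repeatedly deletes atoms with at most one neighbour. It assumes a hydrogen-suppressed, symmetric adjacency map and ignores bond orders, so exocyclic double-bonded atoms that some definitions keep are stripped as well; it is not Scaffold Generator's CDK-based implementation.

```python
def murcko_like_scaffold(adjacency):
    """Atom indices left after iteratively pruning terminal (degree <= 1) atoms."""
    neighbours = {atom: set(nbs) for atom, nbs in adjacency.items()}
    terminal = [a for a, nbs in neighbours.items() if len(nbs) <= 1]
    while terminal:
        atom = terminal.pop()
        for nb in neighbours.pop(atom):  # detach the terminal atom
            neighbours[nb].discard(atom)
            if len(neighbours[nb]) == 1:  # the neighbour has just become terminal
                terminal.append(nb)
    return set(neighbours)


# Toluene skeleton: six ring carbons 0-5 plus a methyl carbon 6 on atom 0.
toluene = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
toluene[0].append(6)
toluene[6] = [0]
print(murcko_like_scaffold(toluene))
```

Acyclic molecules prune away completely, which matches the usual convention that they have an empty Murcko scaffold.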

16.
J Cheminform ; 14(1): 36, 2022 Jun 09.
Article in English | MEDLINE | ID: mdl-35681226

ABSTRACT

The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). There has been a lot of progress over the last three decades in this field, but the development of systems for the recognition of complex hand-drawn structure depictions is still at the beginning. Currently, there is no data for the systematic evaluation of OCSR methods on hand-drawn structures available. Here we present DECIMER - Hand-drawn molecule images, a standardised, openly available benchmark dataset of 5088 hand-drawn depictions of diversely picked chemical structures. Every structure depiction in the dataset is mapped to a machine-readable representation of the underlying molecule. The dataset is openly available and published under the CC-BY 4.0 licence which applies very few limitations. We hope that it will contribute to the further development of the field.

17.
J Cheminform ; 14(1): 31, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35668480

ABSTRACT

The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and are not overfit to a specific type of input. In the case of chemical structure depictions, these features are defined by the depiction parameters such as bond length, line thickness, label font style and many others. Here we present RanDepict, a toolkit for the creation of diverse sets of chemical structure depictions. The diversity of the image features is generated by making use of all available depiction parameters in the depiction functionalities of the CDK, RDKit, and Indigo. Furthermore, there is the option to enhance and augment the image with features such as curved arrows, chemical labels around the structure, or other kinds of distortions. Using depiction feature fingerprints, RanDepict ensures diversely picked image features. Here, the depiction and augmentation features are summarised in binary vectors and the MaxMin algorithm is used to pick diverse samples out of all valid options. By making all resources described herein publicly available, we hope to contribute to the development of deep learning-based OCSR systems.
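The diversity-picking step can be sketched as a plain MaxMin selection over binary feature vectors: each new pick is the candidate whose distance to its nearest already-picked vector is largest. The Jaccard (Tanimoto) distance, the use of the first vector as the seed, and the example vectors are assumptions for illustration; RanDepict's own feature fingerprints and picking code may differ in detail.

```python
def jaccard_distance(a, b):
    """1 - |a AND b| / |a OR b| for equal-length binary (0/1) vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def maxmin_pick(vectors, n_pick, seed_index=0):
    """Greedy MaxMin: each pick maximises its distance to the nearest picked vector."""
    if not vectors or n_pick <= 0:
        return []
    picked = [seed_index]
    # Distance from every vector to its nearest picked vector so far.
    nearest = [jaccard_distance(v, vectors[seed_index]) for v in vectors]
    while len(picked) < min(n_pick, len(vectors)):
        candidate = max((i for i in range(len(vectors)) if i not in picked),
                        key=lambda i: nearest[i])
        picked.append(candidate)
        nearest = [min(d, jaccard_distance(v, vectors[candidate]))
                   for d, v in zip(nearest, vectors)]
    return picked


features = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0]]
print(maxmin_pick(features, 3))
```

Keeping a running "distance to nearest pick" list means each new pick costs one pass over the candidates, which is why MaxMin scales to large pools of depiction-parameter combinations.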

18.
Membranes (Basel) ; 12(6), 2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35736327

ABSTRACT

Different charge treatment approaches are examined for cyclotide-induced plasma membrane disruption by lipid extraction, studied with dissipative particle dynamics. A pure Coulomb approach with truncated forces, tuned to avoid individual strong ion pairing, still reveals hidden statistical pairing effects that may lead to artificial membrane stabilization or distortion of cyclotide activity, depending on the cyclotide's charge state. While qualitative behavior is not affected in an apparent manner, more sensitive quantitative evaluations can be systematically biased. The findings suggest smearing out point charges over an adequate charge distribution. For large mesoscopic simulation boxes, approximations for the Ewald sum to account for mirror charges due to periodic boundary conditions are of negligible influence.

19.
Elife ; 11, 2022 May 26.
Article in English | MEDLINE | ID: mdl-35616633

ABSTRACT

Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for integration and interoperability between related fields. The fundamental elements of exchange are referenced structure-organism pairs that establish relationships between distinct molecular structures and the living organisms from which they were identified. Consolidating and sharing such information via an open platform has strong transformative potential for natural products research and beyond. This is the ultimate goal of the newly established LOTUS initiative, which has now completed the first steps toward the harmonization, curation, validation and open dissemination of 750,000+ referenced structure-organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on https://lotus.naturalproducts.net. Data sharing within the Wikidata framework broadens data access and interoperability, opening new possibilities for community curation and evolving publication models. Furthermore, embedding LOTUS data into the vast Wikidata knowledge graph will facilitate new biological and chemical insights. The LOTUS initiative represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.


Subjects
Biological Products, Knowledge Management, Computational Biology, Factual Databases, Knowledge

20.
J Cheminform ; 14(1): 24, 2022 Apr 23.
Article in English | MEDLINE | ID: mdl-35461261

ABSTRACT

Chemical structure generators are used in cheminformatics to produce or enumerate virtual molecules based on a set of boundary conditions. The result can then be tested for properties of interest, such as adherence to measured data or for their suitability as drugs. The starting point can be a potentially fuzzy set of fragments or a molecular formula. In the latter case, the generator produces the set of constitutional isomers of the given input formula. Here we present the novel constitutional isomer generator surge based on the canonical generation path method. Surge uses the nauty package to compute automorphism groups of graphs. We outline the working principles of surge and present benchmarking results which show that surge is currently the fastest structure generator. Surge is available under a liberal open-source license.
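Constitutional isomer generation can be illustrated on the simplest case, the acyclic alkanes CnH2n+2, whose isomers correspond to carbon trees in which no carbon bonds to more than four others. The sketch uses a naive generate-and-deduplicate strategy: grow every tree by one carbon at every position with free valence, then discard duplicates using a canonical string (an AHU encoding rooted at the tree's center). This is a swapped-in textbook technique, not surge's canonical generation path, and it does not use nauty.

```python
def canonical_tree(adj):
    """Canonical string for an unrooted tree: AHU encoding rooted at its center(s)."""
    n = len(adj)
    degree = [len(nbs) for nbs in adj]
    leaves = [v for v in range(n) if degree[v] <= 1]
    remaining = n
    while remaining > 2:  # peel leaves layer by layer until one or two centers remain
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    next_leaves.append(nb)
        leaves = next_leaves

    def encode(v, parent):
        # Rooted encoding: sorted child encodings wrapped in parentheses.
        return "(" + "".join(sorted(encode(c, v) for c in adj[v] if c != parent)) + ")"

    return min(encode(c, -1) for c in leaves)


def alkane_skeletons(n_carbons):
    """Distinct carbon skeletons (trees with max degree 4) having n_carbons atoms."""
    trees = {"()": [[]]}  # canonical string -> adjacency list; start from one carbon
    for _ in range(n_carbons - 1):
        grown = {}
        for adj in trees.values():
            for v in range(len(adj)):
                if len(adj[v]) < 4:  # carbon valence: at most four C-C bonds
                    new_adj = [list(nbs) for nbs in adj] + [[v]]
                    new_adj[v].append(len(adj))
                    grown.setdefault(canonical_tree(new_adj), new_adj)
        trees = grown
    return list(trees.values())


print([len(alkane_skeletons(n)) for n in range(1, 9)])
```

For example, `len(alkane_skeletons(6))` gives 5, matching the five hexane isomers. The approach is complete because removing any terminal carbon from a valid skeleton leaves a valid smaller skeleton, but it wastes effort generating duplicates, which is exactly what orderly methods such as canonical generation paths avoid.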
