Results 1 - 10 of 10

1.
J Cheminform ; 16(1): 15, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38321500

ABSTRACT

Mass spectrometry (MS) is an analytical technique for molecule identification that can be used for investigating protein-metal complex interactions. Once the MS data is collected, the mass spectra are usually interpreted manually to identify the adducts formed as a result of the interactions between proteins and metal-based species. However, with increasing resolution, dataset size, and species complexity, the time required to identify adducts and the error-prone nature of manual assignment have become limiting factors in MS analysis. AdductHunter is an open-source web-based analysis tool that automates the peak identification process using constraint integer optimization to find feasible combinations of protein and fragments, and dynamic time warping to calculate the dissimilarity between the theoretical isotope pattern of a species and its experimental isotope peak distribution. Empirical evaluation on a collection of 22 unique MS datasets shows fast and accurate identification of protein-metal complex adducts in deconvoluted mass spectra.
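
For illustration, the dynamic time warping dissimilarity mentioned in this abstract can be sketched as follows. This is a minimal textbook DTW over two intensity sequences, not AdductHunter's implementation; the function name `dtw_distance`, the absolute-difference cost, and the sample intensity values are all assumptions made for this example.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping with an absolute-difference local cost.

    Hypothetical helper for illustration; runs in O(len(a) * len(b)).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Made-up relative intensities: the "experimental" pattern repeats its first peak.
theoretical = [0.1, 0.6, 1.0, 0.6, 0.1]
experimental = [0.1, 0.1, 0.6, 1.0, 0.6, 0.1]
print(dtw_distance(theoretical, experimental))
```

Because DTW may align one point with several neighbours, a pattern that is merely stretched scores as identical, while a pattern with different peak heights accumulates cost.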

2.
J Cheminform ; 15(1): 53, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208694

ABSTRACT

BACKGROUND: Predicting in advance the behavior of new chemical compounds can support the design process of new products by directing the research toward the most promising candidates and ruling out others. Such predictive models can be data-driven using Machine Learning or based on researchers' experience and depend on the collection of past results. In either case, models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Therefore, consequent usage of these predictive models shapes the dataset and causes a continuous specialization, shrinking the applicability domain of all trained models on this dataset in the future and increasingly harming model-based exploration of the space. PROPOSED SOLUTION: In this paper, we propose CANCELS (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas in the space that fall short and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. CANCELS does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain. RESULTS: An extensive set of experiments on the use-case of biodegradation pathway prediction not only reveals that the bias spiral can indeed be observed but also that CANCELS produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial as it can not only intervene in the continuous specialization process but also significantly improve a predictor's performance while reducing the number of required experiments.
Overall, we believe that CANCELS can support researchers in their experimentation process to not only better understand their data and potential flaws, but also to grow the dataset in a sustainable way. All code is available under github.com/KatDost/Cancels.

3.
Pac Symp Biocomput ; 27: 301-312, 2022.
Article in English | MEDLINE | ID: mdl-34890158

ABSTRACT

Influenza is a communicable respiratory illness that can cause serious public health hazards. Because influenza poses a serious threat to the community, accurate forecasting of Influenza-like-illness (ILI) can diminish the impact of an influenza season by enabling early public health interventions. Machine learning models are increasingly being applied in infectious disease modelling, but are limited in their performance, particularly when using a longer forecasting window. This paper proposes a novel time series forecasting method, Randomized Ensembles of Auto-regression chains (Reach). Reach implements an ensemble of random chains for multistep time series forecasting. This new approach is evaluated on ILI case counts in Auckland, New Zealand from the years 2015-2018 and compared to other standard methods. The results demonstrate that the proposed method performed better than baseline methods when applied to this ILI time series forecasting problem.
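
The abstract does not spell out how Reach builds its chains, so the following sketch only illustrates the general idea of multistep forecasting with an averaged ensemble of randomly configured autoregressive chains. It is not the Reach algorithm: the single-lag linear model, the random choice of lag, and every function and parameter name here are assumptions made for this example.

```python
import random


def fit_lag_model(series, lag):
    """Least-squares fit of y[t] = a + b * y[t - lag] (hypothetical helper)."""
    xs, ys = series[:-lag], series[lag:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant input: fall back to the mean
    return mean_y - slope * mean_x, slope


def chain_forecast(series, lag, horizon):
    """Forecast `horizon` steps; each prediction is fed back as a later input."""
    intercept, slope = fit_lag_model(series, lag)
    history = list(series)
    for _ in range(horizon):
        history.append(intercept + slope * history[-lag])
    return history[len(series):]


def ensemble_forecast(series, horizon, n_members=10, max_lag=4, seed=0):
    """Average the forecasts of chains built with randomly drawn lags."""
    rng = random.Random(seed)
    forecasts = [
        chain_forecast(series, rng.randint(1, max_lag), horizon)
        for _ in range(n_members)
    ]
    return [sum(step) / n_members for step in zip(*forecasts)]


# Made-up weekly counts, for illustration only.
weekly_counts = [12.0, 15.0, 21.0, 30.0, 28.0, 25.0, 19.0, 14.0, 13.0, 16.0]
print(ensemble_forecast(weekly_counts, horizon=3))
```

Averaging over several randomly configured chains is one common way to reduce the error that accumulates when predictions are fed back into the model over a longer forecasting window.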


Subject(s)
Influenza, Human; Computational Biology; Forecasting; Humans; Influenza, Human/epidemiology; Regression Analysis; Time Factors

4.
J Cheminform ; 13(1): 63, 2021 Sep 03.
Article in English | MEDLINE | ID: mdl-34479624

ABSTRACT

The prediction of metabolism and biotransformation pathways of xenobiotics is a highly desired tool in environmental sciences, drug discovery, and (eco)toxicology. Several systems predict single transformation steps or complete pathways as series of parallel and subsequent steps. Their performance is commonly evaluated on the level of a single transformation step. Such an approach cannot account for challenges that arise from specific properties of biotransformation experiments, namely transformation products that are missing from the reference data because they occur only in low concentrations, e.g. transient intermediates or higher-generation metabolites. Furthermore, some rule-based prediction systems evaluate the performance only based on the defined set of transformation rules. Therefore, the performance of these models cannot be directly compared. In this paper, we introduce a new evaluation framework that extends the evaluation of biotransformation prediction from single transformations to whole pathways, taking into account multiple generations of metabolites. We introduce a procedure to address transient intermediates and propose a weighted scoring system that acknowledges the uncertainty of higher-generation metabolites. We implemented this framework in enviPath and demonstrate its strict performance metrics on predictions of in vitro biotransformation and degradation of xenobiotics in soil. Our approach is model-agnostic and can be transferred to other prediction systems. It is also capable of revealing knowledge gaps in terms of incompletely defined sets of transformation rules.

5.
Comput Biol Med ; 130: 104197, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33429140

ABSTRACT

Machine learning methods are commonly used for predicting molecular properties to accelerate material and drug design. An important part of this process is deciding how to represent the molecules. Typically, machine learning methods expect examples represented by vectors of values, and many methods for calculating molecular feature representations have been proposed. In this paper, we perform a comprehensive comparison of different molecular features, including traditional methods such as fingerprints and molecular descriptors, and recently proposed learnable representations based on neural networks. Feature representations are evaluated on 11 benchmark datasets, used for predicting properties and measures such as mutagenicity, melting points, activity, solubility, and IC50. Our experiments show that several molecular features work similarly well over all benchmark datasets. The ones that stand out most are Spectrophores, which give significantly worse performance than other features on most datasets. Molecular descriptors from the PaDEL library seem very well suited for predicting physical properties of molecules. Despite their simplicity, MACCS fingerprints performed very well overall. The results show that learnable representations achieve competitive performance compared to expert based representations. However, task-specific representations (graph convolutions and Weave methods) rarely offer any benefits, even though they are computationally more demanding. Lastly, combining different molecular feature representations typically does not give a noticeable improvement in performance compared to individual feature representations.


Subject(s)
Machine Learning; Neural Networks, Computer; Drug Design

6.
R Soc Open Sci ; 6(9): 190741, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31598303

ABSTRACT

The link between colour and emotion and its possible similarity across cultures are questions that have not been fully resolved. Online, 711 participants from China, Germany, Greece and the UK associated 12 colour terms with 20 discrete emotion terms in their native languages. We propose a machine learning approach to quantify (a) the consistency and specificity of colour-emotion associations and (b) the degree to which they are country-specific, on the basis of the accuracy of a statistical classifier in (a) decoding the colour term evaluated on a given trial from the 20 ratings of colour-emotion associations and (b) predicting the country of origin from the 240 individual colour-emotion associations, respectively. The classifier accuracies were significantly above chance level, demonstrating that emotion associations are to some extent colour-specific and that colour-emotion associations are to some extent country-specific. A second measure of country-specificity, the in-group advantage of the colour-decoding accuracy, was detectable but relatively small (6.1%), indicating that colour-emotion associations are both universal and culture-specific. Our results show that machine learning is a promising tool when analysing complex datasets from emotion research.

7.
Environ Sci Process Impacts ; 19(3): 449-464, 2017 Mar 22.
Article in English | MEDLINE | ID: mdl-28229138

ABSTRACT

Developing models for the prediction of microbial biotransformation pathways and half-lives of trace organic contaminants in different environments requires as training data easily accessible and sufficiently large collections of respective biotransformation data that are annotated with metadata on study conditions. Here, we present the Eawag-Soil package, a public database that has been developed to contain all freely accessible regulatory data on pesticide degradation in laboratory soil simulation studies for pesticides registered in the EU (282 degradation pathways, 1535 reactions, 1619 compounds and 4716 biotransformation half-life values with corresponding metadata on study conditions). We provide a thorough description of this novel data resource, and discuss important features of the pesticide soil degradation data that are relevant for model development. Most notably, the variability of half-life values for individual compounds is large and only about one order of magnitude lower than the entire range of median half-life values spanned by all compounds, demonstrating the need to consider study conditions in the development of more accurate models for biotransformation prediction. We further show how the data can be used to find missing rules relevant for predicting soil biotransformation pathways. From this analysis, eight examples of reaction types were presented that should trigger the formulation of new biotransformation rules, e.g., Ar-OH methylation, or the extension of existing rules, e.g., hydroxylation in aliphatic rings. The data were also used to exemplarily explore the dependence of half-lives of different amide pesticides on chemical class and experimental parameters. 
This analysis highlighted the value of considering initial transformation reactions for the development of meaningful quantitative-structure biotransformation relationships (QSBR), which is a novel opportunity offered by the simultaneous encoding of transformation reactions and corresponding half-lives in Eawag-Soil. Overall, Eawag-Soil provides an unprecedentedly rich collection of manually extracted and curated biotransformation data, which should be useful in a great variety of applications.


Subject(s)
Biotransformation; Databases, Factual; Models, Biological; Pesticides/metabolism; Soil Pollutants/metabolism; Biodegradation, Environmental; Half-Life; Pesticides/analysis; Soil; Soil Pollutants/analysis

8.
Sci Rep ; 6: 25464, 2016 May 10.
Article in English | MEDLINE | ID: mdl-27160439

ABSTRACT

Human beings continuously emit chemicals into the air by breath and through the skin. In order to determine whether these emissions vary predictably in response to audiovisual stimuli, we have continuously monitored carbon dioxide and over one hundred volatile organic compounds in a cinema. It was found that many airborne chemicals in cinema air varied distinctively and reproducibly with time for a particular film, even in different screenings to different audiences. Application of scene labels and advanced data mining methods revealed that specific film events, namely "suspense" or "comedy" caused audiences to change their emission of specific chemicals. These event-type synchronous, broadcasted human chemosignals open the possibility for objective and non-invasive assessment of a human group response to stimuli by continuous measurement of chemicals in air. Such methods can be applied to research fields such as psychology and biology, and be valuable to industries such as film making and advertising.


Subject(s)
Air Pollutants/analysis; Air Pollutants/chemistry; Air Pollution, Indoor; Environmental Monitoring; Respiration; Acetone/analysis; Acetone/chemistry; Butadienes/analysis; Butadienes/chemistry; Carbon Dioxide/analysis; Carbon Dioxide/chemistry; Hemiterpenes/analysis; Hemiterpenes/chemistry; Humans; Motion Pictures; Pentanes/analysis; Pentanes/chemistry; Time Factors; Volatile Organic Compounds/analysis; Volatile Organic Compounds/chemistry

9.
Nucleic Acids Res ; 44(D1): D502-8, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26582924

ABSTRACT

The University of Minnesota Biocatalysis/Biodegradation Database and Pathway Prediction System (UM-BBD/PPS) has been a unique resource covering microbial biotransformation pathways of primarily xenobiotic chemicals for over 15 years. This paper introduces the successor system, enviPath (The Environmental Contaminant Biotransformation Pathway Resource), which is a complete redesign and reimplementation of UM-BBD/PPS. enviPath uses the database from the UM-BBD/PPS as a basis, extends the use of this database, and allows users to include their own data to support multiple use cases. Relative reasoning is supported for the refinement of predictions and to allow its extensions in terms of previously published, but not implemented machine learning models. User access is simplified by providing a REST API that simplifies the inclusion of enviPath into existing workflows. An RDF database is used to enable simple integration with other databases. enviPath is publicly available at https://envipath.org with free and open access to its core data.


Subject(s)
Databases, Chemical; Environmental Pollutants/metabolism; Xenobiotics/metabolism; Biocatalysis; Biotransformation; Environmental Pollutants/chemistry; User-Computer Interface; Xenobiotics/chemistry

10.
Bioinformatics ; 26(6): 814-21, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20106820

ABSTRACT

MOTIVATION: Current methods for the prediction of biodegradation products and pathways of organic environmental pollutants either do not take into account domain knowledge or do not provide probability estimates. In this article, we propose a hybrid knowledge- and machine learning-based approach to overcome these limitations in the context of the University of Minnesota Pathway Prediction System (UM-PPS). The proposed solution performs relative reasoning in a machine learning framework, and obtains one probability estimate for each biotransformation rule of the system. As the application of a rule then depends on a threshold for the probability estimate, the trade-off between recall (sensitivity) and precision (selectivity) can be addressed and leveraged in practice. RESULTS: Results from leave-one-out cross-validation show that a recall and precision of approximately 0.8 can be achieved for a subset of 13 transformation rules. Therefore, it is possible to optimize precision without compromising recall. We are currently integrating the results into an experimental version of the UM-PPS server. AVAILABILITY: The program is freely available on the web at http://wwwkramer.in.tum.de/research/applications/biodegradation/data. CONTACT: kramer@in.tum.de.
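
The threshold-dependent trade-off between recall and precision described in this abstract can be made concrete with a small sketch. The probabilities and labels below are invented for illustration and are not results from the paper; `precision_recall` is a hypothetical helper, and returning 1.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall(probabilities, labels, threshold):
    """Precision and recall when a rule fires if its probability >= threshold."""
    predicted = [p >= threshold for p in probabilities]
    tp = sum(1 for pred, truth in zip(predicted, labels) if pred and truth)
    fp = sum(1 for pred, truth in zip(predicted, labels) if pred and not truth)
    fn = sum(1 for pred, truth in zip(predicted, labels) if not pred and truth)
    # Convention for this sketch: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Invented probability estimates for six rule applications and whether
# each application was actually correct.
probs = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
truth = [True, True, False, True, False, False]

for threshold in (0.35, 0.5, 0.7):
    print(threshold, precision_recall(probs, truth, threshold))
```

Raising the threshold suppresses low-probability rule applications, which tends to increase precision at the expense of recall; lowering it does the opposite.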


Subject(s)
Artificial Intelligence; Biodegradation, Environmental; Computational Biology/methods; Biotransformation; Databases, Factual