1.
Anal Chem ; 96(1): 188-196, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38117933

ABSTRACT

1H NMR spectroscopy is a powerful tool for analyzing mixtures including determining the concentrations of individual components. When signals from multiple compounds overlap, this task requires computational solutions. They are typically based on peak-picking and the comparison of obtained peak lists with libraries of individual components. This can fail if peaks are not sufficiently resolved or when peak positions differ between the library and the mixture. In this paper, we present Magnetstein, a quantification algorithm rooted in the optimal transport theory that makes it robust to unexpected frequency shifts and overlapping signals. Thanks to this, Magnetstein can quantitatively analyze difficult spectra with the estimation trueness an order of magnitude higher than that of commercial tools. Furthermore, the method is easier to use than other approaches, having only two parameters with default values applicable to a broad range of experiments and requiring little to no preprocessing of the spectra.

2.
J Cheminform ; 15(1): 6, 2023 Jan 14.
Article in English | MEDLINE | ID: mdl-36641473

ABSTRACT

Modern computer-assisted synthesis planning tools provide strong support for this problem. However, they are still limited by computational complexity. This limitation may be overcome by scoring the synthetic accessibility as a pre-retrosynthesis heuristic. A wide range of machine learning scoring approaches is available, however, their applicability and correctness were studied to a limited extent. Moreover, there is a lack of critical assessment of synthetic accessibility scores with common test conditions. In the present work, we assess if synthetic accessibility scores can reliably predict the outcomes of retrosynthesis planning. Using a specially prepared compounds database, we examine the outcomes of the retrosynthetic tool AiZynthFinder. We test whether synthetic accessibility scores: SAscore, SYBA, SCScore, and RAscore accurately predict the results of retrosynthesis planning. Furthermore, we investigate if synthetic accessibility scores can speed up retrosynthesis planning by better prioritizing explored partial synthetic routes and thus reducing the size of the search space. For that purpose, we analyze the AiZynthFinder partial solutions search trees, their structure, and complexity parameters, such as the number of nodes, or treewidth. We confirm that synthetic accessibility scores in most cases well discriminate feasible molecules from infeasible ones and can be potential boosters of retrosynthesis planning tools. Moreover, we show the current challenges of designing computer-assisted synthesis planning tools. We conclude that hybrid machine learning and human intuition-based synthetic accessibility scores can efficiently boost the effectiveness of computer-assisted retrosynthesis planning, however, they need to be carefully crafted for retrosynthesis planning algorithms. The source code of this work is publicly available at https://github.com/grzsko/ASAP.

3.
Gigascience ; 11, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36329619

ABSTRACT

BACKGROUND: Reproducibility of liquid chromatography separation is limited by retention time drift. As a result, measured signals lack correspondence over replicates of the liquid chromatography-mass spectrometry (LC-MS) experiments. Correction of these errors is named retention time alignment and needs to be performed before further quantitative analysis. Despite the availability of numerous alignment algorithms, their accuracy is limited (e.g., for retention time drift that swaps analytes' elution order). RESULTS: We present the Alignstein, an algorithm for LC-MS retention time alignment. It correctly finds correspondence even for swapped signals. To achieve this, we implemented the generalization of the Wasserstein distance to compare multidimensional features without any reduction of the information or dimension of the analyzed data. Moreover, Alignstein by design requires neither a reference sample nor prior signal identification. We validate the algorithm on publicly available benchmark datasets obtaining competitive results. Finally, we show that it can detect the information contained in the tandem mass spectrum by the spatial properties of chromatograms. CONCLUSIONS: We show that the use of optimal transport effectively overcomes the limitations of existing algorithms for statistical analysis of mass spectrometry datasets. The algorithm's source code is available at https://github.com/grzsko/Alignstein.


Subject(s)
Algorithms , Tandem Mass Spectrometry , Chromatography, Liquid/methods , Reproducibility of Results , Software
4.
Commun Med (Lond) ; 2(1): 136, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36352249

ABSTRACT

BACKGROUND: During the COVID-19 pandemic there has been a strong interest in forecasts of the short-term development of epidemiological indicators to inform decision makers. In this study we evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland for the period from January through April 2021. METHODS: We evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland. These were issued by 15 different forecasting models, run by independent research teams. Moreover, we study the performance of combined ensemble forecasts. Evaluation of probabilistic forecasts is based on proper scoring rules, along with interval coverage proportions to assess calibration. The presented work is part of a pre-registered evaluation study. RESULTS: We find that many, though not all, models outperform a simple baseline model up to four weeks ahead for the considered targets. Ensemble methods show very good relative performance. The addressed time period is characterized by rather stable non-pharmaceutical interventions in both countries, making short-term predictions more straightforward than in previous periods. However, major trend changes in reported cases, like the rebound in cases due to the rise of the B.1.1.7 (Alpha) variant in March 2021, prove challenging to predict. CONCLUSIONS: Multi-model approaches can help to improve the performance of epidemiological forecasts. However, while death numbers can be predicted with some success based on current case and hospitalization data, predictability of case numbers remains low beyond quite short time horizons. Additional data sources including sequencing and mobility data, which were not extensively used in the present study, may help to improve performance.


We compare forecasts of weekly case and death numbers for COVID-19 in Germany and Poland based on 15 different modelling approaches. These cover the period from January to April 2021 and address numbers of cases and deaths one and two weeks into the future, along with the respective uncertainties. We find that combining different forecasts into one forecast can enable better predictions. However, case numbers over longer periods were challenging to predict. Additional data sources, such as information about different versions of the SARS-CoV-2 virus present in the population, might improve forecasts in the future.
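
The evaluation described above rests on proper scoring rules and interval coverage proportions. As a minimal sketch, the interval score is one well-known proper scoring rule for a central prediction interval; whether the study used exactly this rule is an assumption, and the function names below are illustrative only.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    The score is the interval width plus a penalty, scaled by 2 / alpha,
    for observations falling outside the interval. Lower is better.
    """
    score = upper - lower
    if observed < lower:
        score += (2.0 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2.0 / alpha) * (observed - upper)
    return score


def coverage_proportion(intervals, observations):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for (lo, hi), y in zip(intervals, observations) if lo <= y <= hi)
    return hits / len(observations)
```

For a well-calibrated forecaster, the coverage proportion of nominal 50% intervals should be close to 0.5 over many forecasts.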

5.
Genome Biol ; 23(1): 128, 2022 06 09.
Article in English | MEDLINE | ID: mdl-35681161

ABSTRACT

Copy number alterations constitute important phenomena in tumor evolution. Whole genome single-cell sequencing gives insight into copy number profiles of individual cells, but is highly noisy. Here, we propose CONET, a probabilistic model for joint inference of the evolutionary tree on copy number events and copy number calling. CONET employs an efficient, regularized MCMC procedure to search the space of possible model structures and parameters. We introduce a range of model priors and penalties for efficient regularization. CONET reveals copy number evolution in two breast cancer samples, and outperforms other methods in tree reconstruction, breakpoint identification and copy number calling.


Subject(s)
DNA Copy Number Variations , Neoplasms , Humans , Neoplasms/genetics , Neoplasms/pathology
6.
Methods ; 203: 584-593, 2022 07.
Article in English | MEDLINE | ID: mdl-35085741

ABSTRACT

More than one and a half years after the outbreak of the COVID-19 pandemic, the scientific world is still trying to understand its dynamics. In this paper on the case fatality rate (CFR) of COVID-19, we study historical data on mortality in Poland during the first six months of the pandemic, when no SARS-CoV-2 variants of concern were present among the infected. To this end, we apply competing risk models to perform both uni- and multivariate analyses on specific subpopulations selected by different factors, including the key indicators: age, sex, and hospitalization. The study explores the case fatality rate and finds a decreasing trend over time. Furthermore, we describe the differences in mortality between hospitalized and other cases, indicating a sudden increase in mortality among hospitalized cases at the end of the 2020 spring season. Exploratory and multivariate analysis revealed the real impact of each variable, and besides the expected factors indicating increased mortality (age, comorbidities) we track less obvious indicators. Recent medical care as well as the identification of the source contact, independently of comorbidities, significantly impact an individual's mortality risk. As a result, the study provides a twofold insight into COVID-19 mortality in Poland. On the one hand, we explore mortality in different groups with respect to different variables; on the other, we indicate novel factors that may be crucial in reducing mortality. The latter can be addressed, e.g., by more efficient contact tracing and by proper organization and management of the health care system to assist those who need medical care independently of comorbidities or COVID-19 infection.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Contact Tracing , Humans , Pandemics , Poland/epidemiology
7.
R Soc Open Sci ; 8(11): 211279, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34849247

ABSTRACT

From a systems biology perspective, the majority of cancer models, although interesting and providing a qualitative explanation of some problems, have a major disadvantage in that they usually miss a genuine connection with experimental data. Having this in mind, in this paper, we aim at contributing to the improvement of many cancer models which contain a proliferation term. To this end, we propose a new non-local model of cell proliferation. We select data that are suitable to perform Bayesian inference for unknown parameters and we provide a discussion on the range of applicability of the model. Furthermore, we provide proof of the stability of posterior distributions in total variation norm which exploits the theory of spaces of measures equipped with the weighted flat norm. In a companion paper, we provide detailed proof of the well-posedness of the problem and we investigate the convergence of the escalator boxcar train (EBT) algorithm applied to solve the equation.

8.
Rapid Commun Mass Spectrom ; : e8956, 2020 Sep 30.
Article in English | MEDLINE | ID: mdl-32996651

ABSTRACT

RATIONALE: The linear regression of mass spectra is a computational problem defined as fitting a linear combination of reference spectra to an experimental one. It is typically used to estimate the relative quantities of selected ions. In this work, we study this problem in an abstract setting to develop new approaches applicable to a diverse range of experiments. METHODS: To overcome the sensitivity of the ordinary least-squares regression to measurement inaccuracies, we base our methods on a non-conventional spectral dissimilarity measure, known as the Wasserstein or the Earth Mover's distance. This distance is based on the notion of the cost of transporting signal between mass spectra, which renders it naturally robust to measurement inaccuracies in the mass domain. RESULTS: Using a data set of 200 mass spectra, we show that our approach is capable of estimating ion proportions accurately without extensive preprocessing of spectra required by other methods. The conclusions are further substantiated using data sets simulated in a way that mimics most of the measurement inaccuracies occurring in real experiments. CONCLUSIONS: We have developed a linear regression algorithm based on the notion of the cost of transporting signal between spectra. Our implementation is available in a Python 3 package called masserstein, which is freely available at https://github.com/mciach/masserstein.
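
To illustrate the transport-cost idea in the abstract above, the following is a minimal sketch of the one-dimensional Wasserstein (Earth Mover's) distance between two centroided spectra. It is not the masserstein implementation; the function name and the list-based input format are assumptions made for this example.

```python
def wasserstein_1d(mz_a, intensity_a, mz_b, intensity_b):
    """1-Wasserstein distance between two spectra on the m/z axis.

    Intensities are normalized to sum to 1, so each spectrum is treated
    as a probability distribution. In one dimension the distance equals
    the area between the two cumulative distribution functions.
    """
    total_a = sum(intensity_a)
    total_b = sum(intensity_b)
    # Signed mass events: positive for spectrum A, negative for spectrum B.
    events = [(mz, i / total_a) for mz, i in zip(mz_a, intensity_a)]
    events += [(mz, -i / total_b) for mz, i in zip(mz_b, intensity_b)]
    events.sort()

    distance = 0.0
    cdf_difference = 0.0
    previous_mz = events[0][0]
    for mz, weight in events:
        # Accumulate |CDF_A - CDF_B| over the gap since the last event.
        distance += abs(cdf_difference) * (mz - previous_mz)
        cdf_difference += weight
        previous_mz = mz
    return distance
```

A single peak shifted by 0.1 m/z units gives a distance of about 0.1, whereas a bin-wise least-squares comparison would treat the two peaks as entirely unrelated. This is the sense in which a transport-based measure is robust to small inaccuracies in the mass domain.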

9.
BMC Bioinformatics ; 20(Suppl 15): 644, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874610

ABSTRACT

BACKGROUND: Surveys of presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad range of biological studies from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has seldom been used or studied. RESULTS: We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome a computational burden due to high dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species in 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open source R package called jaccard (https://cran.r-project.org/package=jaccard). CONCLUSION: We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data that enable straightforward incorporation of probabilistic measures in analysis for species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
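
The coefficient defined in the abstract above (intersection over union of two presence-absence vectors) can be sketched in a few lines. This is written in Python for illustration and is not the jaccard R package. The centering step assumes independent occurrences with probabilities estimated from the data; that expectation formula is an assumption of this sketch, not something the abstract states.

```python
def jaccard(x, y):
    """Jaccard/Tanimoto coefficient of two equal-length 0/1 vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def centered_jaccard(x, y):
    """Observed coefficient minus its value expected under independence.

    Assumption: with occurrence probabilities px and py, the expected
    coefficient is approximated by px * py / (px + py - px * py).
    """
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    union_probability = px + py - px * py
    expected = (px * py) / union_probability if union_probability else 0.0
    return jaccard(x, y) - expected
```

A centered value near zero suggests co-occurrence no stronger than chance, while clearly positive values are the candidates for a significance test of the kind the abstract describes.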


Subject(s)
Freshwater Biology/methods , Algorithms , Animals , Biometry , Fishes , Probability
10.
Anal Chem ; 91(3): 1801-1807, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30608646

ABSTRACT

Top-down mass spectrometry methods are becoming continuously more popular in the effort to describe the proteome. They rely on the fragmentation of intact protein ions inside the mass spectrometer. Among the existing fragmentation methods, electron transfer dissociation is known for its precision and wide coverage of different cleavage sites. However, several side reactions can occur under electron transfer dissociation (ETD) conditions, including nondissociative electron transfer and proton transfer reaction. Evaluating their extent can provide more insight into reaction kinetics as well as instrument operation. Furthermore, preferential formation of certain reaction products can reveal important structural information. To the best of our knowledge, there are currently no tools capable of tracing and analyzing the products of these reactions in a systematic way. In this Article, we present in detail masstodon: a computer program for assigning peaks and interpreting mass spectra. Besides being a general purpose tool, masstodon also offers the possibility to trace the products of reactions occurring under ETD conditions and provides insights into the parameters driving them. It is available free of charge under the GNU AGPL V3 public license.


Subject(s)
Apolipoprotein A-I/analysis , Mass Spectrometry/statistics & numerical data , Software , Substance P/analysis , Ubiquitin/analysis , Algorithms , Electrons
11.
J Comput Biol ; 25(3): 282-301, 2018 03.
Article in English | MEDLINE | ID: mdl-28945460

ABSTRACT

Electron transfer dissociation (ETD) is a versatile technique used in mass spectrometry for the high-throughput characterization of proteins. It consists of several concurrent reactions triggered by the transfer of an electron from its anion source to sample cations. Transferring an electron causes peptide backbone cleavage while leaving labile post-translational modifications intact. The obtained fragmentation spectra provide valuable information for sequence and structure analyses. In this study, we propose a formal mathematical model of the ETD fragmentation process in the form of a system of stochastic differential equations describing its joint dynamics. Parameters of the model correspond to the rates of occurring reactions. Their estimates for various experimental settings give insight into the dynamics of the ETD process. We estimate the model parameters from the relative quantities of fragmentation products in a given mass spectrum by solving a nonlinear optimization problem. The cost function penalizes for the differences between the analytically derived average number of reaction products and their experimental counterparts. The presented method proves highly robust to noise in silico. Moreover, the model can explain a considerable amount of experimental results for a wide range of instrumentation settings. The implementation of the presented workflow, code-named ETDetective, is freely available under the two-clause BSD license.


Subject(s)
Mass Spectrometry/methods , Algorithms , Animals , Humans , Mass Spectrometry/standards , Peptides/chemistry , Proteolysis , Signal-To-Noise Ratio
12.
J Theor Biol ; 407: 38-50, 2016 10 21.
Article in English | MEDLINE | ID: mdl-27396357

ABSTRACT

The dynamics of the infectious disease transmission are often best understood by taking into account the structure of population with respect to specific features, for example age or immunity level. The practical utility of such models depends on the appropriate calibration with the observed data. Here, we discuss the Bayesian approach to data assimilation in the case of a two-state age-structured model. Such models are frequently used to explore the disease dynamics (i.e. force of infection) based on prevalence data collected at several time points. We demonstrate that, in the case when the explicit solution to the model equation is known, accounting for the data collection process in the Bayesian framework allows us to obtain an unbiased posterior distribution for the parameters determining the force of infection. We further show analytically and through numerical tests that the posterior distribution of these parameters is stable with respect to a cohort approximation (Escalator Boxcar Train) of the solution. Finally, we apply the technique to calibrate the model based on observed sero-prevalence of varicella in Poland.


Subject(s)
Aging/pathology , Chickenpox/epidemiology , Models, Biological , Algorithms , Antibodies, Viral/immunology , Bayes Theorem , Chickenpox/immunology , Child, Preschool , Cohort Studies , Humans , Markov Chains , Monte Carlo Method , Numerical Analysis, Computer-Assisted , Poland/epidemiology , Population Dynamics
13.
PLoS One ; 10(6): e0130411, 2015.
Article in English | MEDLINE | ID: mdl-26121655

ABSTRACT

Most mutations are deleterious and require energetically costly repairs. Therefore, it seems that any minimization of mutation rate is beneficial. On the other hand, mutations generate genetic diversity indispensable for evolution and adaptation of organisms to changing environmental conditions. Thus, it is expected that a spontaneous mutational pressure should be an optimal compromise between these two extremes. In order to study the optimization of the pressure, we compared mutational transition probability matrices from bacterial genomes with artificial matrices fulfilling the same general features as the real ones, e.g., the stationary distribution and the speed of convergence to the stationarity. The artificial matrices were optimized on real protein-coding sequences based on Evolutionary Strategies approach to minimize or maximize the probability of non-synonymous substitutions and costs of amino acid replacements depending on their physicochemical properties. The results show that the empirical matrices have a tendency to minimize the effects of mutations rather than maximize their costs on the amino acid level. They were also similar to the optimized artificial matrices in the nucleotide substitution pattern, especially the high transitions/transversions ratio. We observed no substantial differences between the effects of mutational matrices on protein-coding sequences in genomes under study in respect of differently replicated DNA strands, mutational cost types and properties of the referenced artificial matrices. The findings indicate that the empirical mutational matrices are rather adapted to minimize mutational costs in the studied organisms in comparison to other matrices with similar mathematical constraints.


Subject(s)
Genes, Bacterial , Genome, Bacterial , Mutation Rate , Mutation , Algorithms , Amino Acids/chemistry , Borrelia burgdorferi/genetics , Chlamydia muridarum/genetics , Chlamydia trachomatis/genetics , DNA Mutational Analysis , DNA Repair , Escherichia coli/genetics , Evolution, Molecular , Markov Chains , Models, Theoretical , Nucleotides/genetics , Phylogeny , Principal Component Analysis , Rickettsia/genetics , Staphylococcus aureus/genetics , Streptococcus pyogenes/genetics