1.
Anal Chem ; 96(23): 9343-9352, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38804718

ABSTRACT

Oligonucleotide therapeutics have emerged as an important class of drugs offering targeted therapeutic strategies that complement traditional modalities, such as monoclonal antibodies and small molecules. Their unique ability to precisely modulate gene expression makes them vital for addressing previously undruggable targets. A critical aspect of developing these therapies is characterizing their molecular composition accurately. This includes determining the monoisotopic mass of oligonucleotides, which is essential for identifying impurities, degradants, and modifications that can affect the drug efficacy and safety. Mass spectrometry (MS) plays a pivotal role in this process, yet the accurate interpretation of complex mass spectra remains challenging, especially for large molecules, where the monoisotopic peak is often undetectable. To address this issue, we have adapted the MIND algorithm, originally developed for top-down proteomics, for use with oligonucleotide data. This adaptation allows for the prediction of monoisotopic mass from the more readily detectable, most-abundant peak mass, enhancing the ability to annotate complex spectra of oligonucleotides. Our comprehensive validation of this modified algorithm on both in silico and real-world oligonucleotide data sets has demonstrated its effectiveness and reliability. To facilitate wider adoption of this advanced analytical technique, we have encapsulated the enhanced MIND algorithm in a user-friendly Shiny application. This online platform simplifies the process of annotating complex oligonucleotide spectra, making advanced mass spectrometry analysis accessible to researchers and drug developers. The application is available at https://valkenborg-lab.shinyapps.io/mind4oligos/.


Subject(s)
Algorithms , Mass Spectrometry , Oligonucleotides , Oligonucleotides/analysis , Mass Spectrometry/methods , Molecular Weight
2.
Anal Chem ; 96(9): 3763-3771, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38373058

ABSTRACT

This study introduces a simplified purification method for analyzing 82Se/78Se isotope ratios in diverse natural samples using hydride generation MC-ICP-MS. Unlike the thiol resin method, which is time-consuming and sensitive to the concentrations of reagents used at individual stages, our proposed alternative is quicker, simpler, and robust. The procedure involves coprecipitation of selenium with iron hydroxide and dissolution in hydrochloric acid. Combining hydride generation and a second cleanup stage achieves sufficient purification for Se isotope ratio measurements. The method is efficient, taking 3-4 h after sample decomposition, utilizing common reagents [HCl, Fe(NO3)3, NH4Cl] without evaporation or clean lab conditions. Results on 82Se/78Se isotope ratios in various matrices are presented, comparing them with literature data. All isotopic results have been subjected to a newly proposed state-of-the-art approach to uncertainty estimation dedicated to isotope ratio measurements. The approach is based on applying Monte Carlo simulations with consideration of different samples' results normalized by the expected value. By doing that, we obtained estimated uncertainty for any Se sample with the influence of particular measurements on the final estimation included. We employ a Monte Carlo simulation-based uncertainty estimation approach for isotope ratio measurements, providing estimated uncertainty for each selenium sample.
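
The Monte Carlo idea mentioned in the abstract above can be illustrated with a generic uncertainty-propagation sketch. This is not the authors' procedure (the abstract does not give its details); the function name `monte_carlo_ratio`, the Gaussian noise model, and the example signal values are all assumptions made for illustration.

```python
import random
import statistics


def monte_carlo_ratio(signal_82, u_82, signal_78, u_78, draws=20000, seed=0):
    """Propagate uncertainty into an isotope ratio by random sampling.

    Hypothetical helper: each signal is assumed to be Gaussian with the
    given standard uncertainty, and the two signals are assumed independent.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratios = []
    for _ in range(draws):
        numerator = rng.gauss(signal_82, u_82)
        denominator = rng.gauss(signal_78, u_78)
        ratios.append(numerator / denominator)
    # The mean estimates the ratio; the sample standard deviation
    # estimates its standard uncertainty.
    return statistics.mean(ratios), statistics.stdev(ratios)


# Made-up signals: 82Se = 100 +/- 1, 78Se = 50 +/- 0.5 (arbitrary units).
ratio, uncertainty = monte_carlo_ratio(100.0, 1.0, 50.0, 0.5)
print(f"ratio = {ratio:.3f}, standard uncertainty = {uncertainty:.3f}")
```

With independent 1% relative uncertainties on both signals, first-order propagation predicts a relative uncertainty of about 1.4% on the ratio, which the simulated spread should approximate.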

3.
Methods ; 224: 1-9, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38295891

ABSTRACT

The Major Histocompatibility Complex (MHC) is a critical element of the vertebrate cellular immune system, responsible for presenting peptides derived from intracellular proteins. MHC-I presentation is pivotal in the immune response and holds considerable potential in the realms of vaccine development and cancer immunotherapy. This study delves into the limitations of current methods and benchmarks for MHC-I presentation. We introduce a novel benchmark designed to assess generalization properties and the reliability of models on unseen MHC molecules and peptides, with a focus on the Human Leukocyte Antigen (HLA), a specific subset of MHC genes present in humans. Finally, we introduce HLABERT, a pretrained language model that outperforms previous methods significantly on our benchmark and establishes a new state-of-the-art on existing benchmarks.


Subject(s)
Peptides , Proteins , Humans , Reproducibility of Results , Peptides/chemistry , Proteins/metabolism , Major Histocompatibility Complex/genetics , Protein Binding
4.
Anal Chem ; 96(1): 188-196, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38117933

ABSTRACT

1H NMR spectroscopy is a powerful tool for analyzing mixtures including determining the concentrations of individual components. When signals from multiple compounds overlap, this task requires computational solutions. They are typically based on peak-picking and the comparison of obtained peak lists with libraries of individual components. This can fail if peaks are not sufficiently resolved or when peak positions differ between the library and the mixture. In this paper, we present Magnetstein, a quantification algorithm rooted in the optimal transport theory that makes it robust to unexpected frequency shifts and overlapping signals. Thanks to this, Magnetstein can quantitatively analyze difficult spectra with the estimation trueness an order of magnitude higher than that of commercial tools. Furthermore, the method is easier to use than other approaches, having only two parameters with default values applicable to a broad range of experiments and requiring little to no preprocessing of the spectra.

5.
Genome Biol ; 24(1): 205, 2023 09 11.
Article in English | MEDLINE | ID: mdl-37697406

ABSTRACT

Resolving complex genomic regions rich in segmental duplications (SDs) is challenging due to the high error rate of long-read sequencing. Here, we describe a targeted approach with a novel genome assembler, PhaseDancer, that extends SD-rich regions of interest iteratively. We validate its robustness and efficiency using a gold-standard set of human BAC clones and in silico-generated SDs with predefined evolutionary scenarios. PhaseDancer enables extension of the incomplete complex SD-rich subtelomeric regions of Great Ape chromosomes orthologous to the human chromosome 2 (HSA2) fusion site, informing a model of HSA2 formation and unravelling the evolution of human and Great Ape genomes.


Subject(s)
Hominidae , Humans , Animals , Hominidae/genetics , Segmental Duplications, Genomic , Telomere , Genomics , Chromosomes, Human
6.
Pediatr Infect Dis J ; 42(12): 1086-1092, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37725813

ABSTRACT

BACKGROUND: The children's role in transmitting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the familial settings is uncertain. We aimed to assess how often children were the index cases transmitting SARS-CoV-2 into their households during the Delta wave, and to identify risk factors of children being the index case. METHODS: In this prospective survey study, we collected information regarding household members of SARS-CoV-2-positive children tested in a single tertiary hospital. Some patients were tested with polymerase chain reaction and those samples were typed and classified as Delta or non-Delta variant. We have used the Monte Carlo approach to assess predictors of children being the index case in the household. RESULTS: We surveyed 629 families and 515 of them fulfilled inclusion criteria. The child was the index case in 359 (69.71%) households. Attending childcare facilities in all age groups was positively associated with being the index case in the household [nursery, estimate = 1.456, 95% confidence interval (CI): 1.456-1.457, P < 0.001; kindergarten, estimate = 0.899, 95% CI: 0.898-0.900, P = 0.003; school, estimate = 1.23, 95% CI: 1.229-1.231, P = 0.001]. The same association was present in the subgroup of the families with the predominant Delta variant, but not in the subgroup with the predominant non-Delta variant. CONCLUSIONS: Attending childcare and educational facilities might be a significant predictor of a child being the SARS-CoV-2 index case in their household. Children's role in driving the SARS-CoV-2 pandemic changes in consecutive waves. The Monte Carlo approach can be applied to assess risk factors of infectious agents' spread in future epidemics.


Subject(s)
COVID-19 , Child , Humans , COVID-19/epidemiology , SARS-CoV-2/genetics , Pandemics , Prospective Studies
7.
Front Immunol ; 14: 1021638, 2023.
Article in English | MEDLINE | ID: mdl-37359539

ABSTRACT

Neutrophil extracellular traps (NETs), pathogen-ensnaring structures formed by neutrophils by expelling their DNA into the environment, are believed to play an important role in immunity and autoimmune diseases. In recent years, growing attention has been paid to developing software tools to quantify NETs in fluorescence microscopy images. However, current solutions require large, manually prepared training data sets, are difficult to use for users without a background in computer science, or have limited capabilities. To overcome these problems, we developed Trapalyzer, a computer program for automatic quantification of NETs. Trapalyzer analyzes fluorescence microscopy images of samples double-stained with a cell-permeable and a cell-impermeable dye, such as the popular combination of Hoechst 33342 and SYTOX™ Green. The program is designed with an emphasis on software ergonomics and is accompanied by step-by-step tutorials to make its use easy and intuitive. Installation and configuration of the software take less than half an hour for an untrained user. In addition to NETs, Trapalyzer detects, classifies, and counts neutrophils at different stages of NET formation, giving greater insight into this process. It is the first tool that makes this possible without large training data sets. At the same time, it attains a classification precision on par with state-of-the-art machine learning algorithms. As an example application, we show how to use Trapalyzer to study NET release in a neutrophil-bacteria co-culture. Here, after configuration, Trapalyzer processed 121 images and detected and classified 16 000 ROIs in approximately three minutes on a personal computer. The software and usage tutorials are available at https://github.com/Czaki/Trapalyzer.


Subject(s)
Extracellular Traps , Neutrophils , Software , Algorithms , Microscopy, Fluorescence/methods
8.
ACS Omega ; 8(15): 13840-13854, 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37163139

ABSTRACT

COVID-19, the disease caused by SARS-CoV-2, has been disrupting our lives for more than two years now. SARS-CoV-2 interacts with human proteins to pave its way into the human body, thereby wreaking havoc. Moreover, the variants that arise from mutations in the SARS-CoV-2 genome are also a cause of concern. Thus, it is very important to understand human-spike protein-protein interactions (PPIs) in order to predict new PPIs and consequently propose drugs that target the human proteins, so as to fight the virus and its different variants with mutations in the spike protein. This motivated us to develop a complete pipeline in which PPIs and drug-protein interactions can be predicted for human-SARS-CoV-2 interactions. Initially, interacting data sets are collected from the literature, and noninteracting data sets are subsequently created for human-SARS-CoV-2 by considering only the spike glycoprotein. For drug-protein interactions, both interacting and noninteracting data sets are taken from the DrugBank and ChEMBL databases. Thereafter, the protein sequences of human and spike proteins are encoded with a sequence-based feature, the well-known Moran autocorrelation, while the drugs are encoded using another well-known technique, PaDEL descriptors. These encodings are used to predict new human-spike PPIs and, eventually, new drug-protein interactions for the top 20 predicted human proteins interacting with the original spike protein and its mutated variants, such as Alpha, Beta, Delta, Gamma, and Omicron. The predictions are carried out by a random forest, as it is found to perform better than other predictors, providing an accuracy of 90.53% for human-spike PPIs and 96.15% for drug-protein interactions. Finally, 40 unique drugs, such as eicosapentaenoic acid, doxercalciferol, ciclesonide, dexamethasone, and methylprednisolone, are identified that target 32 human proteins, such as ACACA, DST, and DYNC1H1.

9.
J Cheminform ; 15(1): 6, 2023 Jan 14.
Article in English | MEDLINE | ID: mdl-36641473

ABSTRACT

Modern computer-assisted synthesis planning tools provide strong support for this problem. However, they are still limited by computational complexity. This limitation may be overcome by scoring the synthetic accessibility as a pre-retrosynthesis heuristic. A wide range of machine learning scoring approaches is available; however, their applicability and correctness have been studied only to a limited extent. Moreover, there is a lack of critical assessment of synthetic accessibility scores under common test conditions. In the present work, we assess whether synthetic accessibility scores can reliably predict the outcomes of retrosynthesis planning. Using a specially prepared compound database, we examine the outcomes of the retrosynthetic tool AiZynthFinder. We test whether the synthetic accessibility scores SAscore, SYBA, SCScore, and RAscore accurately predict the results of retrosynthesis planning. Furthermore, we investigate whether synthetic accessibility scores can speed up retrosynthesis planning by better prioritizing explored partial synthetic routes and thus reducing the size of the search space. For that purpose, we analyze the AiZynthFinder partial solution search trees, their structure, and complexity parameters, such as the number of nodes or treewidth. We confirm that synthetic accessibility scores in most cases discriminate feasible molecules from infeasible ones well and can be potential boosters of retrosynthesis planning tools. Moreover, we show the current challenges of designing computer-assisted synthesis planning tools. We conclude that hybrid machine learning and human intuition-based synthetic accessibility scores can efficiently boost the effectiveness of computer-assisted retrosynthesis planning; however, they need to be carefully crafted for retrosynthesis planning algorithms. The source code of this work is publicly available at https://github.com/grzsko/ASAP.

10.
Gigascience ; 11, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36329619

ABSTRACT

BACKGROUND: Reproducibility of liquid chromatography separation is limited by retention time drift. As a result, measured signals lack correspondence over replicates of the liquid chromatography-mass spectrometry (LC-MS) experiments. Correction of these errors is named retention time alignment and needs to be performed before further quantitative analysis. Despite the availability of numerous alignment algorithms, their accuracy is limited (e.g., for retention time drift that swaps analytes' elution order). RESULTS: We present the Alignstein, an algorithm for LC-MS retention time alignment. It correctly finds correspondence even for swapped signals. To achieve this, we implemented the generalization of the Wasserstein distance to compare multidimensional features without any reduction of the information or dimension of the analyzed data. Moreover, Alignstein by design requires neither a reference sample nor prior signal identification. We validate the algorithm on publicly available benchmark datasets obtaining competitive results. Finally, we show that it can detect the information contained in the tandem mass spectrum by the spatial properties of chromatograms. CONCLUSIONS: We show that the use of optimal transport effectively overcomes the limitations of existing algorithms for statistical analysis of mass spectrometry datasets. The algorithm's source code is available at https://github.com/grzsko/Alignstein.


Subject(s)
Algorithms , Tandem Mass Spectrometry , Chromatography, Liquid/methods , Reproducibility of Results , Software
11.
Commun Med (Lond) ; 2(1): 136, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36352249

ABSTRACT

BACKGROUND: During the COVID-19 pandemic there has been a strong interest in forecasts of the short-term development of epidemiological indicators to inform decision makers. In this study we evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland for the period from January through April 2021. METHODS: We evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland. These were issued by 15 different forecasting models, run by independent research teams. Moreover, we study the performance of combined ensemble forecasts. Evaluation of probabilistic forecasts is based on proper scoring rules, along with interval coverage proportions to assess calibration. The presented work is part of a pre-registered evaluation study. RESULTS: We find that many, though not all, models outperform a simple baseline model up to four weeks ahead for the considered targets. Ensemble methods show very good relative performance. The addressed time period is characterized by rather stable non-pharmaceutical interventions in both countries, making short-term predictions more straightforward than in previous periods. However, major trend changes in reported cases, like the rebound in cases due to the rise of the B.1.1.7 (Alpha) variant in March 2021, prove challenging to predict. CONCLUSIONS: Multi-model approaches can help to improve the performance of epidemiological forecasts. However, while death numbers can be predicted with some success based on current case and hospitalization data, predictability of case numbers remains low beyond quite short time horizons. Additional data sources including sequencing and mobility data, which were not extensively used in the present study, may help to improve performance.


We compare forecasts of weekly case and death numbers for COVID-19 in Germany and Poland based on 15 different modelling approaches. These cover the period from January to April 2021 and address numbers of cases and deaths one and two weeks into the future, along with the respective uncertainties. We find that combining different forecasts into one forecast can enable better predictions. However, case numbers over longer periods were challenging to predict. Additional data sources, such as information about different versions of the SARS-CoV-2 virus present in the population, might improve forecasts in the future.
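
The evaluation described above relies on proper scoring rules and interval coverage. The abstract does not name the exact rule, so the sketch below uses the interval score, one widely used proper scoring rule for central prediction intervals, together with empirical coverage. The function names and example numbers are illustrative assumptions, not taken from the study.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    Lower is better: the score is the interval width plus a penalty,
    scaled by 2 / alpha, for observations falling outside the interval.
    """
    width = upper - lower
    penalty_below = (2.0 / alpha) * max(lower - observed, 0.0)
    penalty_above = (2.0 / alpha) * max(observed - upper, 0.0)
    return width + penalty_below + penalty_above


def empirical_coverage(intervals, observations):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for (lo, hi), y in zip(intervals, observations) if lo <= y <= hi)
    return hits / len(observations)


# Hypothetical 50% intervals (alpha = 0.5) for weekly case counts.
intervals = [(900, 1100), (1000, 1300), (1200, 1500)]
observed = [1050, 1400, 1250]
scores = [interval_score(lo, hi, y, 0.5) for (lo, hi), y in zip(intervals, observed)]
print(scores, empirical_coverage(intervals, observed))
```

A well-calibrated 50% interval should cover roughly half of the observations over many forecasts; coverage far from the nominal level signals over- or under-confident forecasts.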

12.
J Am Soc Mass Spectrom ; 33(11): 2063-2069, 2022 Nov 02.
Article in English | MEDLINE | ID: mdl-36223196

ABSTRACT

Monoisotopic mass is an important feature in top-down proteomics. Knowing the exact monoisotopic mass helps to identify proteins precisely and quickly in large protein databases. However, the monoisotopic peak is visible only in spectra of small molecules. For bigger molecules such as proteins, it is hidden in noise or not detected at all, and therefore its position has to be predicted. By improving the prediction of the peak, we contribute to a more accurate identification of molecules, which is crucial in fields such as chemistry and medicine. In this work, we present the envemind algorithm, a two-step procedure to predict monoisotopic masses of proteins. The prediction is based on an isotopic envelope. Therefore, envemind is dedicated to spectra in which the isotopic variants separated by one dalton can be resolved. Furthermore, only single-molecule spectra are allowed, that is, spectra that do not require prior deconvolution. The algorithm deals with the problem of off-by-one dalton errors, which are common in monoisotopic mass prediction. A novel aspect of this work is a mathematical exploration of the space of molecules, where we equate chemical formulas with their theoretical spectra. Since the space of molecules consists of all possible chemical formulas, this approach is not limited to known substances. This makes optimization faster and makes it possible to approximate a theoretical spectrum for a given experimental one. The algorithm is available as a Python package, envemind, on our GitHub page https://github.com/PiotrRadzinski/envemind.


Subject(s)
Proteins , Proteomics , Databases, Protein , Proteins/chemistry , Proteomics/methods , Algorithms
13.
BMC Genomics ; 23(Suppl 6): 616, 2022 Aug 25.
Article in English | MEDLINE | ID: mdl-36008753

ABSTRACT

BACKGROUND: The reduction of the chromosome number from 48 in the Great Apes to 46 in modern humans is thought to result from the end-to-end fusion of two ancestral non-human primate chromosomes forming the human chromosome 2 (HSA2). Genomic signatures of this event are the presence of inverted telomeric repeats at the HSA2 fusion site and a block of degenerate satellite sequences that mark the remnants of the ancestral centromere. It has been estimated that this fusion arose up to 4.5 million years ago (Mya). RESULTS: We have developed an enhanced algorithm for the detection and efficient counting of the locally over-represented weak-to-strong (AT to GC) substitutions. By analyzing the enrichment of these substitutions around the fusion site of HSA2 we estimated its formation time at 0.9 Mya with a 95% confidence interval of 0.4-1.5 Mya. Additionally, based on the statistics derived from our algorithm, we have reconstructed the evolutionary distances among the Great Apes (Hominoidea). CONCLUSIONS: Our results shed light on the HSA2 fusion formation and provide a novel computational alternative for the estimation of the speciation chronology.
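
To make the notion of weak-to-strong substitutions concrete, here is a naive counting sketch over a pair of aligned sequences. It is not the enhanced algorithm from the paper; the function names, the fixed-window scheme, and the toy sequences are assumptions for illustration only.

```python
WEAK = {"A", "T"}    # bases forming two hydrogen bonds
STRONG = {"G", "C"}  # bases forming three hydrogen bonds


def count_weak_to_strong(ancestral, derived):
    """Count aligned positions where a weak base (A/T) became strong (G/C)."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, d in zip(ancestral.upper(), derived.upper())
        if a in WEAK and d in STRONG
    )


def windowed_counts(ancestral, derived, window):
    """Weak-to-strong counts in consecutive, non-overlapping windows."""
    return [
        count_weak_to_strong(ancestral[i:i + window], derived[i:i + window])
        for i in range(0, len(ancestral), window)
    ]


# Toy aligned sequences (made up for illustration).
print(count_weak_to_strong("ATGCA", "GTCCC"))
print(windowed_counts("ATGCA", "GTCCC", 2))
```

Windows whose counts stand out against the genomic background would be candidates for local over-representation; the statistical test for that step is outside this sketch.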


Subject(s)
Evolution, Molecular , Hominidae , Animals , Centromere/genetics , Chromosomes, Human , Genome , Hominidae/genetics , Humans
14.
Commun Med (Lond) ; 2: 23, 2022.
Article in English | MEDLINE | ID: mdl-35603303

ABSTRACT

The introduction of COVID-19 vaccination passes (VPs) by many countries coincided with the Delta variant fast becoming dominant across Europe. A thorough assessment of their impact on epidemic dynamics is still lacking. Here, we propose the VAP-SIRS model that considers possibly lower restrictions for the VP holders than for the rest of the population, imperfect vaccination effectiveness against infection, rates of (re-)vaccination and waning immunity, fraction of never-vaccinated, and the increased transmissibility of the Delta variant. Some predicted epidemic scenarios for realistic parameter values yield new COVID-19 infection waves within two years, and high daily case numbers in the endemic state, even without introducing VPs and granting more freedom to their holders. Still, suitable adaptive policies can avoid unfavorable outcomes. While VP holders could initially be allowed more freedom, the lack of full vaccine effectiveness and increased transmissibility will require accelerated (re-)vaccination, wide-spread immunity surveillance, and/or minimal long-term common restrictions.
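
For orientation, the sketch below integrates a plain SIRS model (susceptible, infected, recovered, with waning immunity) using Euler steps. It is a deliberately simplified stand-in: the VAP-SIRS model in the abstract adds vaccination, pass-dependent restrictions, and imperfect vaccine effectiveness, none of which is modeled here. All parameter values are arbitrary assumptions.

```python
def simulate_sirs(beta, gamma, kappa, s0, i0, r0, days, dt=0.1):
    """Euler integration of a basic SIRS model on population fractions.

    beta: transmission rate, gamma: recovery rate,
    kappa: rate at which immunity wanes (R returns to S).
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    steps = int(round(days / dt))
    for _ in range(steps):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        waned = kappa * r * dt
        s += waned - new_infections
        i += new_infections - recoveries
        r += recoveries - waned
        trajectory.append((s, i, r))
    return trajectory


# Arbitrary illustrative parameters, not fitted to any data.
trajectory = simulate_sirs(beta=0.3, gamma=0.1, kappa=0.01,
                           s0=0.99, i0=0.01, r0=0.0, days=100)
peak_infected = max(i for _, i, _ in trajectory)
print(f"peak infected fraction: {peak_infected:.3f}")
```

Because immunity wanes, such a model can produce recurring waves and a nonzero endemic level, which is the qualitative behavior the abstract discusses.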

15.
Nucleic Acids Res ; 50(W1): W744-W752, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524567

ABSTRACT

In recent years, great progress has been made in the identification of structural variants (SVs) in the human genome. However, the interpretation of SVs, especially those located in non-coding DNA, remains challenging. One of the reasons is the lack of tools designed specifically for clinical SV evaluation that take the 3D chromatin architecture into account. Therefore, we present TADeus2, a web server dedicated to quick investigation of chromatin conformation changes, providing a visual framework for the interpretation of SVs affecting topologically associating domains (TADs). This tool provides a convenient visual inspection of SVs, both in a continuous genome view and from a rearrangement's breakpoint perspective. Additionally, TADeus2 allows the user to assess the influence of analyzed SVs within flanking coding/non-coding regions based on the Hi-C matrix. Importantly, SV pathogenicity is quantified and ranked using the TADA and ClassifyCNV tools and a sampling-based P-value. TADeus2 is publicly available at https://tadeus2.mimuw.edu.pl.


Subject(s)
Chromatin , DNA , Humans , Chromatin/genetics , Chromosomes , Genome, Human
16.
Methods ; 203: 584-593, 2022 07.
Article in English | MEDLINE | ID: mdl-35085741

ABSTRACT

More than one and a half years after the outbreak of the COVID-19 pandemic, the scientific world is still trying to understand its dynamics. In this paper on the case fatality rate (CFR) of COVID-19, we study historical data on mortality in Poland during the first six months of the pandemic, when no SARS-CoV-2 variants of concern were present among the infected. To this end, we apply competing risk models to perform both uni- and multivariate analyses on specific subpopulations selected by different factors, including the key indicators: age, sex, and hospitalization. The study explores the case fatality rate and finds that it decreases over time. Furthermore, we describe the differences in mortality between hospitalized and other cases, indicating a sudden increase in mortality among hospitalized cases at the end of the spring 2020 season. Exploratory and multivariate analyses revealed the real impact of each variable, and besides the expected factors indicating increased mortality (age, comorbidities), we track less obvious indicators. Recent medical care as well as identification of the source contact, independently of comorbidities, significantly impact an individual's mortality risk. As a result, the study provides a twofold insight into COVID-19 mortality in Poland. On the one hand, we explore mortality in different groups with respect to different variables; on the other, we indicate novel factors that may be crucial in reducing mortality. The latter can be addressed, e.g., by more efficient contact tracing and by proper organization and management of the health care system to support those who need medical care independently of comorbidities or COVID-19 infection.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Contact Tracing , Humans , Pandemics , Poland/epidemiology
17.
Sci Rep ; 11(1): 3989, 2021 02 17.
Article in English | MEDLINE | ID: mdl-33597594

ABSTRACT

The polyphenol content and antioxidant capacity of hyperforin and hypericin-standardized H. perforatum L. extracts may vary due to the harvest time. In this work, ethanol and ethanol-water extracts of air-dried and lyophilized flowers of H. perforatum L., collected throughout a vegetation season in central Poland, were studied. Air-dried flowers extracts had higher polyphenol (371 mg GAE/g) and flavonoid (160 mg CAE/g) content, DPPH radical scavenging (1672 mg DPPH/g), ORAC (5214 µmol TE/g) and FRAP (2.54 mmol Fe2+/g) than lyophilized flowers extracts (238 mg GAE/g, 107 mg CAE/g, 1287 mg DPPH/g, 3313 µmol TE/g and 0.31 mmol Fe2+/g, respectively). Principal component analysis showed that the collection date influenced the flavonoid and polyphenol contents and FRAP of ethanol extracts, and DPPH and ORAC values of ethanol-water extracts. The ethanol extracts with the highest polyphenol and flavonoid content protected human erythrocytes against bisphenol A-induced damage. Both high field and benchtop NMR spectra of selected extracts, revealed differences in composition caused by extraction solvent and raw material collection date. Moreover, we have shown that benchtop NMR can be used to detect the compositional variation of extracts if the assignment of signals is done previously.


Subject(s)
Antioxidants/chemistry , Flavonoids/chemistry , Flowers/chemistry , Hypericum/chemistry , Plant Extracts/chemistry , Polyphenols/chemistry , Anthracenes/chemistry , Antioxidants/pharmacology , Benzhydryl Compounds/chemistry , Ethanol/chemistry , Humans , Perylene/analogs & derivatives , Perylene/chemistry , Phenols/chemistry , Phloroglucinol/analogs & derivatives , Phloroglucinol/chemistry , Plant Extracts/pharmacology , Poland , Polyphenols/pharmacology , Principal Component Analysis , Terpenes/chemistry
18.
Entropy (Basel) ; 22(11), 2020 Oct 31.
Article in English | MEDLINE | ID: mdl-33287006

ABSTRACT

The constantly and rapidly increasing amount of biological data obtained from many different high-throughput experiments opens up new possibilities for data- and model-driven inference. At the same time, data integration techniques carry risks that are not widely taken into account. In particular, approaches based on flux balance analysis (FBA) are sensitive to the structure of the metabolic network, in which low-entropy clusters can prevent inference from the activity of the metabolic reactions. In this article, we set out problems that may arise during the integration of metabolomic data with gene expression datasets. We analyze common pitfalls, provide possible solutions, and illustrate them with a case study of renal cell carcinoma (RCC). Using the proposed approach, we provide a metabolic description of the known morphological RCC subtypes and suggest the possible existence of a poor-prognosis cluster of patients, who are commonly characterized by low activity of the drug-transporting enzymes crucial in chemotherapy. This discovery fits and extends the already known poor-prognosis characteristics of RCC. Finally, this work also aims to point out the problem that arises from integrating high-throughput data with inherently nonuniform, manually curated low-throughput data. In such cases, over-represented information may overshadow non-trivial discoveries.

19.
Rapid Commun Mass Spectrom ; : e8956, 2020 Sep 30.
Article in English | MEDLINE | ID: mdl-32996651

ABSTRACT

RATIONALE: The linear regression of mass spectra is a computational problem defined as fitting a linear combination of reference spectra to an experimental one. It is typically used to estimate the relative quantities of selected ions. In this work, we study this problem in an abstract setting to develop new approaches applicable to a diverse range of experiments. METHODS: To overcome the sensitivity of the ordinary least-squares regression to measurement inaccuracies, we base our methods on a non-conventional spectral dissimilarity measure, known as the Wasserstein or the Earth Mover's distance. This distance is based on the notion of the cost of transporting signal between mass spectra, which renders it naturally robust to measurement inaccuracies in the mass domain. RESULTS: Using a data set of 200 mass spectra, we show that our approach is capable of estimating ion proportions accurately without extensive preprocessing of spectra required by other methods. The conclusions are further substantiated using data sets simulated in a way that mimics most of the measurement inaccuracies occurring in real experiments. CONCLUSIONS: We have developed a linear regression algorithm based on the notion of the cost of transporting signal between spectra. Our implementation is available in a Python 3 package called masserstein, which is freely available at https://github.com/mciach/masserstein.
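
The transport-cost idea behind the Wasserstein (Earth Mover's) distance can be shown with a small one-dimensional sketch. It computes the distance between two peak lists as the area between their cumulative intensity curves. This is an illustration only, not the masserstein package's API or its regression algorithm; the function name and the toy peak lists are assumptions.

```python
def wasserstein_1d(mz_a, intensity_a, mz_b, intensity_b):
    """First Wasserstein distance between two peak lists.

    Both spectra are normalized to unit total intensity. In one dimension,
    the distance equals the integral of the absolute difference between
    the two cumulative distribution functions.
    """
    total_a = sum(intensity_a)
    total_b = sum(intensity_b)
    # Signed jumps of (CDF_a - CDF_b) at every peak position.
    events = [(mz, inten / total_a) for mz, inten in zip(mz_a, intensity_a)]
    events += [(mz, -inten / total_b) for mz, inten in zip(mz_b, intensity_b)]
    events.sort()

    distance = 0.0
    cdf_difference = 0.0
    previous_mz = events[0][0]
    for mz, jump in events:
        # Accumulate area between the CDFs since the previous position.
        distance += abs(cdf_difference) * (mz - previous_mz)
        cdf_difference += jump
        previous_mz = mz
    return distance


# A single peak shifted by 0.5 m/z units costs exactly 0.5 to transport.
print(wasserstein_1d([100.0], [1.0], [100.5], [1.0]))
```

A small shift in peak position changes this distance only slightly, whereas a point-wise comparison such as least squares treats the shifted peak as a complete mismatch; this is the robustness to mass-domain inaccuracies that the abstract describes.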

20.
J Clin Med ; 9(5), 2020 Apr 25.
Article in English | MEDLINE | ID: mdl-32344861

ABSTRACT

De novo balanced chromosomal aberrations (BCAs), such as reciprocal translocations and inversions, are genomic aberrations that, in approximately 25% of cases, affect the human phenotype. Delineation of the exact structure of BCAs may provide a precise diagnosis and/or point to new disease loci. We report on six patients with de novo balanced chromosomal translocations (BCTs) and one patient with a de novo inversion, in whom we mapped breakpoints to a resolution of 1 bp, using shallow whole-genome mate pair sequencing. In all seven cases, a disruption of at least one gene was found. In two patients, the phenotypic impact of the disrupted genes is well known (NFIA, ATP7A). In five patients, the aberration damaged genes: PARD3, EPHA6, KLF13, STK24, UBR3, MLLT10 and TLE3, whose influence on the human phenotype is poorly understood. In particular, our results suggest novel candidate genes for retinal degeneration with anophthalmia (EPHA6), developmental delay with speech impairment (KLF13), and developmental delay with brain dysembryoplastic neuroepithelial tumor (UBR3). In conclusion, identification of the exact structure of symptomatic BCTs using next generation sequencing is a viable method for both diagnosis and finding novel disease candidate genes in humans.
