ABSTRACT
Direct-to-Mass Spectrometry (direct-to-MS) and ambient ionization techniques enable rapid biochemical fingerprinting. Data processing is typically accomplished with vendor-provided software tools. Here, a novel, open-source tool, Tidy-Direct-to-MS, was developed for processing direct-to-MS data sets. It allows fast and user-friendly processing through modules for optional sample position detection and separation, mass-to-charge ratio (m/z) drift detection and correction, consensus spectra calculation, bracketing across sample positions, and feature abundance calculation. The tool also automates the comparison of different parameter sets, thereby assisting the user in the complex task of finding an optimal combination that maximizes the total number of detected features while also checking for the detection of user-provided reference features. In addition, Tidy-Direct-to-MS supports data quality review and subsequent data analysis, thereby simplifying the workflow of untargeted ambient MS-based metabolomics studies. Tidy-Direct-to-MS is implemented in the Python programming language as part of the TidyMS library and can thus be easily extended. Its capabilities are showcased on a data set acquired in a marine metabolomics study reported in MetaboLights (MTBLS1198) using a transmission-mode Direct Analysis in Real Time-Mass Spectrometry (TM-DART-MS)-based method.
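The m/z drift detection and correction step mentioned above can be illustrated with a minimal sketch. It is a generic approach that assumes a constant relative (ppm) drift estimated from user-provided reference features. The function names `estimate_ppm_drift` and `correct_mz` and the `max_ppm` tolerance are hypothetical and are not part of the TidyMS API; the actual Tidy-Direct-to-MS procedure may differ.

```python
import numpy as np


def estimate_ppm_drift(observed_mz, reference_mz, max_ppm=50.0):
    """Median ppm deviation of the observed peaks closest to each reference m/z.

    Hypothetical helper: references without an observed peak inside
    +/- max_ppm are ignored. Returns 0.0 if no reference is matched.
    """
    observed_mz = np.asarray(observed_mz, dtype=float)
    errors = []
    for ref in reference_mz:
        # Closest observed peak to this reference feature.
        nearest = observed_mz[np.argmin(np.abs(observed_mz - ref))]
        ppm = (nearest - ref) / ref * 1e6
        if abs(ppm) <= max_ppm:
            errors.append(ppm)
    return float(np.median(errors)) if errors else 0.0


def correct_mz(observed_mz, drift_ppm):
    """Remove a constant relative drift (in ppm) from all observed m/z values."""
    return np.asarray(observed_mz, dtype=float) / (1.0 + drift_ppm * 1e-6)


# Usage: a synthetic spectrum shifted by +10 ppm, two known reference features.
true_mz = np.array([100.0, 200.0, 300.0, 400.0])
shifted_mz = true_mz * (1.0 + 10e-6)
drift = estimate_ppm_drift(shifted_mz, reference_mz=[200.0, 400.0])
corrected_mz = correct_mz(shifted_mz, drift)
```

The median is used here so that a single mismatched reference peak does not dominate the drift estimate; this is a design choice of the sketch, not a statement about the published tool.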
Subjects
Mass Spectrometry, Metabolomics, Software, Metabolomics/methods, Mass Spectrometry/methods, Programming Languages
ABSTRACT
INTRODUCTION: There is still no community consensus regarding strategies for data quality review in liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics. Assessing the analytical robustness of data, which is relevant for inter-laboratory comparisons and reproducibility, remains a challenge despite the wide variety of tools available for data processing. OBJECTIVES: The aim of this study was to provide a model that describes the sources of variation in LC-MS-based untargeted metabolomics measurements, to use it to build a comprehensive curation pipeline, and to provide quality assessment tools for data quality review. METHODS: Human serum samples (n=392) were analyzed by ultraperformance liquid chromatography coupled to high-resolution mass spectrometry (UPLC-HRMS) using an untargeted metabolomics approach. The pipeline and tools used to process this dataset were implemented as part of the open-source, publicly available TidyMS Python-based package. RESULTS: The model was applied to understand data curation practices used by the metabolomics community. Sources of variation that are often overlooked in untargeted metabolomics studies were identified in the analysis. New tools were used to characterize certain types of variation. CONCLUSION: The developed pipeline made it possible to confirm data robustness by comparing the experimental results with expected values predicted by the model. New quality control practices were introduced to assess the analytical quality of data.
Subjects
Data Curation, Metabolomics, Humans, Liquid Chromatography, Reproducibility of Results, Tandem Mass Spectrometry
ABSTRACT
Preprocessing data in a reproducible and robust way is one of the current challenges in untargeted metabolomics workflows. Data curation in liquid chromatography-mass spectrometry (LC-MS) involves the removal of biologically non-relevant features (retention time, m/z pairs) to retain only high-quality data for subsequent analysis and interpretation. The present work introduces TidyMS, a package for the Python programming language for preprocessing LC-MS data for quality control (QC) procedures in untargeted metabolomics workflows. It is a versatile strategy that can be customized or made fit for purpose according to the specific metabolomics application. It supports quality control procedures that ensure accuracy and reliability in LC-MS measurements, as well as preprocessing of metabolomics data to obtain cleaned matrices for subsequent statistical analysis. The capabilities of the package are shown with pipelines for an LC-MS system suitability check, system conditioning, signal drift evaluation, and data curation. These applications were implemented to preprocess data corresponding to a new suite of candidate plasma reference materials developed by the National Institute of Standards and Technology (NIST; hypertriglyceridemic, diabetic, and African-American plasma pools) to be used in untargeted metabolomics studies in addition to NIST SRM 1950 Metabolites in Frozen Human Plasma. The package offers a rapid and reproducible workflow that can be used in an automated or semi-automated fashion, and it is an open and free tool available to all users.
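As an illustration of the kind of data curation step described above, the following minimal sketch removes features whose coefficient of variation (CV) across pooled QC injections exceeds a threshold, a filter commonly used in untargeted metabolomics. It is written with NumPy only and does not use or reproduce the TidyMS API; the function names, the `max_cv` threshold of 0.2, and the example matrix are assumptions made for illustration.

```python
import numpy as np


def qc_cv(data, is_qc):
    """Coefficient of variation of each feature across the QC samples.

    data  : (n_samples, n_features) array of feature abundances
    is_qc : boolean mask of length n_samples marking QC injections
    Features with a zero or negative QC mean get an infinite CV.
    """
    qc = np.asarray(data, dtype=float)[np.asarray(is_qc, dtype=bool)]
    mean = qc.mean(axis=0)
    std = qc.std(axis=0, ddof=1)  # sample standard deviation
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mean > 0, std / mean, np.inf)


def filter_by_qc_cv(data, is_qc, max_cv=0.2):
    """Return the curated matrix and the boolean mask of retained features."""
    keep = qc_cv(data, is_qc) <= max_cv
    return np.asarray(data, dtype=float)[:, keep], keep


# Usage: three QC injections and one study sample, two features.
data = np.array([
    [100.0, 10.0],   # QC
    [102.0, 50.0],   # QC
    [98.0, 90.0],    # QC
    [150.0, 30.0],   # study sample
])
is_qc = np.array([True, True, True, False])
curated, keep = filter_by_qc_cv(data, is_qc)
```

In this example the first feature is stable in the QC samples (CV of 0.02) and is retained, while the second varies strongly (CV of 0.8) and is removed.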
ABSTRACT
NMR-based metabolomics requires proper identification of metabolites to draw conclusions about the system under study. Normally, multivariate data analysis is performed using 1D 1H NMR spectra, and identification of peaks (and then compounds) relevant to the classification is accomplished using database queries as a first step. 1D 1H NMR spectra of complex mixtures often suffer from peak overlap. To overcome this issue, several studies employed the projections of the (tilted and symmetrized) 2D 1H J-resolved (JRES) spectra, p-JRES, which are similar to 1D 1H decoupled spectra. Nonetheless, there are no public databases available that allow searching for chemical shift spectral data for multiplets. We present the Chemical Shift Multiplet Database (CSMDB), built utilizing JRES spectra obtained from the Birmingham Metabolite Library. The CSMDB provides a score that accounts for both matched and unmatched peaks from a query list and the database hits. The query list is generated from a projection of a 2D statistical correlation analysis on the JRES spectra, p-(JRES-STOCSY), which makes it possible to compare the multiplets for the matched peaks, that is, the f1 traces from the JRES-STOCSY spectrum and from the database hit. Inspecting the unmatched peaks of a database hit allows the retrieval of peaks in the query list that have a decreased correlation coefficient due to low intensities. The CSMDB is coupled to "ConQuer ABC", which permits the assessment of biological correlation by means of consecutive queries with the unmatched peaks in the first and subsequent queries.
Subjects
Metabolomics, Correlation of Data, Factual Databases, Magnetic Resonance Spectroscopy, Proton Magnetic Resonance Spectroscopy
ABSTRACT
The identification of metabolites in complex biological matrices is a challenging task in 1D 1H-NMR-based metabolomics studies. Statistical total correlation spectroscopy (STOCSY) has emerged as an aid to structural elucidation by revealing the peaks that are highly correlated with a driver peak of interest (and that would therefore likely belong to the same molecule). However, in these studies, the signals from metabolites are normally present as a mixture of overlapping resonances, limiting the performance of STOCSY. As an alternative that avoids the overlap issue, 2D 1H homonuclear J-resolved (JRES) spectra have been projected, in their usual tilted and symmetrized processed form, and STOCSY has been applied to these 1D projections (p-JRES-STOCSY). Nonetheless, this approach suffers in cases where the signals are very close. In addition, STOCSY has been applied to the whole JRES spectra (also tilted) to identify correlated multiplets, although the overlap issue itself was not addressed directly and the subsequent database search is complicated in cases of higher-order coupling. With these limitations in mind, in the present work, we propose a new methodology based on the application of STOCSY to a set of nontilted JRES spectra, detecting peaks that would overlap in 1D spectra of the same sample set. Correlation comparison analysis for peak overlap detection (COCOA-POD) is able to reconstruct projected 1D STOCSY traces that result in more suitable database queries, as all peaks are summed at their f2 resonances instead of the resonance corresponding to the multiplet center in the tilted JRES spectra. (The peak dispersion and resolution enhancement gained are not sacrificed by the projection.)
Besides improving database queries with better peak lists obtained from the projections of the 2D STOCSY analysis, the overlap region is examined, and the multiplet itself is analyzed from the correlation trace at 45° to obtain a cleaner multiplet profile, free from contributions from uncorrelated neighboring peaks.
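The basic STOCSY calculation that the methods above build on can be sketched as follows: for a chosen driver variable, the Pearson correlation and covariance with every other spectral variable are computed across the sample set. This is a minimal 1D illustration only; it does not implement COCOA-POD or the JRES-based variants, and the function name `stocsy_trace` and the synthetic data are assumptions.

```python
import numpy as np


def stocsy_trace(spectra, driver_index):
    """Correlation and covariance of every variable with a driver variable.

    spectra      : (n_samples, n_points) array, one spectrum per row
    driver_index : column index of the driver peak
    Variables with zero variance are assigned a correlation of 0.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    driver = Xc[:, driver_index]
    cov = Xc.T @ driver / (X.shape[0] - 1)       # sample covariance with driver
    std = X.std(axis=0, ddof=1)
    denom = std * std[driver_index]
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.where(denom > 0, cov / denom, 0.0)
    return corr, cov


# Usage: four samples, three spectral variables. Variable 1 rises with the
# driver (variable 0); variable 2 falls as the driver rises.
c = np.array([1.0, 2.0, 3.0, 4.0])
spectra = np.column_stack([c, 2.0 * c + 1.0, -c])
corr, cov = stocsy_trace(spectra, driver_index=0)
```

In a typical STOCSY plot the covariance gives the trace its shape and the correlation (or its square) gives the color, so both are returned here.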