1.
Anal Chim Acta ; 1304: 342444, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38637030

ABSTRACT

A common goal in chemistry is to study the relationship between a measured signal and the variability of certain factors. To this end, researchers often use Design of Experiments to decide which experiments to conduct, and (Multiple) Linear Regression and/or Analysis of Variance to analyze the collected data. One of the assumptions at the very foundation of this strategy is that all the experiments are independent, conditional on the settings of the factors. Unfortunately, due to the presence of uncontrollable factors, real-life experiments often deviate from this assumption, making the data analysis results unreliable. In these cases, Mixed-Effects modeling, despite not being widely used in chemometrics, represents a solid data analysis framework to obtain reliable results. Here we provide a tutorial for Linear Mixed-Effects models. We gently introduce the reader to these models by showing some motivating examples. Then, we discuss the theory behind Linear Mixed-Effects models, and we show how to fit these models by making use of real-life data obtained from an exposome study. Throughout the paper we provide R code so that researchers are able to implement these useful models themselves.
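
For orientation, the general form of a linear mixed-effects model can be written as below. The notation is the common textbook one and is an assumption here; the abstract itself does not give the paper's symbols.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon},\\
\mathbf{b} &\sim \mathcal{N}(\mathbf{0}, \mathbf{G}), \qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2}\mathbf{I}),\\
\operatorname{Var}(\mathbf{y}) &= \mathbf{Z}\mathbf{G}\mathbf{Z}^{\top} + \sigma^{2}\mathbf{I}.
\end{aligned}
```

Here X holds the settings of the designed (fixed) factors, Z encodes the grouping induced by an uncontrollable factor such as batch or measurement day, and b collects the corresponding random effects. The last line shows why the independence assumption fails: observations that share a random effect have a non-zero covariance.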

2.
Environ Int ; 170: 107587, 2022 12.
Article in English | MEDLINE | ID: mdl-36274492

ABSTRACT

River water is an important source of Dutch drinking water. For this reason, continuous monitoring of river water quality is needed. However, comprehensive chemical analyses with high-resolution gas chromatography [GC]-mass spectrometry [MS]/liquid chromatography [LC]-MS are quite tedious and time consuming; this makes them poorly fit for routine water quality monitoring and, therefore, many pollution events are missed. Phytoplankton are highly sensitive and responsive to toxicity, which makes them highly usable for effect-based water quality monitoring. Flow cytometry can measure the optical properties of phytoplankton every hour, generating a large amount of information-rich data in one year. However, this requires chemometrics, as the resulting fingerprints need to be processed into information about abnormal phytoplankton behaviour. We developed Discriminant Analysis of Multi-Aspect CYtometry (DAMACY) to model the "normal condition" of the phytoplankton community imposed by diurnal, meteorological, and other exogenous influences. DAMACY first describes the cellular variability and distribution of phytoplankton in each measurement using principal component analysis, and then aims to find subtle differences in these phytoplankton distributions that predict normal environmental conditions. Deviations from these normal environmental conditions indicated abnormal phytoplankton behaviour that happened alongside pollution events measured with the GC/MS and LC/MS systems. Thus, our results demonstrate that flow cytometry in combination with chemometrics may be used for an automated hourly assessment of river water quality and as a near real-time early warning for detecting harmful known or unknown contaminants. Finally, both the flow cytometer and the DAMACY algorithm run completely autonomously and only require maintenance once or twice per year. The warning system results may be uploaded automatically, so that drinking water companies may temporarily stop pumping water whenever abnormal phytoplankton behaviour is detected. In the case of prolonged abnormal phytoplankton behaviour, comprehensive analysis may still be used to identify the chemical compound, its origin, and toxicity.


Subject(s)
Drinking Water , Phytoplankton , Water Quality , Flow Cytometry , Chemometrics
3.
Food Res Int ; 161: 111836, 2022 11.
Article in English | MEDLINE | ID: mdl-36192968

ABSTRACT

The development of portable NIR instruments facilitates widespread use among non-specialists. However, untrained operators may follow non-optimal measurement procedures. This work investigates how different factors in the measurement procedure influence the spectra of pig feed samples produced by SCiO, a handheld NIR. Measurement conditions were studied by means of Design of Experiments and evaluated with analysis of variance - simultaneous component analysis (ANOVA-SCA or ASCA). We quantified and visualized how measurement distance, angle, background lighting, the use of plastic lids and different devices interactively affect the resulting spectra. The samples could be distinguished with 100% accuracy with Partial Least Squares-Discriminant Analysis (PLS-DA) at a scanning distance of 0.5 cm. Replication of the experiment with special attention to reproducing the conditions still led to some differences, which highlights both the challenges in controlling conditions and the importance of considering them. Based on the results, generalizable guidelines for acceptance of spectra were proposed for this case study. Of main importance is performing measurements at distances of 0.5 cm or at least in an environment without background lighting. Overall, the provided guidelines for measurement conditions and a methodology to investigate this for other devices are a key enabler to spreading handheld spectrometry to a non-expert audience.


Subject(s)
Plastics , Spectroscopy, Near-Infrared , Animals , Discriminant Analysis , Least-Squares Analysis , Spectrophotometry , Spectroscopy, Near-Infrared/methods , Swine
4.
Sci Rep ; 12(1): 15687, 2022 09 20.
Article in English | MEDLINE | ID: mdl-36127378

ABSTRACT

For the extraction of spatially important regions from mass spectrometry imaging (MSI) data, different clustering methods have been proposed. These clustering methods are based on certain assumptions and use different criteria to assign pixels into different classes. For high-dimensional MSI data, the curse of dimensionality also limits the performance of clustering methods, which is usually overcome by pre-processing the data using dimension reduction techniques. In summary, the extraction of spatial patterns from MSI data can be done using different unsupervised methods, but a robust evaluation of clustering results is still missing. In this study, we have performed multiple simulations on synthetic and real MSI data to validate the performance of unsupervised methods. The synthetic data were simulated mimicking important spatial and statistical properties of real MSI data. Our simulation results confirmed that K-means clustering with correlation distance and Gaussian Mixture Modeling clustering methods give optimal performance in most of the scenarios. The clustering methods give efficient results together with dimension reduction techniques. Of all the dimension reduction techniques considered here, the best results were obtained with the minimum noise fraction (MNF) transform. The results were confirmed on both synthetic and real MSI data. However, for successful implementation of the MNF transform, the MSI data need to be of limited dimensions.
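
To make "K-means clustering with correlation distance" concrete, the following is a minimal sketch of the distance and of the K-means assignment step only. The function names and the toy pixel spectra are hypothetical, and this is not the authors' implementation.

```python
import math

def correlation_distance(a, b):
    """Return 1 - Pearson correlation between two equal-length spectra."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # flat spectrum: treated as uncorrelated (an assumption)
    return 1.0 - cov / (norm_a * norm_b)

def assign_pixels(pixels, centroids):
    """K-means assignment step: index of the nearest centroid per pixel."""
    return [
        min(range(len(centroids)),
            key=lambda k: correlation_distance(p, centroids[k]))
        for p in pixels
    ]

# Hypothetical three-channel pixel spectra and two cluster centroids.
pixels = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [3.0, 2.0, 1.0]]
centroids = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
labels = assign_pixels(pixels, centroids)
```

Because the distance depends only on spectral shape, the first two pixels fall in the same cluster even though their overall intensities differ tenfold.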


Subject(s)
Diagnostic Imaging , Cluster Analysis , Mass Spectrometry/methods , Normal Distribution
5.
PLoS One ; 17(8): e0268881, 2022.
Article in English | MEDLINE | ID: mdl-36001537

ABSTRACT

PURPOSE: To evaluate the value of convolutional neural network (CNN) in the diagnosis of human brain tumor or Alzheimer's disease by MR spectroscopic imaging (MRSI) and to compare its Matthews correlation coefficient (MCC) score against that of other machine learning methods and previous evaluation of the same data. We address two challenges: 1) limited number of cases in MRSI datasets and 2) interpretability of results in the form of relevant spectral regions. METHODS: A shallow CNN with only one hidden layer and an ad-hoc loss function was constructed involving two branches for processing spectral and image features of a brain voxel respectively. Each branch consists of a single convolutional hidden layer. The output of the two convolutional layers is merged and fed to a classification layer that outputs class predictions for the given brain voxel. RESULTS: Our CNN method separated glioma grades 3 and 4 and identified Alzheimer's disease patients using MRSI and complementary MRI data with high MCC score (Areas Under the Curve were 0.87 and 0.91, respectively). The results demonstrated superior effectiveness over other popular methods such as Partial Least Squares or Support Vector Machines. Also, our method automatically identified the spectral regions most important in the diagnosis process and we show that these are in good agreement with existing biomarkers from the literature. CONCLUSION: Shallow CNN models integrating image and spectral features improved quantitative exploration and diagnosis of brain diseases for research and clinical purposes. Software is available at https://bitbucket.org/TeslaH2O/cnn_mrsi.
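
The Matthews correlation coefficient used as the comparison score can be computed from a binary confusion matrix as sketched below. The counts are hypothetical and only illustrate why MCC suits small, imbalanced datasets better than plain accuracy.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; 0.0 if it is undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical classifier with reasonable performance on both classes.
balanced = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)

# Hypothetical classifier that always predicts the majority (negative) class:
# accuracy is 90%, yet MCC reports no predictive value.
majority = matthews_corrcoef(tp=0, tn=90, fp=0, fn=10)
```

Returning 0.0 when a row or column of the confusion matrix is empty is a common convention, adopted here as an assumption.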


Subject(s)
Alzheimer Disease , Brain Neoplasms , Alzheimer Disease/diagnostic imaging , Brain Neoplasms/diagnostic imaging , Humans , Machine Learning , Magnetic Resonance Imaging/methods , Neural Networks, Computer
6.
Anal Chim Acta ; 1203: 339707, 2022 Apr 22.
Article in English | MEDLINE | ID: mdl-35361420

ABSTRACT

Many industries see a shifting focus towards performing on-site analysis using handheld spectroscopic devices. A determining factor for decision-making on the commissioning of these devices is available information on the potential performance of the device for specific applications. By now, myriad handheld solutions with very different specifications and pricing are available on the market. Although specifications are generally available for new devices, this does not directly quantify or predict how available devices will perform for targeted cases. We present a novel chemometric method to estimate the prediction performance of handheld NIR hardware and apply it to estimate the performance of two commercially available handheld NIR technologies in predicting protein content (ranging 120-180 g kg⁻¹) in pig feed from existing data of a benchtop device. Adjusting benchtop data to the wavelength range and resolution of the handheld device led to over-optimistic estimates of the handheld performances. Our method additionally utilizes information on the error structure of the handheld devices for the estimation. It yielded performance estimates differing less than 1 g kg⁻¹ from the experimentally determined handheld performances and similar model parameters. Our method was effective for linear and nonlinear calibration algorithms, also when estimating performance after averaging multiple scans. Replicate spectra of twenty samples recorded using the handheld were required for replication error estimation to obtain an accurate performance estimation. The error structure could be reported by manufacturers in the future for this approach to be universally employed for predictive quantitative technology assessment. Overall, our method provides estimates of the performance of a handheld device for a specific task with minimal testing required and can thus be used as a device or application screening tool before committing to develop calibrations.


Subject(s)
Photons , Spectroscopy, Near-Infrared , Algorithms , Animals , Calibration , Spectroscopy, Near-Infrared/methods , Swine
7.
Cytometry A ; 101(1): 72-85, 2022 01.
Article in English | MEDLINE | ID: mdl-34327803

ABSTRACT

The rapid evolution of the flow cytometry field, currently allowing the measurement of 30-50 parameters per cell, has led to a marked increase in deep multivariate information. Manual gating is insufficient to extract all this information. Therefore, multivariate analysis (MVA) methods have been developed to extract information and efficiently analyze the high-density multicolour flow cytometry (MFC) data. To aid interpretation, MFC data are often logarithmically transformed before MVA. We studied the consequences of different transformations of flow cytometry data in datasets containing negative intensities caused by background subtractions and spreading error, as logarithmic transformation of negative data is impossible. Transformations such as logicle or hyperbolic arcsine transformations allow linearity around zero, whereas higher (positive and negative) intensities are logarithmically transformed. To define the linear range, a parameter (or cofactor) must be chosen. We show how the chosen transformation parameter has great impact on the MVA results. In some cases, peak splitting is observed, producing two distributions around zero in an actually homogeneous population. This may be misinterpreted as the presence of multiple cell populations. Moreover, when performing arbitrary transformation before MVA, biologically relevant and statistically significant information might be missed. We present a new algorithm, Optimal Transformation for flow cytometry data (OTflow), which uses various statistical methods to optimally choose the parameter of the transformation and prevent artifacts such as peak splitting. Arbitrary or unconsidered transformation can lead to wrong conclusions for the MVA cluster methods, dimensionality reduction methods, and classification methods. We recommend transformation of flow cytometry data by using OTflow-defined parameters estimated per channel, in order to prevent peak splitting and other artifacts in the data.
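
The role of the cofactor in a hyperbolic arcsine transformation can be illustrated with the short sketch below. It shows only the plain transformation asinh(x / cofactor); the cofactor values and intensities are hypothetical, and the OTflow procedure for choosing the parameter is not reproduced here.

```python
import math

def arcsinh_transform(intensities, cofactor):
    """Apply asinh(x / cofactor): roughly linear for |x| << cofactor,
    roughly logarithmic for |x| >> cofactor, and defined for negatives."""
    return [math.asinh(x / cofactor) for x in intensities]

# Hypothetical raw intensities, including a negative value from
# background subtraction.
raw = [-50.0, 0.0, 50.0, 10000.0]

narrow = arcsinh_transform(raw, cofactor=5.0)    # small linear range
wide = arcsinh_transform(raw, cofactor=150.0)    # large linear range
```

With the small cofactor, the values at -50 and +50 are already stretched far apart on a log-like scale, which is the kind of setting in which noise around zero may appear as two separate peaks. With the larger cofactor the same values stay within the near-linear region.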


Subject(s)
Algorithms , Artifacts , Flow Cytometry , Multivariate Analysis
8.
Anal Chim Acta ; 1180: 338890, 2021 Oct 02.
Article in English | MEDLINE | ID: mdl-34538330

ABSTRACT

The long-term prediction performance of spectroscopic calibration models is a critical factor to monitor or control many production processes. Over time, new variations may emerge that deteriorate prediction performance. Therefore, models have to be maintained to retain or improve their prediction performance through time, requiring considerable resources and data. Maintenance should improve relevant predictions but also needs to be resource and cost efficient. Current approaches do not consider these trade-offs. We propose a new method to quantify the effectiveness and cost of model maintenance strategies based on historical data. Model performance over time for past, imminent and future samples is evaluated as these may react differently to maintenance. The model performance and required updating resources are translated into relative cost and benefit to compare strategies and determine optimal maintenance parameters. We used this method to evaluate a maintenance strategy that combines adding incoming samples to the calibration data with re-optimization of spectral preprocessing and modelling parameters. Continuously adding samples to the calibration data is shown to improve prediction performance and leads to more robust and generic models for emerging variations in all investigated data streams. Selectively adding incoming sample variations showed a reduced prediction performance but saves considerably in resources. Comparing model performance on the different sampling windows can also be used to determine an optimal updating frequency. This novel strategy to evaluate the expected performance and determine an optimal maintenance strategy is generally applicable and should lead to robust and consistently high prospective and/or retrospective model performance through time, which can be crucial for optimal operation and fault detection in industrial processes.


Subject(s)
Calibration , Cost-Benefit Analysis , Prospective Studies , Retrospective Studies
9.
Gigascience ; 9(11)2020 11 25.
Article in English | MEDLINE | ID: mdl-33241286

ABSTRACT

BACKGROUND: Drug mass spectrometry imaging (MSI) data contain knowledge about drug and several other molecular ions present in a biological sample. However, a proper approach to fully explore the potential of this type of data is still missing. Therefore, a computational pipeline that combines different spatial and non-spatial methods is proposed to link the observed drug distribution profile with tumor heterogeneity in solid tumors. Our data analysis steps include pre-processing of MSI data, cluster analysis, drug local indicators of spatial association (LISA) map, and ion selection. RESULTS: The number of clusters was identified from different tumor tissues. The spatial homogeneity of the individual cluster was measured using a modified version of our drug homogeneity method. The clustered image and drug LISA map were simultaneously analyzed to link identified clusters with observed drug distribution profile. Finally, ion selection was performed using the spatially aware method. CONCLUSIONS: In this paper, we have shown an approach to correlate the drug distribution with spatial heterogeneity in untargeted MSI data. Our approach is freely available in an R package 'CorrDrugTumorMSI'.


Subject(s)
Neoplasms , Pharmaceutical Preparations , Diagnostic Imaging , Humans , Mass Spectrometry , Neoplasms/diagnostic imaging , Neoplasms/drug therapy , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
10.
Sci Rep ; 10(1): 9716, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32546713

ABSTRACT

Flow Cytometry is an analytical technology to simultaneously measure multiple markers per single cell. Tens of thousands to millions of single cells can be measured per sample and each sample may contain a different number of cells. All samples may be bundled together, leading to a 'multi-set' structure. Many multivariate methods have been developed for Flow Cytometry data but none of them considers this structure in their quantitative handling of the data. The standard pre-processing used by existing multivariate methods provides models mainly influenced by the samples with more cells, while such a model should provide a balanced view of the biomedical information within all measurements. We propose an alternative 'multi-set' preprocessing that corrects for the difference in number of cells measured, balancing the relative importance of each multi-cell sample in the data while using all data collected from these expensive analyses. Moreover, one case example shows how multi-set pre-processing may benefit removal of undesired measurement-to-measurement variability, and another shows how class-based multi-set pre-processing enhances the studied response upon comparison to the control reference samples. Our results show that adjusting data analysis algorithms to consider this multi-set structure may greatly benefit immunological insight and classification performance of Flow Cytometry data.
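
The balancing idea can be sketched for a single marker as follows. Centering on the mean of per-sample means, rather than on the pooled mean, is one plausible reading of "correcting for the difference in number of cells"; it is an assumption made for illustration, and the function names and data are hypothetical.

```python
def pooled_mean(samples):
    """Mean over all cells pooled together: samples with more cells dominate."""
    total = sum(sum(cells) for cells in samples)
    count = sum(len(cells) for cells in samples)
    return total / count

def balanced_mean(samples):
    """Mean of per-sample means: every sample counts equally."""
    return sum(sum(cells) / len(cells) for cells in samples) / len(samples)

def center_multiset(samples):
    """Center every cell on the balanced mean, keeping all cells."""
    m = balanced_mean(samples)
    return [[x - m for x in cells] for cells in samples]

# Hypothetical single-marker data: sample A has 4 cells, sample B has 1.
samples = [[1.0, 1.0, 1.0, 1.0], [6.0]]
centered = center_multiset(samples)
```

In this toy case the pooled mean sits close to sample A simply because it contains more cells, whereas the balanced mean lies halfway between the two samples.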


Subject(s)
Electronic Data Processing/methods , Flow Cytometry/methods , Multivariate Analysis , Algorithms , Biomarkers , Data Analysis , Humans , Mathematical Computing , Research Design
11.
Data Brief ; 29: 105357, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32195297

ABSTRACT

Diffuse reflectance near-infrared (NIR) data (908-1676 nm) of chicken breast fillets was recorded in a non-destructive way using a portable miniaturised NIR spectrometer. The NIR data was used to discriminate between fresh and thawed breast fillets and to determine the birds' growth conditions. NIR data was recorded of 153 commercial supermarket chicken fillet samples by applying the NIR device equipped with the standard issue collar on the samples in three different ways: (i) directly on the meat, (ii) through the top foil of the package (i.e. with an air pocket between the foil and the breast fillet), and (iii) through the top foil with the packaging turned bottom up (i.e. no air pocket between the foil and the breast fillet). In order to generate thawed samples, the fresh samples were frozen and subsequently thawed. The freshness of the fillets was checked using β-hydroxyacyl-CoA-dehydrogenase for 13% of the sample set. Five NIR spectra were collected per measurement mode from each sample resulting in 4590 raw NIR spectra. Multivariate statistics was applied and the interpretation of these calculations can be found in Parastar et al. [1]. The NIR data has a reuse potential for follow-up studies of chicken breast fillet authentication using a similar brand NIR device or to serve as calibration transfer data.

12.
Anal Chim Acta ; 1093: 1-15, 2020 Jan 06.
Article in English | MEDLINE | ID: mdl-31735202

ABSTRACT

Combining the individual analytical strengths of mass spectrometry and infrared spectroscopy, infrared ion spectroscopy is increasingly recognized as a powerful tool for small-molecule identification in a wide range of analytical applications. Mass spectrometry is itself a leading analytical technique for small-molecule identification on the merit of its outstanding sensitivity, selectivity and versatility. The foremost shortcoming of the technique, however, is its limited ability to directly probe molecular structure, especially when contrasted against spectroscopic techniques. In infrared ion spectroscopy, infrared vibrational spectra are recorded for mass-isolated ions and provide a signature that can be matched to reference spectra, either measured from standards or predicted using quantum-chemical calculations. Here we present an overview of the potential for this technique to develop into a versatile analytical method for identifying molecular structures in mass spectrometry-based analytical workflows. In this tutorial perspective, we introduce the reader to the technique of infrared ion spectroscopy and highlight a selection of recent experimental advances and applications in current analytical challenges, in particular in the field of untargeted metabolomics. We report on the coupling of infrared ion spectroscopy with liquid chromatography and present experiments that serve as proof-of-principle examples of strategies to address outstanding challenges.

13.
Sci Rep ; 9(1): 6777, 2019 05 01.
Article in English | MEDLINE | ID: mdl-31043667

ABSTRACT

Multicolour flow cytometry (MFC) is used to measure multiple cellular markers at the single-cell level. Cellular markers may be coloured with different panels of fluorescently-labelled antibodies to enable cell identification or the detection of activated cells in pre-defined, 'gated' specific cell subsets. The number of markers that can be used per measurement is technologically limited, however, requiring every panel to be analysed in a separate aliquot measurement. The combined analyses of these dedicated panels may enhance the predictive ability of these measurements and could enrich the interpretation of the immunological information. Here we introduce a fusion method for MFC data, based on DAMACY (Discriminant Analysis of Multi-Aspect Cytometry data), which can combine information from complementary panels. This approach leads to both enhanced predictions and clearer interpretations in comparison with the analysis of separate measurements. We illustrate this method using two datasets: the response of neutrophils evoked by a systemic endotoxin challenge and the activated immune status of the innate cells, T cells and B cells in obese versus lean individuals. The data fusion approach was able to detect cells that do not individually show a difference between clinical phenotypes but do play a role in combination with other cells.


Subject(s)
Biomarkers/analysis , Flow Cytometry/methods , Immunophenotyping/methods , Obesity/physiopathology , Thinness/physiopathology , Antibodies, Monoclonal/immunology , Discriminant Analysis , Humans , Phenotype
14.
Sci Rep ; 8(1): 10907, 2018 Jul 19.
Article in English | MEDLINE | ID: mdl-30026601

ABSTRACT

Multicolor Flow Cytometry (MFC)-based gating allows the selection of cellular (pheno)types based on their unique marker expression. Current manual gating practice is highly subjective and may remove relevant information, precluding discovery of cell populations with specific co-expression of multiple markers. Only multivariate approaches can extract such aspects of cell variability from multi-dimensional MFC data. We describe the novel method ECLIPSE (Elimination of Cells Lying in Patterns Similar to Endogeneity) to identify and characterize aberrant cells present in individuals out of homeostasis. ECLIPSE combines dimensionality reduction by Simultaneous Component Analysis with Kernel Density Estimates. A Difference between Densities (DbD) is used to eliminate cells in responder samples that overlap in marker expression with cells of controls. Thereby, subsequent data analyses focus on the immune response-specific cells, leading to more informative and focused models. To prove the power of ECLIPSE, we applied the method to study two distinct datasets: the in vivo neutrophil response induced by systemic endotoxin challenge and the heterogeneous immune response of asthmatics. ECLIPSE described the well-characterized common response in the LPS challenge insightfully, while identifying slight differences between responders. Also, ECLIPSE enabled characterization of the immune response associated with asthma, where the co-expressions between all markers were used to stratify patients according to disease-specific cell profiles.
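
The Difference between Densities step can be pictured with a one-dimensional toy version. ECLIPSE itself works on Simultaneous Component Analysis scores; the single score axis, the fixed Gaussian bandwidth and the zero threshold below are all assumptions made for illustration, and the names are hypothetical.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function that estimates the density of 1-D points."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

def keep_response_specific(responder, control, bandwidth=0.5):
    """Keep responder cells where the responder density exceeds the control
    density, i.e. where the difference between densities is positive."""
    d_resp = gaussian_kde(responder, bandwidth)
    d_ctrl = gaussian_kde(control, bandwidth)
    return [x for x in responder if d_resp(x) - d_ctrl(x) > 0.0]

# Hypothetical scores: controls cluster near 0; the responder sample holds
# some control-like cells plus a distinct population near 4.
control = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05]
responder = [0.0, 0.1, 4.0, 4.1, 3.9]
kept = keep_response_specific(responder, control)
```

Only the cells near 4 survive, since the control-like cells near 0 lie where the control density is higher than the responder density.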


Subject(s)
Asthma/immunology , Computational Biology/methods , Endotoxins/adverse effects , Flow Cytometry/methods , Lymphocytes/cytology , Adult , Aged , Algorithms , Biomarkers/metabolism , Case-Control Studies , Endotoxins/immunology , Female , Humans , Lymphocytes/metabolism , Male , Middle Aged , Young Adult
15.
Anal Chim Acta ; 982: 37-47, 2017 Aug 22.
Article in English | MEDLINE | ID: mdl-28734364

ABSTRACT

The calibration performance of Partial Least Squares regression (PLS) can be improved by eliminating uninformative variables. For PLS, many variable elimination methods have been developed. One is the Uninformative-Variable Elimination for PLS (UVE-PLS). However, the number of variables retained by UVE-PLS is usually still large. In UVE-PLS, variable elimination is repeated as long as the root mean squared error of cross validation (RMSECV) is decreasing. The set of variables in this first local minimum is retained. In this paper, a modification of UVE-PLS is proposed and investigated, in which UVE is repeated until no further reduction in variables is possible, followed by a search for the global RMSECV minimum. The method is called Global-Minimum Error Uninformative-Variable Elimination for PLS, denoted as GME-UVE-PLS or simply GME-UVE. After each iteration, the predictive ability of the PLS model, built with the remaining variable set, is assessed by RMSECV. The variable set with the global RMSECV minimum is then finally selected. The goal is to obtain smaller sets of variables with similar or improved predictability than those from the classical UVE-PLS method. The performance of the GME-UVE-PLS method is investigated using four data sets, i.e. a simulated set, NIR and NMR spectra, and a theoretical molecular descriptors set, resulting in twelve profile-response (X-y) calibrations. The selective and predictive performances of the models resulting from GME-UVE-PLS are statistically compared to those from UVE-PLS and 1-step UVE, using one-sided paired t-tests. The results demonstrate that variable reduction with the proposed GME-UVE-PLS method usually eliminates significantly more variables than the classical UVE-PLS, while the predictive abilities of the resulting models are better. With GME-UVE-PLS, a lower number of uninformative variables, without a chemical meaning for the response, may be retained than with UVE-PLS. The selectivity of the classical UVE method thus can be improved by the application of the proposed GME-UVE method resulting in more parsimonious models.
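
The difference between the two stopping rules can be sketched on a sequence of RMSECV values, one per elimination round. The numbers are hypothetical, index 0 is assumed to be the model on all variables, and the UVE elimination and PLS fitting themselves are not shown.

```python
def first_local_minimum(rmsecv):
    """Classical UVE-PLS rule: stop as soon as RMSECV no longer decreases
    and return the index of the last improving iteration."""
    for i in range(1, len(rmsecv)):
        if rmsecv[i] >= rmsecv[i - 1]:
            return i - 1
    return len(rmsecv) - 1

def global_minimum(rmsecv):
    """GME-UVE rule: run all iterations, then pick the lowest RMSECV."""
    return min(range(len(rmsecv)), key=rmsecv.__getitem__)

# Hypothetical RMSECV after each elimination round.
rmsecv_per_iteration = [0.52, 0.47, 0.49, 0.44, 0.41, 0.46]

uve_choice = first_local_minimum(rmsecv_per_iteration)
gme_choice = global_minimum(rmsecv_per_iteration)
```

In this example the classical rule stops at iteration 1, while the global search selects iteration 4, a later and therefore smaller variable set with a lower cross-validation error.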

16.
Sci Rep ; 7(1): 5471, 2017 07 14.
Article in English | MEDLINE | ID: mdl-28710472

ABSTRACT

Multicolour Flow Cytometry (MFC) produces multidimensional analytical data on the quantitative expression of multiple markers on single cells. This data contains invaluable biomedical information on (1) the marker expressions per cell, (2) the variation in such expression across cells, (3) the variability of cell marker expression across samples that (4) may vary systematically between cells collected from donors and patients. Current conventional and even advanced data analysis methods for MFC data explore only a subset of these levels. The Discriminant Analysis of MultiAspect CYtometry (DAMACY) we present here provides a comprehensive view on health and disease responses by integrating all four levels. We validate DAMACY by using three distinct datasets: in vivo response of neutrophils evoked by systemic endotoxin challenge, the clonal response of leukocytes in bone marrow of acute myeloid leukaemia (AML) patients, and the complex immune response in blood of asthmatics. DAMACY provided good accuracy (91-100%) in the discrimination between health and disease, on par with literature values. Additionally, the method provides figures that give insight into the marker expression and cell variability for more in-depth interpretation, which can benefit both physicians and biomedical researchers to better diagnose and monitor diseases that are reflected by changes in blood leukocytes.


Subject(s)
Biomarkers/analysis , Data Analysis , Flow Cytometry/methods , Single-Cell Analysis , Adult , Aged , Asthma/pathology , Color , Discriminant Analysis , Humans , Leukemia, Myeloid, Acute/pathology , Lipopolysaccharides/pharmacology , Middle Aged , Models, Biological , Phenotype , Young Adult
17.
Anal Chim Acta ; 963: 1-16, 2017 04 22.
Article in English | MEDLINE | ID: mdl-28335962

ABSTRACT

Revealing the biochemistry associated with micro-organismal interspecies interactions is highly relevant for many purposes. Each pathogen has a characteristic metabolic fingerprint that allows identification based on its unique multivariate biochemistry. When pathogen species come into mutual contact, their co-culture will display a chemistry that may be attributed both to mixing of the characteristic chemistries of the mono-cultures and to competition between the pathogens. Therefore, investigating pathogen development in a polymicrobial environment requires dedicated chemometric methods to untangle and focus upon these sources of variation. The multivariate data analysis method Projected Orthogonalised Chemical Encounter Monitoring (POCHEMON) is dedicated to highlighting metabolites characteristic for the interaction of two micro-organisms in co-culture. However, this approach is currently limited to a single time-point, while development of polymicrobial interactions may be highly dynamic. A well-known multivariate implementation of Analysis of Variance (ANOVA) uses Principal Component Analysis (ANOVA-PCA). This allows the overall dynamics to be separated from the pathogen-specific chemistry to analyse the contributions of both aspects separately. For this reason, we propose to integrate ANOVA-PCA with the POCHEMON approach to disentangle the pathogen dynamics and the specific biochemistry in interspecies interactions. Two complementary case studies show great potential for both liquid and gas chromatography - mass spectrometry to reveal novel information on chemistry specific to interspecies interaction during pathogen development.


Subject(s)
Chemistry Techniques, Analytical/methods , Microbiology , Principal Component Analysis , Analysis of Variance , Chromatography, Liquid , Coculture Techniques , Gas Chromatography-Mass Spectrometry
18.
Anal Chim Acta ; 954: 22-31, 2017 Feb 15.
Article in English | MEDLINE | ID: mdl-28081811

ABSTRACT

In this work we show that convolutional neural networks (CNNs) can be efficiently used to classify vibrational spectroscopic data and identify important spectral regions. CNNs are the current state-of-the-art in image classification and speech recognition and can learn interpretable representations of the data. These characteristics make CNNs a good candidate for reducing the need for preprocessing and for highlighting important spectral regions, both of which are crucial steps in the analysis of vibrational spectroscopic data. Chemometric analysis of vibrational spectroscopic data often relies on preprocessing methods involving baseline correction, scatter correction and noise removal, which are applied to the spectra prior to model building. Preprocessing is a critical step because, even in simple problems, using 'reasonable' preprocessing methods may decrease the performance of the final model. We develop a new CNN-based method and provide accompanying publicly available software. It is based on a simple CNN architecture with a single convolutional layer (a so-called shallow CNN). Our method outperforms standard classification algorithms used in chemometrics (e.g. PLS) in terms of accuracy when applied to non-preprocessed test data (86% average accuracy compared to the 62% achieved by PLS), and it achieves better performance even on preprocessed test data (96% average accuracy compared to the 89% achieved by PLS). For interpretability purposes, our method includes a procedure for finding important spectral regions, thereby facilitating qualitative interpretation of results.

19.
Anal Chim Acta ; 938: 44-52, 2016 Sep 28.
Article in English | MEDLINE | ID: mdl-27619085

ABSTRACT

The aim of data preprocessing is to remove data artifacts (such as a baseline, scatter effects or noise) and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but which method or combination of methods should be used for the specific data being analyzed is difficult to select. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the true relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection not only improves the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets of which the true relevant variables are available as prior knowledge. We show that a selection of variables is provided that complies more with the true informative variables compared to individual optimization of both model aspects. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of the user. In this work, the approach is illustrated by using PLS as model and PPRV-FCAM (Predictive Property Ranked Variable using Final Complexity Adapted Models) for variable selection.
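
The general idea of treating preprocessing steps as experimental factors can be sketched as follows. This is an assumption-laden illustration, not the authors' procedure: it uses a plain two-level full factorial design over three hypothetical steps (linear baseline removal, SNV scatter correction, moving-average smoothing), and the `score` callable stands in for whatever cross-validated model error (for example from PLS) a user would supply. Variable selection is not included.

```python
import itertools
import numpy as np

def detrend(X):
    """Baseline correction: subtract a straight line fitted to each spectrum."""
    x = np.arange(X.shape[1])
    coeffs = np.polyfit(x, X.T, deg=1)               # shape (2, n_samples)
    return X - (np.outer(coeffs[0], x) + coeffs[1][:, None])

def snv(X):
    """Scatter correction: standard normal variate, applied row by row."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def smooth(X, width=5):
    """Noise removal: simple moving average along each spectrum."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, X)

# Hypothetical factor list; the order fixes the order in which steps are applied
STEPS = [("baseline", detrend), ("scatter", snv), ("smoothing", smooth)]

def full_factorial(steps):
    """Yield every on/off combination of the steps (2**k design points)."""
    names = [name for name, _ in steps]
    for switches in itertools.product([False, True], repeat=len(steps)):
        yield dict(zip(names, switches))

def apply_strategy(X, strategy, steps=STEPS):
    """Apply the steps that are switched on in this design point."""
    out = np.asarray(X, dtype=float)
    for name, func in steps:
        if strategy[name]:
            out = func(out)
    return out

def select_preprocessing(X, y, score, steps=STEPS):
    """Evaluate each design point with a user-supplied score (lower is better)."""
    results = [(score(apply_strategy(X, s, steps), y), s) for s in full_factorial(steps)]
    return min(results, key=lambda item: item[0])
```

With three on/off factors the design has eight runs. A fractional design would reduce the number of runs when many preprocessing steps are considered.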

20.
Analyst ; 141(20): 5689-5708, 2016 Oct 21.
Article in English | MEDLINE | ID: mdl-27549384

ABSTRACT

Historically, advances in the field of ion mobility spectrometry have been hindered by the variation in measured signals between instruments developed by different research laboratories or manufacturers. This has triggered the development and application of chemometric techniques able to reveal and analyze precious information content of ion mobility spectra. Recent advances in multidimensional coupling of ion mobility spectrometry to chromatography and mass spectrometry have created new, unique challenges for data processing, yielding high-dimensional, megavariate datasets. In this paper, a complete overview of available chemometric techniques used in the analysis of ion mobility spectrometry data is given. We describe the current state-of-the-art of ion mobility spectrometry data analysis comprising datasets with different complexities and two different scopes of data analysis, i.e. targeted and non-targeted analyte analyses. Two main steps of data analysis are considered: data preprocessing and pattern recognition. A detailed description of recent advances in chemometric techniques is provided for these steps, together with a list of interesting applications. We demonstrate that chemometric techniques have made a significant contribution to the recent and great expansion of ion mobility spectrometry technology into different application fields. We conclude that well-thought-out, comprehensive data analysis strategies are currently emerging, including several chemometric techniques and addressing different data challenges. In our opinion, this trend will continue in the near future, stimulating developments in ion mobility spectrometry instrumentation even further.
