Search | VHL Regional Portal

A critical assessment of feature selection methods for biomarker discovery in clinical proteomics.

Christin, Christin; Hoefsloot, Huub C J; Smilde, Age K; Hoekman, B; Suits, Frank; Bischoff, Rainer; Horvatovich, Peter.

Mol Cell Proteomics ; 12(1): 263-76, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23115301

ABSTRACT

In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The methods were compared using human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples. Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low.
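
The two scores named in the abstract can be illustrated with a minimal Python sketch. It assumes that features are identified by hashable IDs and that the set of truly spiked features is known; the function name `selection_scores` and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt


def selection_scores(selected, spiked, all_features):
    """Score a selected feature set against the known spiked-in features.

    Hypothetical helper for illustration; not the authors' code.
    """
    selected, spiked, all_features = set(selected), set(spiked), set(all_features)
    tp = len(selected & spiked)                 # spiked features that were selected
    fp = len(selected - spiked)                 # unrelated features that were selected
    fn = len(spiked - selected)                 # spiked features that were missed
    tn = len(all_features - selected - spiked)  # unrelated features correctly left out

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate

    # f-score: harmonic mean of recall and precision
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    # g-score: geometric mean of recall and true negative rate
    g_score = sqrt(recall * tnr)
    return {"recall": recall, "precision": precision, "tnr": tnr,
            "f_score": f_score, "g_score": g_score}


# Toy data (assumed): 100 features, 10 of them spiked, 10 selected (8 correct).
all_features = range(100)
spiked = range(10)
selected = list(range(8)) + [50, 51]
scores = selection_scores(selected, spiked, all_features)
print(scores)
```

Because unrelated features usually far outnumber spiked ones, the true negative rate stays close to 1, so the g-score is less sensitive to false positives than the f-score.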

Subject(s)

Metabolomics/methods , Peptides/cerebrospinal fluid , Peptides/urine , Proteomics/methods , Animals , Biomarkers/analysis , Chromatography, Liquid , Computational Biology , Data Interpretation, Statistical , Gene Expression Profiling , Humans , Mass Spectrometry , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Principal Component Analysis , Swine

Mutations in the phospholipid remodeling gene SERAC1 impair mitochondrial function and intracellular cholesterol trafficking and cause dystonia and deafness.

Wortmann, Saskia B; Vaz, Frédéric M; Gardeitchik, Thatjana; Vissers, Lisenka E L M; Renkema, G Herma; Schuurs-Hoeijmakers, Janneke H M; Kulik, Wim; Lammens, Martin; Christin, Christin; Kluijtmans, Leo A J; Rodenburg, Richard J; Nijtmans, Leo G J; Grünewald, Anne; Klein, Christine; Gerhold, Joachim M; Kozicz, Tamas; van Hasselt, Peter M; Harakalova, Magdalena; Kloosterman, Wigard; Baric, Ivo; Pronicka, Ewa; Ucar, Sema Kalkan; Naess, Karin; Singhal, Kapil K; Krumina, Zita; Gilissen, Christian; van Bokhoven, Hans; Veltman, Joris A; Smeitink, Jan A M; Lefeber, Dirk J; Spelbrink, Johannes N; Wevers, Ron A; Morava, Eva; de Brouwer, Arjan P M.

Nat Genet ; 44(7): 797-802, 2012 Jun 10.

Article in English | MEDLINE | ID: mdl-22683713

ABSTRACT

Using exome sequencing, we identify SERAC1 mutations as the cause of MEGDEL syndrome, a recessive disorder of dystonia and deafness with Leigh-like syndrome, impaired oxidative phosphorylation and 3-methylglutaconic aciduria. We localized SERAC1 at the interface between the mitochondria and the endoplasmic reticulum in the mitochondria-associated membrane fraction that is essential for phospholipid exchange. A phospholipid analysis in patient fibroblasts showed elevated concentrations of phosphatidylglycerol-34:1 (where the species nomenclature denotes the number of carbon atoms in the two acyl chains:number of double bonds in the two acyl groups) and decreased concentrations of phosphatidylglycerol-36:1 species, resulting in an altered cardiolipin subspecies composition. We also detected low concentrations of bis(monoacyl-glycerol)-phosphate, leading to the accumulation of free cholesterol, as shown by abnormal filipin staining. Complementation of patient fibroblasts with wild-type human SERAC1 by lentiviral infection led to a decrease and partial normalization of the mean ratio of phosphatidylglycerol-34:1 to phosphatidylglycerol-36:1. Our data identify SERAC1 as a key player in the phosphatidylglycerol remodeling that is essential for both mitochondrial function and intracellular cholesterol trafficking.

Subject(s)

Carboxylic Ester Hydrolases/genetics , Cholesterol/metabolism , Deafness/genetics , Dystonia/genetics , Mitochondria/genetics , Mutation , Phospholipids/metabolism , Amino Acid Sequence , Carboxylic Ester Hydrolases/metabolism , Cardiolipins/genetics , Cardiolipins/metabolism , Cell Line, Transformed , Cell Line, Tumor , Cholesterol/genetics , Deafness/metabolism , Dystonia/metabolism , Exome , Fibroblasts/metabolism , HEK293 Cells , HeLa Cells , Humans , Mitochondria/metabolism , Molecular Sequence Data , Oxidative Phosphorylation , Phosphatidylglycerols/genetics , Phosphatidylglycerols/metabolism , Phospholipids/genetics , Sequence Alignment

Profiling and identification of cerebrospinal fluid proteins in a rat EAE model of multiple sclerosis.

Rosenling, Therese; Stoop, Marcel P; Attali, Amos; van Aken, Hans; Suidgeest, Ernst; Christin, Christin; Stingl, Christoph; Suits, Frank; Horvatovich, Peter; Hintzen, Rogier Q; Tuinstra, Tinka; Bischoff, Rainer; Luider, Theo M.

J Proteome Res ; 11(4): 2048-60, 2012 Apr 06.

Article in English | MEDLINE | ID: mdl-22320401

ABSTRACT

The experimental autoimmune encephalomyelitis (EAE) model resembles certain aspects of multiple sclerosis (MScl), with common features such as motor dysfunction, axonal degradation, and infiltration of T-cells. We studied the cerebrospinal fluid (CSF) proteome in the EAE rat model to identify proteomic changes relevant for MScl disease pathology. EAE was induced in male Lewis rats by injection of myelin basic protein (MBP) together with complete Freund's adjuvant (CFA). An inflammatory control group was injected with CFA alone, and a nontreated group served as healthy control. CSF was collected at day 10 and 14 after immunization and analyzed by bottom-up proteomics on Orbitrap LC-MS and QTOF LC-MS platforms in two independent laboratories. By combining results, 44 proteins were discovered to be significantly increased in EAE animals compared to both control groups, 25 of which have not been mentioned in relation to the EAE model before. Lysozyme C1, fetuin B, T-kininogen, serum paraoxonase/arylesterase 1, glutathione peroxidase 3, complement C3, and afamin are among the proteins significantly elevated in this rat EAE model. Two proteins, afamin and complement C3, were validated in an independent sample set using quantitative selected reaction monitoring mass spectrometry. The molecular weights of the identified differentially abundant proteins indicated an increased transport across the blood-brain barrier (BBB) at the peak of the disease, caused by an increase in BBB permeability.

Subject(s)

Cerebrospinal Fluid Proteins/analysis , Disease Models, Animal , Encephalomyelitis, Autoimmune, Experimental/cerebrospinal fluid , Multiple Sclerosis/cerebrospinal fluid , Proteome/analysis , Proteomics/methods , Animals , Body Weight , Cerebrospinal Fluid Proteins/chemistry , Chromatography, Liquid , Male , Mass Spectrometry , Paralysis/cerebrospinal fluid , Rats , Rats, Inbred Lew

The impact of delayed storage on the measured proteome and metabolome of human cerebrospinal fluid.

Rosenling, Therese; Stoop, Marcel P; Smolinska, Agnieszka; Muilwijk, Bas; Coulier, Leon; Shi, Shanna; Dane, Adrie; Christin, Christin; Suits, Frank; Horvatovich, Peter L; Wijmenga, Sybren S; Buydens, Lutgarde M C; Vreeken, Rob; Hankemeier, Thomas; van Gool, Alain J; Luider, Theo M; Bischoff, Rainer.

Clin Chem ; 57(12): 1703-11, 2011 Dec.

Article in English | MEDLINE | ID: mdl-21998343

ABSTRACT

BACKGROUND: Because cerebrospinal fluid (CSF) is in close contact with diseased areas in neurological disorders, it is an important source of material in the search for molecular biomarkers. However, sample handling for CSF collected from patients in a clinical setting might not always be adequate for use in proteomics and metabolomics studies. METHODS: We left CSF for 0, 30, and 120 min at room temperature immediately after sample collection and centrifugation/removal of cells. At 2 laboratories CSF proteomes were subjected to tryptic digestion and analyzed by use of nano-liquid chromatography (LC) Orbitrap mass spectrometry (MS) and chipLC quadrupole TOF-MS. Metabolome analysis was performed at 3 laboratories by NMR, GC-MS, and LC-MS. Targeted analyses of cystatin C and albumin were performed by LC-tandem MS in the selected reaction monitoring mode. RESULTS: We did not find significant changes in the measured proteome and metabolome of CSF stored at room temperature after centrifugation, except for 2 peptides and 1 metabolite, 2,3,4-trihydroxybutanoic (threonic) acid, of 5780 identified peptides and 93 identified metabolites. A sensitive protein stability marker, cystatin C, was not affected. CONCLUSIONS: The measured proteome and metabolome of centrifuged human CSF is stable at room temperature for up to 2 hours. We cannot exclude, however, that changes undetectable with our current methodology, such as denaturation or proteolysis, might occur because of sample handling conditions. The stability we observed gives laboratory personnel at the collection site sufficient time to aliquot samples before freezing and storage at -80 °C.

Subject(s)

Metabolome , Proteome/metabolism , Specimen Handling , Cerebrospinal Fluid , Chromatography, Gas , Chromatography, Liquid , Humans , Magnetic Resonance Spectroscopy , Mass Spectrometry/methods , Time Factors

Data processing pipelines for comprehensive profiling of proteomics samples by label-free LC-MS for biomarker discovery.

Christin, Christin; Bischoff, Rainer; Horvatovich, Péter.

Talanta ; 83(4): 1209-24, 2011 Jan 30.

Article in English | MEDLINE | ID: mdl-21215856

ABSTRACT

Label-free quantitative LC-MS profiling of complex body fluids has become an important analytical tool for biomarker and biological knowledge discovery in the past decade. Accurate processing, statistical analysis and validation of acquired data diversified by the different types of mass spectrometers, mass spectrometer parameter settings and applied sample preparation steps are essential to answer complex life science research questions and understand the molecular mechanism of disease onset and developments. This review provides insight into the main modules of label-free data processing pipelines with statistical analysis and validation and discusses recent developments. Special emphasis is devoted to quality control methods, performance assessment of complete workflows and algorithms of individual modules. Finally, the review discusses the current state and trends in high throughput data processing and analysis solutions for users with little bioinformatics knowledge.

Subject(s)

Chromatography, Liquid/methods , Mass Spectrometry/methods , Proteomics/methods , Statistics as Topic/methods , Animals , Biomarkers/analysis , Humans , Reproducibility of Results

Time alignment algorithms based on selected mass traces for complex LC-MS data.

Christin, Christin; Hoefsloot, Huub C J; Smilde, Age K; Suits, Frank; Bischoff, Rainer; Horvatovich, Peter L.

J Proteome Res ; 9(3): 1483-95, 2010 Mar 05.

Article in English | MEDLINE | ID: mdl-20070124

ABSTRACT

Time alignment of complex LC-MS data remains a challenge in proteomics and metabolomics studies. This work describes modifications of the Dynamic Time Warping (DTW) and the Parametric Time Warping (PTW) algorithms that improve the alignment quality for complex, highly variable LC-MS data sets. Regular DTW or PTW use one-dimensional profiles such as the Total Ion Chromatogram (TIC) or Base Peak Chromatogram (BPC), resulting in correct alignment if the signals have a relatively simple structure. However, when aligning the TICs of chromatograms from complex mixtures with large concentration variability such as serum or urine, both algorithms often lead to misalignment of peaks and thus incorrect comparisons in the subsequent statistical analysis. This is mainly because compounds with different m/z values but similar retention times are not considered separately but are confounded in the benefit function of the algorithms, which uses only one-dimensional information. Thus, it is necessary to treat the information of different mass traces separately in the warping function to ensure that compounds having the same m/z value and retention time are aligned to each other. The Component Detection Algorithm (CODA) is widely used to calculate the quality of an LC-MS mass trace. By combining CODA with the warping algorithms of DTW or PTW (DTW-CODA or PTW-CODA), we include only high quality mass traces measured by CODA in the benefit function. Our results show that using several CODA selected high quality mass traces in DTW-CODA and PTW-CODA significantly improves the alignment quality of three different, highly complex LC-MS data sets. Moreover, DTW-CODA leads to better preservation of peak shape as compared to the original DTW-TIC algorithm, which often suffers from a substantial peak shape distortion. Our results show that combining CODA selected mass traces with different time alignment algorithms is a general principle that provides accurate alignment for highly complex samples with large concentration variability.
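
The idea of warping on several mass traces at once, rather than on a single TIC, can be sketched as follows. This is a simplified illustration and not the DTW-CODA implementation from the paper: it uses plain DTW with a squared-difference cost summed over traces (the abstract speaks of a benefit function), and the function name `dtw_multitrace` and the toy traces are assumptions.

```python
import numpy as np


def dtw_multitrace(ref, sample):
    """Plain DTW between two runs, each given as (n_traces, n_timepoints).

    The local cost sums squared differences over all mass traces, so peaks
    are only matched to peaks in the same trace (same m/z).
    """
    ref = np.atleast_2d(np.asarray(ref, dtype=float))
    sample = np.atleast_2d(np.asarray(sample, dtype=float))
    n, m = ref.shape[1], sample.shape[1]

    # cost[i, j]: dissimilarity of ref time point i and sample time point j
    cost = ((ref[:, :, None] - sample[:, None, :]) ** 2).sum(axis=0)

    # Accumulated cost with a one-cell border of infinities
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])

    # Backtrack from the end to recover the warping path
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(acc[i - 1, j - 1], i - 1, j - 1),
                 (acc[i - 1, j], i - 1, j),
                 (acc[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return path[::-1], float(acc[n, m])


# Toy example (assumed): two mass traces, sample delayed by one time point.
ref = [[0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0]]
sample = [[0, 0, 0, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 1, 0]]
path, total_cost = dtw_multitrace(ref, sample)
print(path, total_cost)
```

In this toy case, summing the two traces into a single TIC-like profile would make the two peaks indistinguishable by mass. Keeping them as separate rows forces each peak to be matched within its own trace.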

Subject(s)

Algorithms , Chromatography, Liquid/methods , Computational Biology/methods , Mass Spectrometry/methods , Metabolomics/methods , Blood Proteins/analysis , Databases, Factual , Humans , Proteinuria/blood , Time Factors , Trypsin/chemistry , Trypsin/metabolism , Urine/chemistry

The effect of preanalytical factors on stability of the proteome and selected metabolites in cerebrospinal fluid (CSF).

Rosenling, Therese; Slim, Christiaan L; Christin, Christin; Coulier, Leon; Shi, Shanna; Stoop, Marcel P; Bosman, Jan; Suits, Frank; Horvatovich, Peter L; Stockhofe-Zurwieden, Norbert; Vreeken, Rob; Hankemeier, Thomas; van Gool, Alain J; Luider, Theo M; Bischoff, Rainer.

J Proteome Res ; 8(12): 5511-22, 2009 Dec.

Article in English | MEDLINE | ID: mdl-19845411

ABSTRACT

To standardize the use of cerebrospinal fluid (CSF) for biomarker research, a set of stability studies has been performed on porcine samples to investigate the influence of common sample handling procedures on proteins, peptides, metabolites and free amino acids. This study focuses on the effect on proteins and peptides, analyzed by applying label-free quantitation using microfluidics nanoscale liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (chipLC-MS) as well as matrix-assisted laser desorption ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI-FT-ICR-MS) and Orbitrap LC-MS/MS to trypsin-digested CSF samples. The factors assessed were a 30 or 120 min time delay at room temperature before storage at -80 °C after the collection of CSF in order to mimic potential delays in the clinic (delayed storage), storage at 4 °C after trypsin digestion to mimic the time that samples remain in the cooled autosampler of the analyzer, and repeated freeze-thaw cycles to mimic storage and handling procedures in the laboratory. The delayed storage factor was also analyzed by gas chromatography mass spectrometry (GC-MS) and liquid chromatography mass spectrometry (LC-MS) for changes of metabolites and free amino acids, respectively. Our results show that repeated freeze-thaw cycles introduced changes in transthyretin peptide levels. The trypsin-digested samples left at 4 °C in the autosampler showed a time-dependent decrease of peak areas for peptides from prostaglandin D-synthase and serotransferrin. Delayed storage of CSF led to changes in prostaglandin D-synthase derived peptides as well as to increased levels of certain amino acids and metabolites. The changes of metabolites, amino acids and proteins in the delayed storage study appear to be related to remaining white blood cells. Our recommendations are to centrifuge CSF samples immediately after collection to remove white blood cells, aliquot, and then snap-freeze the supernatant in liquid nitrogen for storage at -80 °C. Preferably, samples should not be left in the autosampler for more than 24 h, and freeze-thaw cycles should be avoided if at all possible.

Subject(s)

Cerebrospinal Fluid/chemistry , Protein Stability , Proteome/chemistry , Specimen Handling/methods , Tissue Preservation/methods , Amino Acids , Biomarkers/cerebrospinal fluid , Cryopreservation , Humans , Intramolecular Oxidoreductases/metabolism , Leukocytes/chemistry , Leukocytes/metabolism , Lipocalins/metabolism , Metabolomics , Peptides , Proteins , Proteome/metabolism , Proteomics/methods , Reference Standards , Specimen Handling/standards , Tissue Preservation/standards

Optimized time alignment algorithm for LC-MS data: correlation optimized warping using component detection algorithm-selected mass chromatograms.

Christin, Christin; Smilde, Age K; Hoefsloot, Huub C J; Suits, Frank; Bischoff, Rainer; Horvatovich, Peter L.

Anal Chem ; 80(18): 7012-21, 2008 Sep 15.

Article in English | MEDLINE | ID: mdl-18715018

ABSTRACT

Correlation optimized warping (COW) based on the total ion current (TIC) is a widely used time alignment algorithm (COW-TIC). This approach works successfully on chromatograms containing few compounds and having a well-defined TIC. In this paper, we have combined COW with a component detection algorithm (CODA) to align LC-MS chromatograms containing thousands of biological compounds with overlapping chromatographic peaks, a situation where COW-TIC often fails. CODA is a variable selection procedure that selects mass chromatograms with low noise and low background (so-called "high-quality" mass chromatograms). High-quality mass chromatograms selected in each COW segment ensure that the same compounds (based on their mass and their retention time) are used in the two-dimensional benefit function of COW to obtain correct and optimal alignments (COW-CODA). The performance of the COW-CODA algorithm was evaluated on three types of complex data sets obtained from the LC-MS analysis of samples commonly used for biomarker discovery and compared to COW-TIC using a new global comparison method based on overlapping peak area: trypsin-digested serum obtained from cervical cancer patients, trypsin-digested serum from a single patient that was treated with varying preanalytical parameters (factorial design study), and urine from pregnant and nonpregnant women. While COW-CODA did result in minor misalignments in rare cases, it was clearly superior to the COW-TIC algorithm, especially when applied to highly variable chromatograms (factorial design, urine). The presented algorithm thus enables automatic time alignment and accurate peak matching of multiple LC-MS data sets obtained from complex body fluids that are often used for biomarker discovery.
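
The selection of "high-quality" mass chromatograms can be illustrated with a simplified quality score. The sketch below is only loosely modelled on the idea described in the abstract (low noise, low background) and should not be read as the published CODA definition, whose exact scaling may differ. The function name `quality_score`, the window size, and the toy traces are assumptions.

```python
import numpy as np


def quality_score(trace, window=5):
    """Similarity between a raw trace and its smoothed, mean-centered version.

    Spiky noise lowers the score because smoothing changes the trace;
    a high constant background lowers it because mean-centering removes it.
    """
    trace = np.asarray(trace, dtype=float)
    norm = np.linalg.norm(trace)
    if trace.size < window or norm == 0:
        return 0.0
    scaled = trace / norm                      # raw trace scaled to unit length

    # Moving-average smoothing, then mean-centering and unit-length scaling
    smoothed = np.convolve(trace, np.ones(window) / window, mode="valid")
    centered = smoothed - smoothed.mean()
    c_norm = np.linalg.norm(centered)
    if c_norm == 0:
        return 0.0

    # Compare over the time points covered by the smoothing window
    start = (window - 1) // 2
    return float(scaled[start:start + smoothed.size] @ (centered / c_norm))


# Toy traces (assumed): a clean peak, pure noise, and a peak on high background.
t = np.arange(200)
peak = np.exp(-(t - 100) ** 2 / (2 * 5.0 ** 2))
rng = np.random.default_rng(0)
traces = {
    "clean_peak": peak,
    "noise": rng.random(200),
    "high_background": peak + 10.0,
}
scores = {name: quality_score(tr) for name, tr in traces.items()}
selected = [name for name, s in scores.items() if s > 0.8]  # threshold is arbitrary
print(scores, selected)
```

Only traces passing the threshold would then enter the alignment step, so the warping is driven by well-defined peaks rather than by noise or background.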

Subject(s)

Algorithms , Chromatography, Liquid/methods , Mass Spectrometry/methods , Female , Humans , Pregnancy , Reference Standards , Reproducibility of Results , Time Factors , Trypsin/metabolism , Urine/chemistry , Uterine Cervical Neoplasms/blood
