Search | VHL Regional Portal

Processing and classification of protein mass spectra.

Hilario, Melanie; Kalousis, Alexandros; Pellegrini, Christian; Müller, Markus.

Mass Spectrom Rev ; 25(3): 409-49, 2006.

Article in English | MEDLINE | ID: mdl-16463283

ABSTRACT

Among the many applications of mass spectrometry, biomarker pattern discovery from protein mass spectra has aroused considerable interest in the past few years. While research efforts have raised hopes of early and less invasive diagnosis, they have also brought to light the many issues to be tackled before mass-spectra-based proteomic patterns become routine clinical tools. Known issues cover the entire pipeline leading from sample collection through mass spectrometry analytics to biomarker pattern extraction, validation, and interpretation. This study focuses on the data-analytical phase, which takes as input mass spectra of biological specimens and discovers patterns of peak masses and intensities that discriminate between different pathological states. We survey current work and investigate computational issues concerning the different stages of the knowledge discovery process: exploratory analysis, quality control, and diverse transforms of mass spectra, followed by further dimensionality reduction, classification, and model evaluation. We conclude after a brief discussion of the critical biomedical task of analyzing discovered discriminatory patterns to identify their component proteins as well as interpret and validate their biological implications.

Subject(s)

Mass Spectrometry/methods , Proteins/analysis , Algorithms , Animals , Biomarkers , Computational Biology , Humans , Mass Spectrometry/classification , Models, Chemical , Peptide Mapping , Proteomics

SVM Modeling via a Hybrid Genetic Strategy. A Health Care Application.

Cohen, Gilles; Hilario, Mélanie; Pellegrini, Christian; Geissbuhler, Antoine.

Stud Health Technol Inform ; 116: 193-8, 2005.

Article in English | MEDLINE | ID: mdl-16160258

ABSTRACT

This paper addresses the model selection problem for Support Vector Machines. A hybrid genetic algorithm guided by Direct Simplex Search evolves hyperparameter values using an empirical error estimate as a steering criterion. This approach is specifically tailored to and experimentally evaluated on a health care problem which involves discriminating 11% nosocomially infected patients from 89% non-infected patients. The combination of Direct Simplex Search with GAs is shown to improve the performance of GAs in terms of solution quality and computational efficiency. Unlike most other hyperparameter tuning techniques, our hybrid approach does not require supplementary effort such as computation of derivatives, making it well suited for practical purposes. This method produces encouraging results: it exhibits high performance and good convergence properties.
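The hybrid strategy described in this abstract can be illustrated with a minimal Python sketch that alternates genetic operators with a derivative-free simplex refinement of the best individual. It is not the authors' implementation: `surrogate_error` is a hypothetical stand-in for the empirical SVM error estimate, Nelder-Mead is assumed as the simplex method, and the search ranges, population size, crossover, and mutation settings are arbitrary assumptions.

```python
import random

# Hypothetical stand-in for an empirical SVM error estimate over
# (log10 C, log10 gamma); a real run would train and validate an SVM here.
def surrogate_error(params):
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2 + 0.1

BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]  # assumed search ranges

def nelder_mead(f, start, step=0.5, iters=20):
    """Basic Nelder-Mead simplex search; uses function values only."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex halfway towards the best one
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)

def hybrid_ga(f, bounds, pop_size=12, generations=15, seed=0):
    """Genetic search whose elite is refined by simplex search each generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)
        pop[0] = nelder_mead(f, pop[0])      # local, derivative-free refinement
        parents = pop[: pop_size // 2]       # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Arithmetic crossover plus Gaussian mutation, clipped to the bounds.
            child = [
                min(max((x + y) / 2 + rng.gauss(0, 0.3), lo), hi)
                for x, y, (lo, hi) in zip(a, b, bounds)
            ]
            children.append(child)
        pop = parents + children
    return min(pop, key=f)

best = hybrid_ga(surrogate_error, BOUNDS)
print("best (log10 C, log10 gamma):", best, "error:", surrogate_error(best))
```

Because the refined elite always survives into the next generation, the best error in this sketch never gets worse from one generation to the next, and no derivatives of the error estimate are ever needed.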

Subject(s)

Algorithms , Support Vector Machine , Artificial Intelligence , Humans , Models, Theoretical

An application of one-class support vector machine to nosocomial infection detection.

Cohen, Gilles; Hilario, Mélanie; Sax, Hugo; Hugonnet, Stéphane; Pellegrini, Christian; Geissbuhler, Antoine.

Stud Health Technol Inform ; 107(Pt 1): 716-20, 2004.

Article in English | MEDLINE | ID: mdl-15360906

ABSTRACT

Nosocomial infections (NIs), those acquired in health care settings, are among the major causes of increased mortality among hospitalized patients. They are a significant burden for patients and health authorities alike; it is thus important to monitor and detect them through an effective surveillance system. This paper describes a retrospective analysis of a prevalence survey of NIs done in the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey. In this two-class classification task, the main difficulty lies in the significant imbalance between positive or infected (11%) and negative (89%) cases. To cope with class imbalance, we investigate one-class SVMs which can be trained to distinguish two classes on the basis of examples from a single class (in this case, only "normal" or non-infected patients). The infected ones are then identified as "abnormal" cases or outliers that deviate significantly from the normal profile. Experimental results are encouraging: whereas standard 2-class SVMs scored a baseline sensitivity of 50.6% on this problem, the one-class approach increased sensitivity to as much as 92.6%. These results are comparable to those obtained by the authors in a previous study on asymmetrical soft margin SVMs; they suggest that one-class SVMs can provide an effective and efficient way of overcoming data imbalance in classification problems.
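To see why sensitivity rather than accuracy is the figure of merit under an 11%/89% split, consider the following small Python sketch. The cohort of 100 patients and the trivial "always negative" classifier are hypothetical illustrations, not data or models from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary labelling."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fn, fp, tn

def sensitivity(y_true, y_pred):
    """Fraction of truly infected patients that are flagged as infected."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def accuracy(y_true, y_pred):
    """Fraction of all patients that are labelled correctly."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)

# Hypothetical cohort mirroring the reported class proportions.
y_true = [1] * 11 + [0] * 89
# A trivial classifier that labels every patient as non-infected.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
```

Labelling everyone as non-infected is correct for 89 of 100 patients yet detects none of the infections, which is why the abstract reports sensitivity when comparing the two-class and one-class approaches.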

Subject(s)

Artificial Intelligence , Cross Infection/diagnosis , Algorithms , Cross Infection/epidemiology , Data Collection , Hospitals, University , Humans , Infection Control , Population Surveillance , Prevalence , Retrospective Studies , Switzerland/epidemiology

Machine learning approaches to lung cancer prediction from mass spectra.

Hilario, Melanie; Kalousis, Alexandros; Müller, Markus; Pellegrini, Christian.

Proteomics ; 3(9): 1716-9, 2003 Sep.

Article in English | MEDLINE | ID: mdl-12973731

ABSTRACT

We addressed the problem of discriminating between 24 diseased and 17 healthy specimens on the basis of protein mass spectra. To prepare the data, we performed mass-to-charge ratio (m/z) normalization, baseline elimination, and conversion of absolute peak height measures to height ratios. After preprocessing, the major difficulty encountered was the extremely large number of variables (1676 m/z values) versus the number of examples (41). Dimensionality reduction was treated as an integral part of the classification process; variable selection was coupled with model construction in a single ten-fold cross-validation loop. We explored different experimental setups involving two peak height representations, two variable selection methods, and six induction algorithms, all on both the original 1676-mass data set and on a prescreened 124-mass data set. Highest predictive accuracies (1-2 off-sample misclassifications) were achieved by a multilayer perceptron and Naïve Bayes, with the latter displaying more consistent performance (hence greater reliability) over varying experimental conditions. We attempted to identify the most discriminant peaks (proteins) on the basis of scores assigned by the two variable selection methods and by neural-network-based sensitivity analysis. These three scoring schemes consistently ranked four peaks as the most relevant discriminators: 11683, 1403, 17350 and 66107.
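Coupling variable selection with model construction inside one cross-validation loop, as this abstract describes, means the selector only ever sees the training folds. The Python sketch below shows that structure under stated assumptions: the data are synthetic (41 examples, 50 variables, one informative), the mean-difference score and the nearest-centroid classifier are simple stand-ins, and none of these are the selection methods or induction algorithms used in the study.

```python
import random
from statistics import mean

def fold_indices(n, k, seed=0):
    """Split range(n) into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def select_variables(X, y, top):
    """Rank variables by absolute difference of class means (assumed score)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top]

def nearest_centroid_predict(X_train, y_train, x, cols):
    """Assign x to the class whose centroid over the selected columns is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [mean(r[j] for r in rows) for j in cols]
        dist = sum((x[j] - c) ** 2 for j, c in zip(cols, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cross_validated_errors(X, y, k=10, top=5, seed=0):
    """Count held-out misclassifications, re-selecting variables in every fold."""
    errors = 0
    for fold in fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        cols = select_variables(X_tr, y_tr, top)  # training folds only
        for i in fold:
            if nearest_centroid_predict(X_tr, y_tr, X[i], cols) != y[i]:
                errors += 1
    return errors

# Synthetic data: 24 "diseased" and 17 "healthy" rows, 50 noise variables,
# with variable 3 shifted by 10 for the diseased class.
rng = random.Random(0)
y = [1] * 24 + [0] * 17
X = [[rng.uniform(-1, 1) for _ in range(50)] for _ in y]
for row, label in zip(X, y):
    row[3] += 10.0 * label

print("held-out errors:", cross_validated_errors(X, y, k=10, top=1))
```

Selecting variables on the full data set before splitting would let information from the held-out examples leak into the model, which tends to make error estimates optimistic when variables far outnumber examples.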

Subject(s)

Artificial Intelligence , Lung Neoplasms/diagnosis , Mass Spectrometry/methods , Proteins/chemistry , Algorithms , Computational Biology/methods , Databases, Protein , Humans , Mass Spectrometry/statistics & numerical data
