
Evaluation of Multivariate Classification Models for Analyzing NMR Metabolomics Data.

Vu, Thao; Siemek, Parker; Bhinderwala, Fatema; Xu, Yuhang; Powers, Robert.

J Proteome Res ; 18(9): 3282-3294, 2019 09 06.

Article in English | MEDLINE | ID: mdl-31382745

ABSTRACT

Analytical techniques such as NMR and mass spectrometry can generate large metabolomics data sets containing thousands of spectral features derived from numerous biological observations. Multivariate data analysis is routinely used to uncover the underlying biological information contained within these large metabolomics data sets. This is typically accomplished by classifying the observations into groups (e.g., control versus treated) and by identifying the associated discriminating features. There are a variety of classification models to select from, including well-established techniques (e.g., principal component analysis [PCA], orthogonal projections to latent structures [OPLS], or partial least-squares projection to latent structures [PLS]) and newly emerging machine learning algorithms (e.g., support vector machines or random forests). However, it is unclear which classification model, if any, is an optimal choice for the analysis of metabolomics data. Herein, we present a comprehensive evaluation of five common classification models that are routinely employed in the metabolomics field and are currently available in our MVAPACK metabolomics software package. Simulated and experimental NMR data sets with various levels of group separation were used to evaluate each model. Model performance was assessed by classification accuracy rate, by the area under the receiver operating characteristic (AUROC) curve, and by the identification of true discriminating features. Our findings suggest that the five classification models perform equally well with robust data sets. Only when the models are stressed with subtle data set differences does OPLS emerge as the best-performing model. OPLS maintained a high prediction accuracy rate and a large area under the ROC curve while yielding loadings closest to the true loadings when group separation was limited.
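To make two of the performance measures named in the abstract concrete, here is a minimal Python sketch of how a classification accuracy rate and an AUROC can be computed from group labels and model scores. It is an illustration only, not the authors' MVAPACK code: the function names `accuracy_rate` and `auroc`, the 0/1 encoding (0 = control, 1 = treated), the 0.5 decision threshold, and the example scores are all assumptions made for this sketch. The AUROC is computed here by direct pairwise comparison, which equals the area under the ROC curve.

```python
from typing import Sequence


def accuracy_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of observations assigned to the correct group."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    if len(y_true) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both groups must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 0 = control, 1 = treated (values invented for illustration).
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]

# Assumed decision rule: predict "treated" when the score is at least 0.5.
predicted = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy_rate(labels, predicted))
print("AUROC:", auroc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason both are commonly reported together.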

Subjects

Magnetic Resonance Spectroscopy/methods; Mass Spectrometry/methods; Metabolomics/methods; Nuclear Magnetic Resonance, Biomolecular/methods; Algorithms; Discriminant Analysis; Least-Squares Analysis; Magnetic Resonance Spectroscopy/statistics & numerical data; Mass Spectrometry/statistics & numerical data; Metabolomics/statistics & numerical data; Multivariate Analysis; Principal Component Analysis; Support Vector Machine
