Search | BVS Regional Portal (test)

A non-parametric Bayesian model for joint cell clustering and cluster matching: identification of anomalous sample phenotypes with random effects.

Dundar, Murat; Akova, Ferit; Yerebakan, Halid Z; Rajwa, Bartek.

BMC Bioinformatics ; 15: 314, 2014 Sep 24.

Article in English | MEDLINE | ID: mdl-25248977

ABSTRACT

BACKGROUND: Flow cytometry (FC)-based computer-aided diagnostics is an emerging technique utilizing modern multiparametric cytometry systems. The major difficulty in using machine-learning approaches for classification of FC data arises from limited access to a wide variety of anomalous samples for training. In consequence, any learning with an abundance of normal cases and a limited set of specific anomalous cases is biased towards the types of anomalies represented in the training set. Such models do not accurately identify anomalies, whether previously known or unknown, that may exist in future samples tested. Although one-class classifiers trained using only normal cases would avoid such a bias, robust sample characterization is critical for a generalizable model. Owing to sample heterogeneity and instrumental variability, arbitrary characterization of samples usually introduces feature noise that may lead to poor predictive performance. Herein, we present a non-parametric Bayesian algorithm called ASPIRE (anomalous sample phenotype identification with random effects) that identifies phenotypic differences across a batch of samples in the presence of random effects. Our approach involves simultaneous clustering of cellular measurements in individual samples and matching of discovered clusters across all samples in order to recover global clusters using probabilistic sampling techniques in a systematic way.
RESULTS: We demonstrate the performance of the proposed method in identifying anomalous samples in two different FC data sets, one of which represents a set of samples including acute myeloid leukemia (AML) cases, and the other a generic 5-parameter peripheral-blood immunophenotyping. Results are evaluated in terms of the area under the receiver operating characteristic curve (AUC). ASPIRE achieved AUCs of 0.99 and 1.0 on the AML and generic blood immunophenotyping data sets, respectively.
CONCLUSIONS: These results demonstrate that anomalous samples can be identified by ASPIRE with almost perfect accuracy without a priori access to samples of anomalous subtypes in the training set. The ASPIRE approach is unique in its ability to form generalizations regarding normal and anomalous states given only very weak assumptions regarding sample characteristics and origin. Thus, ASPIRE could become highly instrumental in providing unique insights about observed biological phenomena in the absence of full information about the investigated samples.
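
The abstract above reports performance as the area under the ROC curve (AUC). As a minimal sketch of how such a figure can be computed from per-sample anomaly scores, the following uses the rank-sum identity (the fraction of anomalous/normal pairs ranked correctly). The function name `roc_auc`, the convention that higher scores mean "more anomalous", and the example scores are assumptions made for illustration; they are not taken from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: anomaly scores; higher is assumed to mean more anomalous.
    labels: 1 for anomalous samples, 0 for normal samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    # Count anomalous/normal pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six samples (illustrative only, not paper data).
scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

An AUC of 1.0 under this definition means every anomalous sample scored higher than every normal sample.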

Subjects

Algorithms, Computational Biology/methods, Flow Cytometry, Phenotype, Area Under Curve, Artificial Intelligence, Bayes Theorem, Cluster Analysis, Acute Myeloid Leukemia/pathology, ROC Curve, Nonparametric Statistics, Stochastic Processes

Discovering the unknown: detection of emerging pathogens using a label-free light-scattering system.

Rajwa, Bartek; Dundar, M Murat; Akova, Ferit; Bettasso, Amanda; Patsekin, Valery; Hirleman, E Dan; Bhunia, Arun K; Robinson, J Paul.

Cytometry A ; 77(12): 1103-12, 2010 Dec.

Article in English | MEDLINE | ID: mdl-21108360

ABSTRACT

A recently introduced technique for pathogen recognition called BARDOT (BActeria Rapid Detection using Optical scattering Technology) belongs to the broad class of optical sensors and relies on forward-scatter phenotyping (FSP). The specificity of FSP derives from the morphological information that bacterial material encodes on a coherent optical wavefront passing through the colony. The system collects elastically scattered light patterns that, given a constant environment, are unique to each bacterial species and serovar. The notable similarity between FSP technology and spectroscopies is their reliance on statistical machine learning to perform recognition. Currently used methods utilize traditional supervised techniques which assume completeness of training libraries. However, this restrictive assumption is known to be false for most experimental conditions, resulting in unsatisfactory levels of accuracy, poor specificity, and consequently limited overall performance for biodetection and classification tasks. The presented work demonstrates application of the BARDOT system to classify bacteria belonging to the Salmonella class in a nonexhaustive framework, that is, without full knowledge about all the possible classes that can be encountered. Our study uses a Bayesian approach to learning with a nonexhaustive training dataset to allow for the automated detection of unknown bacterial classes.
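
The nonexhaustive setting described above, where a classifier must also flag samples from classes absent from the training library, can be illustrated with a deliberately simplified stand-in: fit one diagonal Gaussian per known class and report "unknown" when no known class explains a sample well. This is not the Bayesian method of the paper. The function names, the fixed log-likelihood threshold, and the toy two-feature data are all assumptions made for this sketch.

```python
import math


def fit_class(samples):
    """Per-feature mean and variance of one known class (diagonal Gaussian)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    # Floor the variance to avoid division by zero on degenerate features.
    var = [max(sum((x[j] - mean[j]) ** 2 for x in samples) / n, 1e-6)
           for j in range(d)]
    return mean, var


def log_likelihood(x, model):
    """Log-density of x under a diagonal Gaussian model (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def classify(x, models, threshold):
    """Return the best known label, or 'unknown' if no class fits x well."""
    label, best = max(((k, log_likelihood(x, m)) for k, m in models.items()),
                      key=lambda pair: pair[1])
    return label if best >= threshold else "unknown"


# Toy training library with two known classes (hypothetical data).
library = {
    "A": [(0.0, 0.0), (0.2, -0.2), (-0.2, 0.2), (0.1, 0.1), (-0.1, -0.1)],
    "B": [(5.0, 5.0), (5.2, 4.8), (4.8, 5.2), (5.1, 5.1), (4.9, 4.9)],
}
models = {k: fit_class(v) for k, v in library.items()}
THRESHOLD = -20.0  # assumed cut-off; in practice it would need calibration

print(classify((0.1, -0.1), models, THRESHOLD))
print(classify((20.0, 20.0), models, THRESHOLD))
```

A fixed threshold is the crudest possible novelty rule; the appeal of a Bayesian treatment is that the decision about unknown classes follows from the model rather than from a hand-tuned cut-off.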

Subjects

Bacterial Typing Techniques/instrumentation, Biosensing Techniques/instrumentation, Light, Salmonella/classification, Salmonella/isolation & purification, Radiation Scattering, Bayes Theorem, Food Microbiology, Automated Pattern Recognition, Sensitivity and Specificity

A Machine-Learning Approach to Detecting Unknown Bacterial Serovars.

Akova, Ferit; Dundar, Murat; Davisson, V Jo; Hirleman, E Daniel; Bhunia, Arun K; Robinson, J Paul; Rajwa, Bartek.

Stat Anal Data Min ; 3(5): 289-301, 2010 Oct.

Article in English | MEDLINE | ID: mdl-22162745

ABSTRACT

Technologies for rapid detection of bacterial pathogens are crucial for securing the food supply. A light-scattering sensor recently developed for real-time identification of multiple colonies has shown great promise for distinguishing bacterial cultures. The classification approach currently used with this system relies on supervised learning. For accurate classification of bacterial pathogens, the training library should be exhaustive, i.e., should consist of samples of all possible pathogens. Yet the sheer number of existing bacterial serovars and, more importantly, the effect of their high mutation rate do not allow for practical and manageable training. In this study, we propose a Bayesian approach to learning with a nonexhaustive training dataset for automated detection of unmatched bacterial serovars, i.e., serovars for which no samples exist in the training library. The main contribution of our work is the use of Wishart conjugate priors defined over class distributions. This allows us to employ the prior information obtained from known classes to make inferences about unknown classes as well. By this means, we identify new classes of informational value and dynamically update the training dataset with these classes to make it increasingly representative of the sample population. This results in a classifier with improved predictive performance for future samples. We evaluated our approach on a 28-class bacteria dataset and also on the benchmark 26-class letter recognition dataset for further validation. The proposed approach is compared against state-of-the-art methods, including density-based approaches and support vector domain description, as well as a recently introduced Bayesian approach based on simulated classes.
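
To illustrate how a conjugate prior lets information from known classes inform a class with few or no samples, the sketch below computes the posterior mean of a Gaussian class covariance under an inverse-Wishart prior. It assumes the textbook simplification of a known class mean, in which a prior IW(Psi, nu) and n observations with scatter matrix S give the posterior IW(Psi + S, nu + n), whose mean is (Psi + S) / (nu + n - d - 1). The exact prior structure used in the paper may differ; the function name, parameter names, and numbers are hypothetical.

```python
def posterior_mean_covariance(samples, mean, prior_scale, prior_dof):
    """Posterior mean of a covariance matrix under an inverse-Wishart prior.

    Assumes the class mean is known (a simplification for this sketch).
    prior_scale is the d x d scale matrix Psi; prior_dof is nu.
    """
    d, n = len(mean), len(samples)
    # Scatter matrix of the samples about the (assumed known) class mean.
    scatter = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples)
                for j in range(d)] for i in range(d)]
    denom = prior_dof + n - d - 1
    if denom <= 0:
        raise ValueError("posterior mean requires prior_dof + n > d + 1")
    return [[(prior_scale[i][j] + scatter[i][j]) / denom for j in range(d)]
            for i in range(d)]


# Hypothetical prior, e.g. pooled from known classes: prior mean = identity.
psi = [[2.0, 0.0], [0.0, 2.0]]
nu = 5

# With no samples, the estimate falls back on the prior mean.
print(posterior_mean_covariance([], (0.0, 0.0), psi, nu))
# Two samples pull the estimate toward the observed scatter.
print(posterior_mean_covariance([(1.0, 0.0), (-1.0, 0.0)], (0.0, 0.0), psi, nu))
# -> [[1.0, 0.0], [0.0, 0.5]]
```

The first call shows why such a prior is useful in the nonexhaustive setting: a class with no training samples still receives a usable covariance estimate derived from the prior.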
