Search | Portal Regional da BVS

Biomarker identification from next-generation sequencing data for pathogen bacteria characterization and surveillance.

Zhao, Weizhong; Chen, James J; Foley, Steven; Wang, Yuping; Zhao, Shaohua; Basinger, John; Zou, Wen.

Biomark Med ; 9(11): 1253-64, 2015.

Article in English | MEDLINE | ID: mdl-26501894

ABSTRACT

AIM: The purpose was to develop an analytical pipeline for specific gene analysis and biomarker discovery from next-generation sequencing (NGS) data. MATERIALS & METHODS: As a test case, the fliC gene reference sequences of 24 Salmonella enterica strains of 13 serotypes and NGS reads of 32 serovar Newport, 48 Montevideo and 115 Enteritidis outbreak isolates were retrieved from the National Center for Biotechnology Information database. RESULTS: An analytical pipeline was established, consisting of four steps: reference sequence retrieval and template sequence determination; NGS sequence read retrieval; multiple sequence alignment and phylogenetic analysis; data mining and biomarker discovery. CONCLUSION: The pipeline provides an effective bioinformatics tool for clarifying genetic diversity and discovering marker sequences for pathogen characterization and surveillance.

Subjects

Biomarkers/metabolism , High-Throughput Nucleotide Sequencing , Salmonella enterica/isolation & purification , Bacterial Proteins/genetics , Genomics , Humans , Phylogeny , Salmonella enterica/genetics , Salmonella enterica/metabolism
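
As an illustration of the last pipeline step named in the abstract above (data mining and biomarker discovery on aligned sequences), the following is a minimal sketch and not the authors' implementation. It assumes the sequences are already aligned to equal length and grouped by serovar, and it flags alignment columns that are conserved within every group but differ between at least two groups. The function name `candidate_marker_columns` and the toy sequences are hypothetical; they are not fliC data from the study.

```python
from typing import Dict, List


def candidate_marker_columns(groups: Dict[str, List[str]]) -> List[int]:
    """Return 0-based alignment columns that are conserved within every
    group but differ between at least two groups (hypothetical helper)."""
    seqs = [s for members in groups.values() for s in members]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    markers = []
    for col in range(length):
        consensus = []
        for members in groups.values():
            bases = {s[col] for s in members}
            if len(bases) != 1:
                # This group is variable (or empty) at this column: not a marker.
                break
            consensus.append(next(iter(bases)))
        else:
            # Every group is internally conserved; keep the column only
            # if the groups do not all share the same character.
            if len(set(consensus)) > 1:
                markers.append(col)
    return markers


# Toy, made-up aligned sequences (assumption: not real fliC data).
toy_alignment = {
    "Newport": ["ATGGCA", "ATGGCA"],
    "Montevideo": ["ATGACA", "ATGACA"],
    "Enteritidis": ["ATGGCT", "ATGGCT"],
}
print(candidate_marker_columns(toy_alignment))  # [3, 5]
```

Gap characters are treated like any other symbol here; a real analysis would likely need explicit gap handling and a tolerance for sequencing errors within a group.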

Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics.

Chung, Ming-Hua; Wang, Yuping; Tang, Hailin; Zou, Wen; Basinger, John; Xu, Xiaowei; Tong, Weida.

Front Pharmacol ; 6: 81, 2015.

Article in English | MEDLINE | ID: mdl-25941488

ABSTRACT

Advances in high-throughput screening technologies have made it possible to generate massive amounts of biological data, a big data phenomenon in biomedical science. Yet researchers still rely heavily on keyword search and/or literature review to navigate the databases, and analyses are often done on a rather small scale. As a result, the rich information in a database is not fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. For the past 10 years, probabilistic topic modeling has been recognized as an effective machine learning approach for annotating the hidden thematic structure of massive collections of documents. The analogy between a text corpus and large-scale genomic data enables the application of text mining tools, such as probabilistic topic models, to explore hidden patterns in genomic data and, by extension, altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset consisting of a large number of gene expression profiles from the livers of rats treated with drugs at multiple doses and time points. We discovered hidden patterns in gene expression associated with the effects of dose and time point of treatment. Finally, we illustrated the ability of our model to identify evidence for a potential reduction in animal use.
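
The analogy between a text corpus and gene expression data mentioned in the abstract can be made concrete with a small sketch. This is one possible encoding under stated assumptions, not the encoding used in the paper: each treated sample becomes a "document", each gene becomes a "word" tagged with its direction of change, and the word count grows with the size of the log2 fold change. The function name `expression_to_document`, the threshold, the condition labels and all numeric values are hypothetical.

```python
from collections import Counter
from typing import Dict


def expression_to_document(fold_changes: Dict[str, float],
                           threshold: float = 1.0) -> Counter:
    """Turn one sample's log2 fold changes into a bag of gene 'words'.

    A gene appears as the token <gene>_UP or <gene>_DOWN, repeated once
    for every full multiple of `threshold` in its absolute fold change.
    Genes below the threshold are dropped (hypothetical encoding).
    """
    doc = Counter()
    for gene, value in fold_changes.items():
        count = int(abs(value) // threshold)
        if count > 0:
            direction = "UP" if value > 0 else "DOWN"
            doc[f"{gene}_{direction}"] = count
    return doc


# Made-up values keyed by a (drug, dose, time) label, which would play
# the role of the "author" in an author-topic model (assumption).
samples = {
    ("drugA", "high", "24h"): {"Cyp1a1": 2.4, "Gstm1": -1.1, "Actb": 0.3},
    ("drugA", "low", "24h"): {"Cyp1a1": 1.2, "Gstm1": -0.4, "Actb": 0.1},
}
corpus = {label: expression_to_document(fc) for label, fc in samples.items()}
for label, doc in corpus.items():
    print(label, dict(doc))
```

A corpus built this way could then be passed to any standard topic-modeling implementation; which discretization suits a given dataset is a design choice this sketch leaves open.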
