1.
PLoS One ; 9(10): e111318, 2014.
Article in English | MEDLINE | ID: mdl-25347824

ABSTRACT

BACKGROUND: A major challenge in the analysis of large and complex biomedical data is to develop an approach for 1) identifying distinct subgroups in the sampled populations, 2) characterizing the relationships among subgroups, and 3) developing a prediction model to classify subgroup memberships of new samples by finding a set of predictors. Each subgroup can represent different pathogen serotypes of microorganisms, different tumor subtypes in cancer patients, or different genetic makeups of patients related to treatment response. METHODS: This paper proposes a composite model for subgroup identification and prediction using biclusters. A biclustering technique is first used to identify a set of biclusters from the sampled data. For each bicluster, a subgroup-specific binary classifier is built to determine whether a particular sample is inside or outside the bicluster. A composite model, which consists of all binary classifiers, is constructed to classify samples into several disjoint subgroups. The proposed composite model depends neither on any specific biclustering algorithm or pattern of biclusters nor on any particular classification algorithm. RESULTS: The composite model was shown to have an overall accuracy of 97.4% for a synthetic dataset consisting of four subgroups. The model was applied to two datasets where the samples' subgroup memberships were known. The procedure showed 83.7% accuracy in discriminating lung cancer adenocarcinoma and squamous carcinoma subtypes, and was able to identify 5 serotypes and several subtypes with about 94% accuracy in a pathogen dataset. CONCLUSION: The composite model presents a novel approach to developing a biclustering-based classification model from unlabeled sampled data.
The proposed approach combines unsupervised biclustering and supervised classification techniques to classify samples into disjoint subgroups based on their associated attributes, such as genotypic factors, phenotypic outcomes, efficacy/safety measures, or responses to treatments. The procedure is useful for identification of unknown species or new biomarkers for targeted therapy.
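
The composite step described in this abstract (one binary classifier per bicluster, combined into a single disjoint assignment) can be illustrated with a minimal Python sketch. The abstract does not state how the binary classifiers are combined, so the rule used here (take the highest membership score, and label the sample "unassigned" if no score reaches a threshold) is an assumption, and the names `composite_predict`, `toy_classifiers`, and `threshold` are hypothetical.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type: a per-bicluster classifier that returns a membership
# score in [0, 1] for one sample. Any trained classifier could be wrapped so.
Classifier = Callable[[Sequence[float]], float]


def composite_predict(
    sample: Sequence[float],
    classifiers: Dict[str, Classifier],
    threshold: float = 0.5,
) -> str:
    """Assign a sample to the subgroup whose binary classifier scores highest.

    Assumption (not from the abstract): overlaps are resolved by the maximum
    score, and a sample that no classifier accepts is labeled "unassigned".
    """
    scores = {name: clf(sample) for name, clf in classifiers.items()}
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= threshold else "unassigned"


# Toy stand-ins for trained classifiers: each looks only at "its" features,
# mimicking a bicluster defined on a subset of columns.
toy_classifiers: Dict[str, Classifier] = {
    "subgroup_A": lambda s: (s[0] + s[1]) / 2,
    "subgroup_B": lambda s: (s[2] + s[3]) / 2,
}

if __name__ == "__main__":
    print(composite_predict([0.9, 0.8, 0.1, 0.2], toy_classifiers))
```

Because each classifier is just a callable, the sketch stays independent of the biclustering and classification algorithms, which matches the independence the abstract claims for the composite model.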


Subjects
Algorithms; Biomarkers, Tumor/classification; Datasets as Topic; Cluster Analysis; Humans
2.
BMC Bioinformatics ; 14 Suppl 14: S15, 2013.
Article in English | MEDLINE | ID: mdl-24267777

ABSTRACT

BACKGROUND: Pulsed-field gel electrophoresis (PFGE) is currently the method most widely and routinely used by the Centers for Disease Control and Prevention (CDC) and state health labs in the United States for Salmonella surveillance and outbreak tracking. Major drawbacks of commercially available PFGE analysis programs have been their difficulty in dealing with large datasets and the limited availability of analysis tools. New analytical tools for PFGE data mining are needed in order to make full use of valuable data in large surveillance databases. RESULTS: In this study, a software package was developed that explores and implements five types of bioinformatics approaches for the analysis and visualization of PFGE fingerprinting. The approaches include PFGE band standardization, Salmonella serotype prediction, hierarchical cluster analysis, distance matrix analysis and two-way hierarchical cluster analysis. PFGE band standardization makes cross-group analysis of large datasets possible. The Salmonella serotype prediction approach allows users to predict serotypes of Salmonella isolates based on their PFGE patterns. The hierarchical cluster analysis approach could be used to clarify subtypes and phylogenetic relationships among groups of PFGE patterns. The distance matrix and two-way hierarchical cluster analysis tools allow users to directly visualize the similarities/dissimilarities of any two individual patterns and the inter- and intra-serotype relationships of two or more serotypes, and provide a summary of the overall relationships between user-selected serotypes as well as the distinguishable band markers of these serotypes. The functionalities of these tools were illustrated on PFGE fingerprinting data from PulseNet of CDC. CONCLUSIONS: The bioinformatics approaches included in the software package developed in this study were integrated with the PFGE database to enhance the data mining of PFGE fingerprints.
Fast and accurate prediction makes it possible to elucidate Salmonella serotype information before conventional serological methods are pursued. The development of bioinformatics tools to distinguish the PFGE markers and serotype specific patterns will enhance PFGE data retrieval, interpretation and serotype identification and will likely accelerate source tracking to identify the Salmonella isolates implicated in foodborne diseases.


Subjects
Computational Biology/methods; Electrophoresis, Gel, Pulsed-Field/methods; Salmonella/classification; Cluster Analysis; Data Mining; Databases, Genetic; Humans; Salmonella/chemistry; Salmonella/genetics; Serotyping
3.
PLoS One ; 8(8): e71680, 2013.
Article in English | MEDLINE | ID: mdl-23940779

ABSTRACT

Biclustering has emerged as an important approach to the analysis of large-scale datasets. A biclustering technique identifies a subset of rows that exhibit similar patterns on a subset of columns in a data matrix. Many biclustering methods have been proposed, and most, if not all, algorithms are developed to detect regions of "coherence" patterns. These methods perform unsatisfactorily if the purpose is to identify biclusters of a constant level. This paper presents a two-step biclustering method to identify constant-level biclusters for binary or quantitative data. This algorithm identifies the maximal dimensional submatrix such that the proportion of non-signals is less than a pre-specified tolerance δ. In the analysis of two synthetic datasets, the proposed method had much higher sensitivity and slightly lower specificity than several prominent biclustering methods. It was further compared with the Bimax method for two real datasets. The proposed method was shown to perform most robustly in terms of sensitivity, number of biclusters and number of serotype-specific biclusters identified. However, dichotomization using different signal level thresholds usually leads to different sets of biclusters; this also occurs in the present analysis.
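
The tolerance criterion in this abstract (the proportion of non-signals in a submatrix must be less than δ) can be written down directly. The Python sketch below only checks that criterion for a given candidate submatrix of binary data; it is not the two-step search procedure of the paper, and the function names and the toy matrix are hypothetical.

```python
from typing import Sequence


def non_signal_proportion(
    matrix: Sequence[Sequence[int]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> float:
    """Fraction of zero (non-signal) cells in the submatrix rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    if not cells:
        raise ValueError("the candidate submatrix must contain at least one cell")
    return sum(1 for value in cells if value == 0) / len(cells)


def within_tolerance(
    matrix: Sequence[Sequence[int]],
    rows: Sequence[int],
    cols: Sequence[int],
    delta: float,
) -> bool:
    """True when the non-signal proportion is strictly below the tolerance delta."""
    return non_signal_proportion(matrix, rows, cols) < delta


# Toy binary data matrix (1 = signal, 0 = non-signal); illustrative only.
toy_matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]

if __name__ == "__main__":
    # Rows 0-2 and columns 0-2 contain 2 zeros out of 9 cells.
    print(within_tolerance(toy_matrix, [0, 1, 2], [0, 1, 2], delta=0.25))
```

For quantitative data, the same check would apply after dichotomizing at a signal-level threshold, which is the step the abstract notes can change the resulting set of biclusters.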


Subjects
Algorithms; Cluster Analysis; Data Interpretation, Statistical; Gene Expression Profiling/statistics & numerical data; Databases, Genetic/statistics & numerical data; Gene Expression Profiling/methods; High-Throughput Screening Assays/statistics & numerical data; Humans; Models, Theoretical; Oligonucleotide Array Sequence Analysis/methods; Oligonucleotide Array Sequence Analysis/statistics & numerical data
4.
Pharmacogenomics ; 14(8): 969-80, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23746190

ABSTRACT

Pharmacogenomics examines how the benefits and adverse effects of a drug vary among patients in a target population by analyzing genomic profiles of individual patients. Personalized medicine prescribes specific therapeutics that best suit an individual patient. Much current research focuses on developing genomic biomarkers to identify, prior to treatment, which patients would benefit from a treatment, have an adverse response, or have no response at all, according to relevant differences in risk factors, disease types and/or responses to therapy. This review describes the use of the two personalized medicine biomarkers, prognostic and predictive, to classify patients into subgroups for treatment recommendation.


Subjects
Biomarkers, Pharmacological; Pharmacogenetics; Precision Medicine; Drug-Related Side Effects and Adverse Reactions; Humans; Prognosis
5.
PLoS One ; 8(3): e59224, 2013.
Article in English | MEDLINE | ID: mdl-23516614

ABSTRACT

A database was constructed consisting of 45,923 Salmonella pulsed-field gel electrophoresis (PFGE) patterns. The patterns, randomly selected from all submissions to CDC PulseNet during 2005 to 2010, included the 20 most frequent serotypes and 12 less frequent serotypes. Meta-analysis was applied to all of the PFGE patterns in the database. In the range of 20 to 1100 kb, serotype Enteritidis averaged the fewest bands at 12 bands and Paratyphi A the most with 19, with most of the 32 serotypes in the 13-15 range. The 10 most frequent bands for each of the 32 serotypes were sorted and distinguished, and the results were in concordance with those from distance matrix and two-way hierarchical cluster analyses of the patterns in the database. The hierarchical cluster analysis divided the 32 serotypes into three major groups according to dissimilarity measures, and revealed for the first time the similarities among the PFGE patterns of serotype Saintpaul to serotypes Typhimurium, Typhimurium var. 5-, and I 4,[5],12:i:-; of serotype Hadar to serotype Infantis; and of serotype Muenchen to serotype Newport. The results of the meta-analysis indicated that the pattern similarities/dissimilarities determined the serotype discrimination of the PFGE method, and that the possible PFGE markers may have utility for serotype identification. The presence of distinct, serotype-specific patterns may provide useful information to aid in the distribution of serotypes in the population and potentially reduce the need for laborious analyses, such as traditional serotyping.


Subjects
Electrophoresis, Gel, Pulsed-Field/methods; Salmonella/metabolism; Serotyping/methods; Databases, Factual; Salmonella/classification
6.
BMC Med Res Methodol ; 13: 25, 2013 Feb 20.
Article in English | MEDLINE | ID: mdl-23425000

ABSTRACT

BACKGROUND: The two most important considerations in the evaluation of survival prediction models are 1) predictability - the ability to predict survival risks accurately and 2) reproducibility - the ability to generalize to predict samples generated from different studies. We present approaches for assessment of reproducibility of survival risk score predictions across medical centers. METHODS: Reproducibility was evaluated in terms of consistency and transferability. Consistency is the agreement of risk scores predicted between two centers. Transferability from one center to another center is the agreement of the risk scores of the second center predicted by each of the two centers. The transferability can be: 1) model transferability - whether a predictive model developed from one center can be applied to predict the samples generated from other centers and 2) signature transferability - whether signature markers of a predictive model developed from one center can be applied to predict the samples from other centers. We considered eight prediction models, including two clinical models, two gene expression models, and their combinations. Predictive performance of the eight models was evaluated by several common measures. Correlation coefficients between predicted risk scores of different centers were computed to assess reproducibility - consistency and transferability. RESULTS: Two public datasets, the lung cancer data generated from four medical centers and colon cancer data generated from two medical centers, were analyzed. The risk score estimates for lung cancer patients predicted by three of four centers agree reasonably well. In general, a good prediction model showed better cross-center consistency and transferability.
The risk scores for the colon cancer patients from one (Moffitt) medical center that were predicted by the clinical models developed from another (Vanderbilt) medical center were shown to have excellent model transferability and signature transferability. CONCLUSIONS: This study illustrates an analytical approach to assessing reproducibility of predictive models and signatures. Based on the analyses of the two cancer datasets, we conclude that the models with clinical variables appear to perform reasonably well, with a high degree of consistency and transferability. More investigations are needed on the reproducibility of prediction models that include gene expression data across studies.
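
The abstract measures agreement between centers with correlation coefficients of predicted risk scores, without saying which coefficient was used. As a minimal sketch, assuming a Pearson correlation, the agreement between the scores that two centers' models assign to the same patients could be computed as follows; the function name and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of risk scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


# Toy risk scores for the same five patients, predicted by models developed
# at two different centers (illustrative numbers, not data from the study).
scores_from_center_1_model = [0.10, 0.35, 0.50, 0.70, 0.90]
scores_from_center_2_model = [0.15, 0.30, 0.55, 0.65, 0.95]

if __name__ == "__main__":
    print(pearson(scores_from_center_1_model, scores_from_center_2_model))
```

A value near 1 would indicate that the two models rank and scale the patients' risks similarly, which is the sense of transferability the abstract describes.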


Subjects
Colonic Neoplasms/mortality; Data Interpretation, Statistical; Lung Neoplasms/mortality; Humans; Predictive Value of Tests; Prognosis; Proportional Hazards Models; Reproducibility of Results; Risk; Survival; Treatment Outcome
7.
J Biopharm Stat ; 23(1): 146-60, 2013.
Article in English | MEDLINE | ID: mdl-23331228

ABSTRACT

The Adverse Event Reporting System (AERS) is the primary database designed to support the Food and Drug Administration (FDA) postmarketing safety surveillance program for all approved drugs and therapeutic biologic products. Most current disproportionality analysis focuses on the detection of potential adverse events (AE) involving a single drug and a single AE only. In this paper, we present a data mining biclustering technique based on the singular value decomposition to extract local regions of association for a safety study. The analysis consists of a collection of biclusters, each representing an association between a set of drugs and the corresponding set of adverse events. The significance of each bicluster can be tested using disproportionality analysis. Individual drug-event combinations can be further tested. A safety data set consisting of 193 drugs with 8453 adverse events is analyzed as an illustration.
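
The abstract refers to disproportionality analysis without naming a specific statistic. One widely used measure is the proportional reporting ratio (PRR), computed from a 2x2 table of report counts; the Python sketch below assumes that measure purely for illustration, and the function name and counts are hypothetical.

```python
def proportional_reporting_ratio(
    drug_event: int,
    drug_other_events: int,
    other_drugs_event: int,
    other_drugs_other_events: int,
) -> float:
    """PRR = [a / (a + b)] / [c / (c + d)] for a 2x2 table of report counts.

    a: reports with the drug (or drug set) and the event (or event set)
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    a, b = drug_event, drug_other_events
    c, d = other_drugs_event, other_drugs_other_events
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Illustrative counts only: 20 of 100 reports for the drug mention the
    # event, versus 10 of 900 reports for all other drugs.
    print(proportional_reporting_ratio(20, 80, 10, 890))
```

For a bicluster, the same table could be built by pooling the drugs and the adverse events that the bicluster groups together, although the abstract does not say how the pooled test was constructed.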


Subjects
Adverse Drug Reaction Reporting Systems; Data Mining/methods; Databases, Factual; Drug-Related Side Effects and Adverse Reactions; Humans; Signal Detection, Psychological
8.
Hepatol Int ; 7(1): 171-9, 2013 Mar.
Article in English | MEDLINE | ID: mdl-26201631

ABSTRACT

PURPOSE: IL28B genotypes have a strong impact on treatment outcomes of chronic hepatitis C (CHC) and on-treatment viral kinetics. Since metabolic regulation and interferon response are highly integrated, metabolic profiles may play an important role in the link between IL28B genotypes and hepatitis C virus (HCV) infection. Thus, the association of IL28B rs8099917 genotypes with metabolic profiles and the impact of metabolic profiles on hepatitis C viral kinetic parameters were examined. METHODS: A case-control analysis including 278 CHC patients and 280 subjects without chronic HCV infection was performed. The associations of IL28B rs8099917 genotype with pretreatment metabolic profiles and early viral kinetic parameters were evaluated. RESULTS: Compared to HCV genotype 1 patients, the differences in metabolic profiles were more significant in genotype 2 patients. HCV genotype 2 patients with TT genotype had higher serum total cholesterol and high density lipoprotein (HDL) levels than those with GT genotype, and the differences remained significant when adjusted for age, sex, and body mass index (p = 0.005 for total cholesterol; p = 0.006 for HDL). In addition, patients with higher serum TG, higher fasting blood glucose, and lower HDL had a lower viral clearance rate. CONCLUSIONS: IL28B genotypes may affect lipid profiles of CHC patients, especially in HCV-genotype 2 patients. Patients with higher serum fasting blood glucose, triglyceride, and lower HDL have a lower viral clearance rate during pegylated interferon plus ribavirin therapy.

9.
BMC Med Res Methodol ; 12: 102, 2012 Jul 23.
Article in English | MEDLINE | ID: mdl-22824262

ABSTRACT

BACKGROUND: Cancer survival studies are commonly analyzed using survival-time prediction models for cancer prognosis. A number of different performance metrics are used to ascertain the concordance between the predicted risk score of each patient and the actual survival time, but these metrics can sometimes conflict. Alternatively, patients are sometimes divided into two classes according to a survival-time threshold, and binary classifiers are applied to predict each patient's class. Although this approach has several drawbacks, it does provide natural performance metrics such as positive and negative predictive values to enable unambiguous assessments. METHODS: We compare the survival-time prediction and survival-time threshold approaches to analyzing cancer survival studies. We review and compare common performance metrics for the two approaches. We present new randomization tests and cross-validation methods to enable unambiguous statistical inferences for several performance metrics used with the survival-time prediction approach. We consider five survival prediction models consisting of one clinical model, two gene expression models, and two models from combinations of clinical and gene expression models. RESULTS: A public breast cancer dataset was used to compare several performance metrics using five prediction models. 1) For some prediction models, the hazard ratio from fitting a Cox proportional hazards model was significant, but the two-group comparison was insignificant, and vice versa. 2) The randomization test and cross-validation were generally consistent with the p-values obtained from the standard performance metrics. 3) Binary classifiers highly depended on how the risk groups were defined; a slight change of the survival threshold for assignment of classes led to very different prediction results. 
CONCLUSIONS: 1) Different performance metrics for evaluation of a survival prediction model may give different conclusions in its discriminatory ability. 2) Evaluation using a high-risk versus low-risk group comparison depends on the selected risk-score threshold; a plot of p-values from all possible thresholds can show the sensitivity of the threshold selection. 3) A randomization test of the significance of Somers' rank correlation can be used for further evaluation of performance of a prediction model. 4) The cross-validated power of survival prediction models decreases as the training and test sets become less balanced.
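
The randomization test mentioned in conclusion 3 can be sketched in Python as a permutation test on a rank-concordance statistic. This is a simplified stand-in, not the estimator used in the paper: it ignores censoring, drops tied pairs, and uses a two-sided test with a hypothetical number of permutations. All names and the toy data are illustrative.

```python
import random
from itertools import combinations
from typing import Sequence


def rank_concordance(risk: Sequence[float], time: Sequence[float]) -> float:
    """(concordant - discordant) / (concordant + discordant) over all pairs.

    A pair is concordant when the patient with the higher risk score has the
    shorter survival time. Simplifications: no censoring, tied pairs dropped.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(risk)), 2):
        product = (risk[i] - risk[j]) * (time[i] - time[j])
        if product < 0:
            concordant += 1
        elif product > 0:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("all pairs are tied; the statistic is undefined")
    return (concordant - discordant) / total


def randomization_p_value(
    risk: Sequence[float],
    time: Sequence[float],
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """Two-sided permutation p-value: shuffle risk scores against fixed times."""
    rng = random.Random(seed)
    observed = abs(rank_concordance(risk, time))
    shuffled = list(risk)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(rank_concordance(shuffled, time)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: higher predicted risk goes with shorter survival (in months).
toy_risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_time = [5, 8, 12, 20, 25, 31, 40, 52]

if __name__ == "__main__":
    print(rank_concordance(toy_risk, toy_time))
    print(randomization_p_value(toy_risk, toy_time))
```

Shuffling the risk scores breaks any association with survival time, so the share of shuffles that match or exceed the observed statistic estimates how often such concordance would arise by chance.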


Subjects
Breast Neoplasms/mortality; Models, Statistical; Survival Analysis; Area Under Curve; Breast Neoplasms/diagnosis; Computer Simulation; Disease-Free Survival; Female; Humans; Logistic Models; Prognosis; Proportional Hazards Models; ROC Curve; Support Vector Machine
10.
J Clin Microbiol ; 50(5): 1524-32, 2012 May.
Article in English | MEDLINE | ID: mdl-22378901

ABSTRACT

A classification model is presented for rapid identification of Salmonella serotypes based on pulsed-field gel electrophoresis (PFGE) fingerprints. The classification model was developed using random forest and support vector machine algorithms and was then applied to a database of 45,923 PFGE patterns, randomly selected from all submissions to CDC PulseNet from 2005 to 2010. The patterns selected included the top 20 most frequent serotypes and 12 less frequent serotypes from various sources. The prediction accuracies for the 32 serotypes ranged from 68.8% to 99.9%, with an overall accuracy of 96.0% for the random forest classification, and ranged from 67.8% to 100.0%, with an overall accuracy of 96.1% for the support vector machine classification. The prediction system improves reliability and accuracy and provides a new tool for early and fast screening and source tracking of outbreak isolates. It is especially useful to get serotype information before the conventional methods are done. Additionally, this system also works well for isolates that are serotyped as "unknown" by conventional methods, and it is useful for a laboratory where standard serotyping is not available.


Subjects
DNA Fingerprinting/methods; Electrophoresis, Gel, Pulsed-Field/methods; Molecular Typing/methods; Salmonella/classification; Salmonella/genetics; Cluster Analysis; Computational Biology/methods; Genotype; Humans; Salmonella/isolation & purification; Serotyping
11.
Proc Natl Acad Sci U S A ; 108(39): 16301-6, 2011 Sep 27.
Article in English | MEDLINE | ID: mdl-21930929

ABSTRACT

Juvenile male rhesus monkeys treated with methylphenidate hydrochloride (MPH) to evaluate genetic and behavioral toxicity were observed after 14 mo of treatment to have delayed pubertal progression with impaired testicular descent and reduced testicular volume. Further evaluation of animals dosed orally twice a day with (i) 0.5 mL/kg of vehicle (n = 10), (ii) 0.15 mg/kg of MPH increased to 2.5 mg/kg (low dose, n = 10), or (iii) 1.5 mg/kg of MPH increased to 12.5 mg/kg (high dose, n = 10) for a total of 40 mo revealed that testicular volume was significantly reduced (P < 0.05) at months 15 to 19 and month 27. Testicular descent was significantly delayed (P < 0.05) in the high-dose group. Significantly lower serum testosterone levels were detected in both the low- (P = 0.0017) and high-dose (P = 0.0011) animals through month 33 of treatment. Although serum inhibin B levels were increased overall in low-dose animals (P = 0.0328), differences between groups disappeared by the end of the study. Our findings indicate that MPH administration, beginning before puberty, and which produced clinically relevant blood levels of the drug, impaired pubertal testicular development until ∼5 y of age. It was not possible to resolve whether MPH delayed the initiation of the onset of puberty or reduced the early tempo of the developmental process. Regardless, deficits in testicular volume and hormone secretion disappeared over the 40-mo observation period, suggesting that the impact of MPH on puberty is not permanent.


Subjects
Central Nervous System Stimulants/pharmacology; Methylphenidate/pharmacology; Sexual Maturation/drug effects; Animals; Macaca mulatta; Male; Testis/drug effects; Testis/growth & development; Testosterone/blood
12.
Proc Natl Acad Sci U S A ; 108(9): 3719-24, 2011 Mar 01.
Article in English | MEDLINE | ID: mdl-21321200

ABSTRACT

Asian patients with chronic hepatitis C (CHC) are known to have better virological responses to pegylated (Peg) IFN-based therapy than Western patients. Although IL28B gene polymorphisms may contribute to this difference, whether favorable hepatitis C virus (HCV) kinetics during treatment plays a role remains unclear. We enrolled 145 consecutive Taiwanese patients with CHC receiving Peg-IFN α-2a plus ribavirin for the study. Blood samples were taken more frequently at defined intervals in the first 3 d. Peg-IFN was administered at week 1. It was then administered weekly in combination with daily ribavirin for 24 or 48 wk. A mathematical model fitted to the observed HCV kinetics was constructed, which could interpret the transient HCV titer elevation after Peg-IFN treatment. The results demonstrated a comparable viral clearance rate (c = 3.45 ± 3.73 day⁻¹, mean ± SD) but lower daily viral production rate (P = 10⁶–10¹²) in our patients than those reported previously in Western patients. Of 110 patients with a sustained virological response (SVR), 47 (43%) had a transient elevation of viral titer within 12 h (proportion of 12 h/3 d: 44% in non-SVR vs. 70% in SVR; P = 0.029). Among 91 patients with available rs8099917 data, patients with the TT genotype had an early surge of viral titer after therapy and a higher SVR and viral clearance rate than those with the GT genotype. In conclusion, Taiwanese patients with CHC receiving Peg-IFN plus ribavirin therapy have a lower daily viral production rate than Western patients, and the rs8099917 TT genotype may contribute to the increased viral clearance rate and better virological responses in these patients.


Subjects
Antiviral Agents/therapeutic use; Genetic Predisposition to Disease; Hepacivirus/physiology; Hepatitis C, Chronic/genetics; Hepatitis C, Chronic/virology; Interleukins/genetics; Polymorphism, Single Nucleotide/genetics; Antiviral Agents/administration & dosage; Drug Therapy, Combination; Hepatitis C, Chronic/drug therapy; Humans; Interferon alpha-2; Interferon-alpha/administration & dosage; Interferon-alpha/therapeutic use; Interferons; Kinetics; Models, Biological; Multivariate Analysis; Polyethylene Glycols/administration & dosage; Polyethylene Glycols/therapeutic use; Recombinant Proteins; Ribavirin/administration & dosage; Ribavirin/therapeutic use; Time Factors; Viral Load