Results 1 - 19 of 19
1.
PLoS One ; 13(9): e0204425, 2018.
Article in English | MEDLINE | ID: mdl-30261000

ABSTRACT

MOTIVATION: The measurement of disease biomarkers in easily-obtained bodily fluids has opened the door to a new type of non-invasive medical diagnostics. New technologies are being developed and fine-tuned in order to make this possibility a reality. One such technology is Field Asymmetric Ion Mobility Spectrometry (FAIMS), which allows the measurement of volatile organic compounds (VOCs) in biological samples such as urine. These VOCs are known to contain a range of information on the relevant person's metabolism and can in principle be used for disease diagnostic purposes. Key to the effective use of such data are well-developed data processing pipelines, which are necessary to extract the most useful data from the complex underlying biological structure. RESULTS: In this study, we present a new data analysis pipeline for FAIMS data, and demonstrate a number of improvements over previously used methods. We evaluate the effect of a series of candidate operational steps during data processing, such as the use of wavelet transforms, principal component analysis (PCA), and classifier ensembles. We also demonstrate the use of FAIMS data in our pipeline to diagnose diabetes on the basis of a simple urine sample using machine learning classifiers. We present results for data generated from a case-control study of 115 urine samples, collected from 72 type II diabetic patients, with 43 healthy volunteers as negative controls. The resulting pipeline combines the steps that resulted in the best classification model performance. These include the use of a two-dimensional discrete wavelet transform, and the Wilcoxon rank-sum test for feature selection. We are able to achieve a best ROC curve AUC of 0.825 (0.747-0.9, 95% CI) for classification of diabetes vs control. We also note that this result is robust to changes in the data pipeline and different analysis runs, with AUC > 0.80 achieved in a range of cases. 
This is a substantial improvement in performance over previously used data processing methods in this area. Our ability to make strong statements about the ability of FAIMS to diagnose diabetes is sadly limited, as we found confounding effects from the demographics when including these data in the pipeline. The demographics alone produced a best AUC of 0.87 (0.795-0.94, 95% CI). While the combination of the demographics and FAIMS data resulted in an improvement in the AUC (0.907; 0.848-0.97, 95% CI), the difference was not significant. Nevertheless, the pipeline itself shows a significant improvement in performance over more basic methods which have been used with FAIMS data in the past.
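
The abstract above reports ROC curve AUC values and uses the Wilcoxon rank-sum test for feature selection. The two are closely related: the AUC equals the normalised rank-sum (Mann-Whitney U) statistic, i.e. the probability that a randomly chosen case scores higher than a randomly chosen control. The following is a minimal sketch of that relationship only, not the authors' pipeline; the function name and the scores are hypothetical and purely illustrative.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as one half, which makes this the normalised
    Mann-Whitney U statistic.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores, for illustration only (not study data).
cases = [0.9, 0.8, 0.55, 0.4]
controls = [0.7, 0.3, 0.2]
print(auc_from_scores(cases, controls))
```

The pairwise loop is quadratic in the number of samples; for a study of this size (115 samples) that is negligible, and a rank-based formula would only matter for much larger data sets.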


Subjects
Diabetes Mellitus/urine , Diagnosis, Computer-Assisted/methods , Volatile Organic Compounds/urine , Area Under Curve , Biomarkers/urine , Female , Humans , Machine Learning , Male , Middle Aged , Pilot Projects
2.
PLoS One ; 12(12): e0188879, 2017.
Article in English | MEDLINE | ID: mdl-29252995

ABSTRACT

OBJECTIVES: New point of care diagnostics are urgently needed to reduce the over-prescription of antimicrobials for bacterial respiratory tract infection (RTI). We performed a pilot cross sectional study to assess the feasibility of gas-capillary column ion mobility spectrometer (GC-IMS), for the analysis of volatile organic compounds (VOC) in exhaled breath to diagnose bacterial RTI in hospital inpatients. METHODS: 71 patients were prospectively recruited from the Acute Medical Unit of the Royal Liverpool University Hospital between March and May 2016 and classified as confirmed or probable bacterial or viral RTI on the basis of microbiologic, biochemical and radiologic testing. Breath samples were collected at the patient's bedside directly into the electronic nose device, which recorded a VOC spectrum for each sample. Sparse principal component analysis and sparse logistic regression were used to develop a diagnostic model to classify VOC spectra as being caused by bacterial or non-bacterial RTI. RESULTS: Summary area under the receiver operator characteristic curve was 0.73 (95% CI 0.61-0.86), summary sensitivity and specificity were 62% (95% CI 41-80%) and 80% (95% CI 64-91%) respectively (p = 0.00147). CONCLUSIONS: GC-IMS analysis of exhaled VOC for the diagnosis of bacterial RTI shows promise in this pilot study and further trials are warranted to assess this technique.


Subjects
Bacterial Infections/diagnosis , Electronic Nose , Metabolomics , Respiratory Tract Infections/diagnosis , Volatile Organic Compounds/analysis , Aged , Bacterial Infections/microbiology , Female , Humans , Male , Middle Aged , Pilot Projects , ROC Curve , Respiratory Tract Infections/microbiology
3.
Arthritis Res Ther ; 18(1): 250, 2016 10 27.
Article in English | MEDLINE | ID: mdl-27788684

ABSTRACT

BACKGROUND: There is currently no blood-based test for detection of early-stage osteoarthritis (OA) and the anti-cyclic citrullinated peptide (CCP) antibody test for rheumatoid arthritis (RA) has relatively low sensitivity for early-stage disease. Morbidity in arthritis could be markedly decreased if early-stage arthritis could be routinely detected and classified by a clinical chemistry test. We hypothesised that damage to proteins of the joint by oxidation, nitration and glycation, with signatures released in plasma as oxidized, nitrated and glycated amino acids, may facilitate early-stage diagnosis and typing of arthritis. METHODS: Patients with knee joint early-stage and advanced OA and RA or other inflammatory joint disease (non-RA) and healthy subjects with good skeletal health were recruited for the study (n = 225). Plasma/serum and synovial fluid were analysed for oxidized, nitrated and glycated proteins and amino acids by quantitative liquid chromatography-tandem mass spectrometry. Data-driven machine learning methods were employed to explore the diagnostic utility of the measurements for detecting and classifying early-stage OA and RA, non-RA and good skeletal health with training set and independent test set cohorts. RESULTS: Glycated, oxidized and nitrated proteins and amino acids were detected in synovial fluid and plasma of arthritic patients with characteristic patterns found in early and advanced OA and RA, and non-RA, with respect to healthy controls. In early-stage disease, two algorithms for consecutive use in diagnosis were developed: (1) disease versus healthy control, and (2) classification as OA, RA and non-RA. The algorithms featured 10 damaged amino acids in plasma, hydroxyproline and anti-CCP antibody status. Sensitivities/specificities were: (1) good skeletal health, 0.92/0.91; (2) early-stage OA, 0.92/0.90; early-stage RA, 0.80/0.78; and non-RA, 0.70/0.65 (training set). These were confirmed in independent test set validation. 
Damaged amino acids increased further in severe and advanced OA and RA. CONCLUSIONS: Oxidized, nitrated and glycated amino acids combined with hydroxyproline and anti-CCP antibody status provided a plasma-based biochemical test of relatively high sensitivity and specificity for early-stage diagnosis and typing of arthritic disease.


Subjects
Biomarkers/blood , Early Diagnosis , Osteoarthritis, Knee/diagnosis , Protein Processing, Post-Translational , Adult , Aged , Algorithms , Amino Acids/metabolism , Area Under Curve , Chromatography, Liquid , Disease Progression , Female , Humans , Male , Middle Aged , Nitrosation , Osteoarthritis, Knee/blood , Oxidation-Reduction , Oxidative Stress , ROC Curve , Sensitivity and Specificity , Tandem Mass Spectrometry
4.
Tuberculosis (Edinb) ; 99: 143-146, 2016 07.
Article in English | MEDLINE | ID: mdl-27450016

ABSTRACT

Tuberculosis (TB) remains one of the world's major health burdens with 9.6 million new infections globally. Though considerable progress has been made in reduction of TB incidence and mortality, there is a continuous need for lower cost, simpler and more robust means of diagnosis. One method that may fulfil these requirements is in the area of breath analysis. In this study we analysed the breath of 21 patients with pulmonary or extra-pulmonary TB, recruited from a UK teaching hospital (University Hospital Coventry and Warwickshire) before or within 1 week of commencing treatment for TB. TB diagnosis was confirmed by reference tests (mycobacterial culture), histology or radiology. 19 controls were recruited to calculate specificity; these patients were all interferon-gamma release assay negative (T.SPOT®.TB, Oxford Immunotec Ltd.). Whole breath samples were collected with subsequent chemical analysis undertaken by Ion Mobility Spectrometry. Our results produced a sensitivity of 81% and a specificity of 79% for all cases of TB (pulmonary and extra-pulmonary). Though lower than other studies analysing pulmonary TB alone, we believe that this technique shows promise, and a higher sensitivity could be achieved by further improving our sample capture methodology.


Subjects
Breath Tests/methods , Ions , Mycobacterium tuberculosis/pathogenicity , Tuberculosis, Pulmonary/diagnosis , Adolescent , Adult , Aged , Aged, 80 and over , Antitubercular Agents/therapeutic use , Area Under Curve , Bacteriological Techniques , Breath Tests/instrumentation , Case-Control Studies , England , Equipment Design , Female , Hospitals, Teaching , Humans , Interferon-gamma Release Tests , Male , Middle Aged , Motion , Mycobacterium tuberculosis/drug effects , Pilot Projects , Predictive Value of Tests , ROC Curve , Reproducibility of Results , Spectrum Analysis , Tuberculosis, Pulmonary/drug therapy , Tuberculosis, Pulmonary/microbiology , Young Adult
5.
R Soc Open Sci ; 3(2): 140501, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26998311

ABSTRACT

Predicting response to treatment and disease-specific deaths are key tasks in cancer research yet there is a lack of methodologies to achieve these. Large-scale 'omics and digital pathology technologies have led to the need for effective statistical methods for data fusion to extract the most useful patterns from these diverse data types. We present FusionGP, a method for combining heterogeneous data types designed specifically for predicting outcome of treatment and disease. FusionGP is a Gaussian process model that includes a generalization of feature selection for biomarker discovery, allowing for simultaneous, sparse feature selection across multiple data types. Importantly, it can accommodate highly nonlinear structure in the data, and automatically infers the optimal contribution from each input data type. FusionGP compares favourably to several popular classification methods, including the Random Forest classifier, a stepwise logistic regression model and the Support Vector Machine on single data types. By combining gene expression, copy number alteration and digital pathology image data in 119 estrogen receptor (ER)-negative and 345 ER-positive breast tumours, we aim to predict two important clinical outcomes: death and chemoinsensitivity. While gene expression data give the best predictive performance in the majority of cases, the digital pathology data are much better for predicting death in ER cases. Thus, FusionGP is a new tool for selecting informative features from heterogeneous data types and predicting treatment response and prognosis.

6.
PLoS One ; 11(2): e0149756, 2016.
Article in English | MEDLINE | ID: mdl-26901314

ABSTRACT

BACKGROUND: Highly sensitive and specific urine-based tests to detect either primary or recurrent bladder cancer have proved elusive to date. Our ever increasing knowledge of the genomic aberrations in bladder cancer should enable the development of such tests based on urinary DNA. METHODS: DNA was extracted from urine cell pellets and PCR used to amplify the regions of the TERT promoter and coding regions of FGFR3, PIK3CA, TP53, HRAS, KDM6A and RXRA which are frequently mutated in bladder cancer. The PCR products were barcoded, pooled and paired-end 2 x 250 bp sequencing performed on an Illumina MiSeq. Urinary DNA was analysed from 20 non-cancer controls, 120 primary bladder cancer patients (41 pTa, 40 pT1, 39 pT2+) and 91 bladder cancer patients post-TURBT (89 cancer-free). RESULTS: Despite the small quantities of DNA extracted from some urine cell pellets, 96% of the samples yielded mean read depths >500. Analysing only previously reported point mutations, TERT mutations were found in 55% of patients with bladder cancer (independent of stage), FGFR3 mutations in 30% of patients with bladder cancer, PIK3CA in 14% and TP53 mutations in 12% of patients with bladder cancer. Overall, these previously reported bladder cancer mutations were detected in 86 out of 122 bladder cancer patients (70% sensitivity) and in only 3 out of 109 patients with no detectable bladder cancer (97% specificity). CONCLUSION: This simple, cost-effective approach could be used for the non-invasive surveillance of patients with non-muscle-invasive bladder cancers harbouring these mutations. The method has a low DNA input requirement and can detect low levels of mutant DNA in a large excess of normal DNA. These genes represent a minimal biomarker panel to which extra markers could be added to develop a highly sensitive diagnostic test for bladder cancer.
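
The sensitivity and specificity quoted in the abstract above follow directly from the reported counts (86 of 122 cancer patients with a detected mutation, 3 of 109 patients without detectable cancer with a detected mutation). A short worked sketch of that arithmetic, with hypothetical function names:

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased patients correctly flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of disease-free patients correctly cleared."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract above.
tp, fn = 86, 122 - 86   # mutation detected / not detected, cancer present
fp, tn = 3, 109 - 3     # mutation detected / not detected, no detectable cancer

print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```

Rounded to whole percentages, these reproduce the 70% sensitivity and 97% specificity stated in the abstract.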


Subjects
DNA, Neoplasm , High-Throughput Nucleotide Sequencing/methods , Multiplex Polymerase Chain Reaction/methods , Mutation , Neoplasm Proteins/genetics , Urinary Bladder Neoplasms , Aged , Aged, 80 and over , DNA, Neoplasm/genetics , DNA, Neoplasm/urine , Female , Humans , Male , Sensitivity and Specificity , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/urine
7.
BMC Syst Biol ; 9: 76, 2015 Nov 09.
Article in English | MEDLINE | ID: mdl-26553024

ABSTRACT

BACKGROUND: Cytokine-hormone network deregulations underpin pathologies ranging from autoimmune disorders to cancer, but our understanding of these networks in physiological/pathophysiological states remains patchy. We employed Bayesian networks to analyze cytokine-hormone interactions in vivo using murine lactation as a dynamic, physiological model system. RESULTS: Circulatory levels of estrogen, progesterone, prolactin and twenty-three cytokines were profiled in post partum mice with/without pups. The resultant networks were very robust and assembled about structural hubs, with evidence that interleukin (IL)-12 (p40), IL-13 and monocyte chemoattractant protein (MCP)-1 were the primary drivers of network behavior. Network structural conservation across physiological scenarios coupled with the successful empirical validation of our approach suggested that in silico network perturbations can predict in vivo qualitative responses. In silico perturbation of network components also captured biological features of cytokine interactions (antagonism, synergy, redundancy). CONCLUSION: These findings highlight the potential of network-based approaches in identifying novel cytokine pharmacological targets and in predicting the effects of their exogenous manipulation in inflammatory/immune disorders.


Subjects
Chemokine CCL2/metabolism , Cytokines/metabolism , Interleukin-12/metabolism , Interleukin-13/metabolism , Models, Biological , Animals , Bayes Theorem , Female , Hormones/blood , Lactation/physiology , Mice , Postpartum Period , Protein Interaction Maps
8.
J Gastrointestin Liver Dis ; 24(2): 197-201, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26114180

ABSTRACT

BACKGROUND & AIMS: Non-Alcoholic Fatty Liver Disease (NAFLD) is the commonest cause of chronic liver disease in the western world. Current diagnostic methods including Fibroscan have limitations, thus there is a need for more robust non-invasive screening methods. The gut microbiome is altered in several gastrointestinal and hepatic disorders resulting in altered, unique gut fermentation patterns, detectable by analysis of volatile organic compounds (VOCs) in urine, breath and faeces. We performed a proof of principle pilot study to determine if progressive fatty liver disease, specifically NAFLD and Non-Alcoholic Steatohepatitis (NASH), produced an altered urinary VOC pattern. METHODS: 34 patients were recruited: 8 NASH cirrhotics (NASH-C); 7 non-cirrhotic NASH; 4 NAFLD and 15 controls. Urine was collected and stored frozen. For assay, the samples were defrosted and aliquoted into vials, which were heated to 40±0.1°C and the headspace analyzed by FAIMS (Field Asymmetric Ion Mobility Spectroscopy). A previously used data processing pipeline employing a Random Forest classification algorithm and using a 10-fold cross-validation method was applied. RESULTS: Urinary VOC results demonstrated sensitivity of 0.58 (0.33 - 0.88), but specificity of 0.93 (0.68 - 1.00) and an Area Under Curve (AUC) of 0.73 (0.55 - 0.90) to distinguish between liver disease and controls. However, NASH/NASH-C was separated from the NAFLD/controls with a sensitivity of 0.73 (0.45 - 0.92), specificity of 0.79 (0.54 - 0.94) and AUC of 0.79 (0.64 - 0.95). CONCLUSIONS: This pilot study suggests that urinary VOC detection may offer the potential for early non-invasive characterisation of liver disease using 'smell prints' to distinguish between NASH and NAFLD.
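
The abstract above mentions 10-fold cross-validation on 34 patients. As a rough illustration of what such a split looks like at this sample size, here is a minimal sketch; the shuffling, the seed and the function name are assumptions, and the abstract does not say whether the authors stratified folds by class.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once.

    Shuffling with a fixed seed is an assumption made for reproducibility;
    it is not taken from the study.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# 34 samples and 10 folds, matching the numbers in the abstract.
for train_idx, test_idx in kfold_indices(34, 10):
    print(len(train_idx), len(test_idx))
```

With 34 samples each test fold holds only three or four patients, which is one reason the confidence intervals reported for a pilot study of this size are wide.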


Subjects
Non-alcoholic Fatty Liver Disease/urine , Volatile Organic Compounds/urine , Aged , Area Under Curve , Biomarkers/urine , Case-Control Studies , Diagnosis, Differential , Female , Humans , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/diagnosis , Pilot Projects , Predictive Value of Tests , Prospective Studies , ROC Curve , Spectrum Analysis , Urinalysis
9.
BMC Cancer ; 15: 117, 2015 Mar 11.
Article in English | MEDLINE | ID: mdl-25886033

ABSTRACT

BACKGROUND: Patient response to chemotherapy for ovarian cancer is extremely heterogeneous and there are currently no tools to aid the prediction of sensitivity or resistance to chemotherapy and allow treatment stratification. Such a tool could greatly improve patient survival by identifying the most appropriate treatment on a patient-specific basis. METHODS: PubMed was searched for studies predicting response or resistance to chemotherapy using gene expression measurements of human tissue in ovarian cancer. RESULTS: 42 studies were identified and both the data collection and modelling methods were compared. The majority of studies utilised fresh-frozen or formalin-fixed paraffin-embedded tissue. Modelling techniques varied, the most popular being Cox proportional hazards regression and hierarchical clustering which were used by 17 and 11 studies respectively. The gene signatures identified by the various studies were not consistent, with very few genes being identified by more than two studies. Patient cohorts were often noted to be heterogeneous with respect to chemotherapy treatment undergone by patients. CONCLUSIONS: A clinically applicable gene signature capable of predicting patient response to chemotherapy has not yet been identified. Research into a predictive, as opposed to prognostic, model could be highly beneficial and aid the identification of the most suitable treatment for patients.


Subjects
Antineoplastic Agents/therapeutic use , Drug Resistance, Neoplasm/drug effects , Ovarian Neoplasms/drug therapy , Animals , Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm/genetics , Female , Humans , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/genetics , Predictive Value of Tests
10.
Am J Gastroenterol ; 110(4): 588-94, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25823766

ABSTRACT

OBJECTIVES: A rapid test to diagnose Clostridium difficile infection (CDI) on hospital wards could minimize common but critical diagnostic delay. Field asymmetric ion mobility spectrometry (FAIMS) is a portable mass spectrometry instrument that quickly analyses the chemical composition of gaseous mixtures (e.g., above a stool sample). Can FAIMS accurately distinguish C. difficile-positive from -negative stool samples? METHODS: We analyzed 213 stool samples with FAIMS, of which 71 were C. difficile positive by microbiological analysis. The samples were divided into training, test, and validation samples. We used the training and test samples (n=135) to identify which sample characteristics discriminate between positive and negative samples, and to build machine learning algorithms interpreting these characteristics. The best performing algorithm was then prospectively validated on new, blinded validation samples (n=78). The predicted probability of CDI (as calculated by the algorithm) was compared with the microbiological test results (direct toxin test and culture). RESULTS: Using a Random Forest classification algorithm, FAIMS had a high discriminatory ability on the training and test samples (C-statistic 0.91 (95% confidence interval (CI): 0.86-0.97)). When applied to the blinded validation samples, the C-statistic was 0.86 (0.75-0.97). For samples analyzed ≤7 days of collection (n=76), diagnostic accuracy was even higher (C-statistic: 0.93 (0.85-1.00)). A cutoff value of 0.32 for predicted probability corresponded with a sensitivity of 92.3% (95% CI: 77.4-98.6%) and specificity of 86.0% (78.3-89.3%). For even fresher samples, discriminatory ability further increased. CONCLUSIONS: FAIMS analysis of unprocessed stool samples can differentiate between C. difficile-positive and -negative samples with high diagnostic accuracy.


Subjects
Algorithms , Clostridioides difficile/isolation & purification , Enterocolitis, Pseudomembranous/diagnosis , Feces/microbiology , Spectrum Analysis/methods , Clostridium Infections/diagnosis , Enterocolitis, Pseudomembranous/microbiology , Feces/chemistry , Humans , Point-of-Care Systems , Prospective Studies , Research Design , Sensitivity and Specificity , Spectrum Analysis/instrumentation
11.
Sci Rep ; 5: 9259, 2015 Mar 19.
Article in English | MEDLINE | ID: mdl-25788417

ABSTRACT

There is currently no biochemical test for detection of early-stage osteoarthritis (eOA). Tests for early-stage rheumatoid arthritis (eRA) such as rheumatoid factor (RF) and anti-cyclic citrullinated peptide (CCP) antibodies require refinement to improve clinical utility. We developed robust mass spectrometric methods to quantify citrullinated protein (CP) and free hydroxyproline in body fluids. We detected CP in the plasma of healthy subjects and surprisingly found that CP was increased in both patients with eOA and eRA whereas anti-CCP antibodies were predominantly present in eRA. A 4-class diagnostic algorithm combining plasma/serum CP, anti-CCP antibody and hydroxyproline applied to a cohort gave specific and sensitive detection and discrimination of eOA, eRA, other non-RA inflammatory joint diseases and good skeletal health. This provides a first-in-class plasma/serum-based biochemical assay for diagnosis and type discrimination of early-stage arthritis to facilitate improved treatment and patient outcomes, exploiting citrullinated protein and related differential autoimmunity.


Subjects
Arthritis, Rheumatoid/diagnosis , Biomarkers/analysis , Musculoskeletal Diseases/diagnosis , Osteoarthritis/diagnosis , Tandem Mass Spectrometry , Adult , Aged , Algorithms , Area Under Curve , Autoantibodies/blood , Chromatography, High Pressure Liquid , Citrulline/chemistry , Citrulline/metabolism , Early Diagnosis , Female , Humans , Hydroxyproline/analysis , Hydroxyproline/blood , Male , Middle Aged , ROC Curve , Sensitivity and Specificity
12.
PLoS One ; 8(10): e75748, 2013.
Article in English | MEDLINE | ID: mdl-24194826

ABSTRACT

Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC over 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/
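
The abstract above places a normal-gamma conjugate prior on the mean and precision of each Gaussian component. As background, the standard textbook posterior update for that prior on one-dimensional data can be sketched as follows. This is not the GBHC implementation; the function name and the parameterisation (mu0, kappa0, alpha0, beta0, with beta as a rate) are assumptions chosen for illustration.

```python
def normal_gamma_update(data, mu0, kappa0, alpha0, beta0):
    """Posterior normal-gamma parameters after observing 1-D Gaussian data.

    Prior: precision ~ Gamma(alpha0, rate=beta0),
           mean | precision ~ Normal(mu0, 1 / (kappa0 * precision)).
    """
    n = len(data)
    if n == 0:
        return mu0, kappa0, alpha0, beta0
    xbar = sum(data) / n
    sum_sq = sum((x - xbar) ** 2 for x in data)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * sum_sq
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return mu_n, kappa_n, alpha_n, beta_n


# Hypothetical expression values, for illustration only.
print(normal_gamma_update([1.0, 3.0], mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0))
```

Because the update is available in closed form, the marginal likelihood of a candidate cluster can also be computed analytically, which is what makes conjugate priors convenient for Bayesian model selection between merges.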


Subjects
Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Models, Genetic , Bayes Theorem , Cluster Analysis , Humans , Likelihood Functions , Normal Distribution
13.
PLoS One ; 8(4): e59795, 2013.
Article in English | MEDLINE | ID: mdl-23565168

ABSTRACT

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from https://sites.google.com/site/randomisedbhc/.


Subjects
Algorithms , Bayes Theorem , Cluster Analysis , Computational Biology/methods , Internet , Microarray Analysis , Models, Statistical , Time Factors
14.
Bioinformatics ; 28(24): 3290-7, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23047558

ABSTRACT

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.


Subjects
Genomics/methods , Models, Statistical , Bayes Theorem , Chromatin Immunoprecipitation , Cluster Analysis , Gene Expression , Gene Expression Profiling/methods , Normal Distribution , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Systems Biology
15.
PLoS Comput Biol ; 7(10): e1002227, 2011 Oct.
Article in English | MEDLINE | ID: mdl-22028636

ABSTRACT

Different data types can offer complementary perspectives on the same biological phenomenon. In cancer studies, for example, data on copy number alterations indicate losses and amplifications of genomic regions in tumours, while transcriptomic data point to the impact of genomic and environmental events on the internal wiring of the cell. Fusing different data provides a more comprehensive model of the cancer cell than that offered by any single type. However, biological signals in different patients exhibit diverse degrees of concordance due to cancer heterogeneity and inherent noise in the measurements. This is a particularly important issue in cancer subtype discovery, where personalised strategies to guide therapy are of vital importance. We present a nonparametric Bayesian model for discovering prognostic cancer subtypes by integrating gene expression and copy number variation data. Our model is constructed from a hierarchy of Dirichlet Processes and addresses three key challenges in data fusion: (i) To separate concordant from discordant signals, (ii) to select informative features, (iii) to estimate the number of disease subtypes. Concordance of signals is assessed individually for each patient, giving us an additional level of insight into the underlying disease structure. We exemplify the power of our model in prostate cancer and breast cancer and show that it outperforms competing methods. In the prostate cancer data, we identify an entirely new subtype with extremely poor survival outcome and show how other analyses fail to detect it. In the breast cancer data, we find subtypes with superior prognostic value by using the concordant results. These discoveries were crucially dependent on our model's ability to distinguish concordant and discordant signals within each patient sample, and would otherwise have been missed. We therefore demonstrate the importance of taking a patient-specific approach, using highly-flexible nonparametric Bayesian methods.


Subjects
Bayes Theorem , Breast Neoplasms/mortality , Models, Biological , Models, Statistical , Prostatic Neoplasms/mortality , Breast Neoplasms/classification , Breast Neoplasms/genetics , DNA Copy Number Variations/genetics , Female , Gene Expression Profiling/statistics & numerical data , Humans , Male , Prognosis , Prostatic Neoplasms/classification , Prostatic Neoplasms/genetics , Signal Transduction , Statistics, Nonparametric , Survival Analysis
16.
BMC Bioinformatics ; 12: 399, 2011 Oct 13.
Article in English | MEDLINE | ID: mdl-21995452

ABSTRACT

BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques. RESULTS: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. CONCLUSIONS: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. 
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which can be downloaded from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.


Subjects
Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling , Humans , Biological Models , Normal Distribution , Saccharomyces cerevisiae
17.
Bioinformatics ; 26(12): i158-67, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529901

ABSTRACT

MOTIVATION: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. RESULTS: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs. AVAILABILITY: If interested in the code for the work presented in this article, please contact the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling/methods , Transcription Factors/metabolism , Bayes Theorem , Binding Sites , Multigene Family , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
18.
Semin Cell Dev Biol ; 20(7): 863-8, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19682595

ABSTRACT

A major challenge in systems biology is the ability to model complex regulatory interactions, such as gene regulatory networks, and a number of computational approaches have been developed over recent years to address this challenge. This paper reviews a number of these approaches, with a focus on probabilistic graphical models and the integration of diverse data sets, such as gene expression and transcription factor binding site location and activity.


Subjects
Chromatin Immunoprecipitation/methods , Gene Expression , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , DNA Sequence Analysis/methods , Systems Biology/methods
19.
BMC Bioinformatics ; 10: 242, 2009 Aug 06.
Article in English | MEDLINE | ID: mdl-19660130

ABSTRACT

BACKGROUND: Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. CONCLUSION: Biologically plausible results are presented from a well-studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, such as the need to decide how many clusters there should be and how to choose a principled distance metric.


Subjects
Gene Expression Profiling/methods , Software Design , Algorithms , Arabidopsis/genetics , Bayes Theorem , Cluster Analysis , Oligonucleotide Array Sequence Analysis , Time Factors