1.
Plant J ; 107(5): 1363-1386, 2021 09.
Article in English | MEDLINE | ID: mdl-34160110

ABSTRACT

The photosynthetic capacity of mature leaves increases after several days' exposure to constant or intermittent episodes of high light (HL) and is manifested primarily as changes in chloroplast physiology. How this chloroplast-level acclimation to HL is initiated and controlled is unknown. From expanded Arabidopsis leaves, we determined HL-dependent changes in transcript abundance of 3844 genes in a 0-6 h time-series transcriptomics experiment. It was hypothesized that among such genes were those that contribute to the initiation of HL acclimation. By focusing on differentially expressed transcription (co-)factor genes and applying dynamic statistical modelling to the temporal transcriptomics data, a regulatory network of 47 predominantly photoreceptor-regulated transcription (co-)factor genes was inferred. The most connected gene in this network was B-BOX DOMAIN CONTAINING PROTEIN32 (BBX32). Plants overexpressing BBX32 were strongly impaired in acclimation to HL and displayed perturbed expression of photosynthesis-associated genes under low light (LL) and after exposure to HL. These observations led us to demonstrate that, in addition to BBX32, CRYPTOCHROME1, LONG HYPOCOTYL5, CONSTITUTIVELY PHOTOMORPHOGENIC1 and SUPPRESSOR OF PHYA-105 are important for the regulation of chloroplast-level acclimation. In addition, the BBX32-centric gene regulatory network provides a view of the transcriptional control of acclimation in mature leaves distinct from other photoreceptor-regulated processes, such as seedling photomorphogenesis.


Subjects
Acclimatization/genetics , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Carrier Proteins/metabolism , Gene Expression Regulation, Plant , Transcriptome , Acclimatization/radiation effects , Arabidopsis/physiology , Arabidopsis/radiation effects , Arabidopsis Proteins/genetics , Bayes Theorem , Carrier Proteins/genetics , Chloroplasts/radiation effects , Gene Expression Profiling , Gene Regulatory Networks , Light , Photosynthesis/radiation effects , Plant Leaves/genetics , Plant Leaves/physiology , Plant Leaves/radiation effects
2.
Methods Mol Biol ; 1883: 251-282, 2019.
Article in English | MEDLINE | ID: mdl-30547404

ABSTRACT

Gaussian process dynamical systems (GPDS) represent Bayesian nonparametric approaches to inference of nonlinear dynamical systems, and provide a principled framework for the learning of biological networks from multiple perturbed time series measurements of gene or protein expression. Such approaches are able to capture the full richness of complex ODE models, and can be scaled for inference in moderately large systems containing hundreds of genes. Related hierarchical approaches allow for inference from multiple datasets in which the underlying generative networks are assumed to have been rewired, either by context-dependent changes in network structure, evolutionary processes, or synthetic manipulation. These approaches can also be used to leverage experimentally determined network structures from one species into another where the network structure is unknown. Collectively, these methods provide a comprehensive and flexible platform for inference from a diverse range of data, with applications in systems and synthetic biology, as well as spatiotemporal modelling of embryo development. In this chapter we provide an overview of GPDS approaches and highlight their applications in the biological sciences, with accompanying tutorials available as a Jupyter notebook from https://github.com/cap76/GPDS .


Subjects
Datasets as Topic , Gene Regulatory Networks , Models, Genetic , Systems Biology/methods , Algorithms , Bayes Theorem , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , Normal Distribution , Spatio-Temporal Analysis , Systems Biology/instrumentation
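
The Gaussian process regression step that GPDS-style methods build on can be sketched as follows. This is an illustration only and is not taken from the chapter's notebook: the squared-exponential kernel, the fixed hyperparameters and every function name here are assumptions. The inputs may be time points or, for a dynamical-systems reading, the previous state of the system.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean of a zero-mean GP at x_test, given noisy observations."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve_chol(cholesky(K), y_train)
    return [sum(rbf(x, a) * w for a, w in zip(x_train, alpha)) for x in x_test]

# Hypothetical expression profile sampled at five time points.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
expression = [math.sin(t) for t in times]
fitted = gp_posterior_mean(times, expression, times)
```

With a very small noise term the posterior mean passes almost exactly through the observations; in practice the hyperparameters would be learned rather than fixed.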
3.
Bioinformatics ; 34(5): 884-886, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29126246

ABSTRACT

Summary: Every year, a large number of novel algorithms are introduced to the scientific community for a myriad of applications, but using these across different research groups is often troublesome, due to suboptimal implementations and specific dependency requirements. This does not have to be the case, as public cloud computing services can easily house tractable implementations within self-contained dependency environments, making the methods easily accessible to a wider public. We have taken 14 popular methods, the majority related to expression data or promoter analysis, developed these up to a good implementation standard and housed the tools in isolated Docker containers which we integrated into the CyVerse Discovery Environment, making these easily usable for a wide community as part of the CyVerse UK project. Availability and implementation: The integrated apps can be found at http://www.cyverse.org/discovery-environment, while the raw code is available at https://github.com/cyversewarwick and the corresponding Docker images are housed at https://hub.docker.com/r/cyversewarwick/. Contact: info@cyverse.warwick.ac.uk or D.L.Wild@warwick.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Cloud Computing , Computational Biology/methods , Gene Expression Regulation , Promoter Regions, Genetic , Software , Algorithms , Gene Expression Profiling/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
4.
PLoS One ; 12(2): e0169356, 2017.
Article in English | MEDLINE | ID: mdl-28166227

ABSTRACT

Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes, by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum entropy related correlated mutation measures (CMMs) such as direct information, which decouple direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA consisting of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum entropy based CMM at the residue level, we develop an interface level CMM score to be used in re-ranking docking decoys. We demonstrate that our interface level CMM score compares favourably to the complementarity trace score, an evolutionary information-based score measuring co-conservation, when combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites. We also demonstrate that, since co-mutation and co-complementarity in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with co-conservation information of a complementarity trace score, predicting a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs, and will likely also improve interface prediction for complexes with large, good-quality MSAs.


Subjects
Computational Biology/methods , Protein Interaction Mapping , Sequence Alignment , Algorithms , Amino Acid Sequence , Datasets as Topic , Evolution, Molecular , Models, Molecular , Mutation , Protein Binding , Protein Conformation , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Alignment/methods
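
To make the idea of a correlated mutation measure concrete, the sketch below computes plain mutual information between two alignment columns. This is a deliberately simpler stand-in, not the maximum entropy based direct information used in the paper: mutual information does not separate direct from indirect couplings. The function name and the toy columns are assumptions.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two equal-length MSA columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    total = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        total += p_ab * math.log(p_ab / (p_a * p_b))
    return total

# Toy columns: each string holds one residue per aligned sequence.
covarying = mutual_information("AAVV", "LLII")    # residues change together
independent = mutual_information("AAVV", "LILI")  # no association
```

Columns whose residues change together score high, and unrelated columns score near zero. Small alignments inflate such estimates, which is one reason the abstract stresses MSA size.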
5.
Plant Cell ; 28(2): 345-66, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26842464

ABSTRACT

In Arabidopsis thaliana, changes in metabolism and gene expression drive increased drought tolerance and initiate diverse drought avoidance and escape responses. To address regulatory processes that link these responses, we set out to identify genes that govern early responses to drought. To do this, a high-resolution time series transcriptomics data set was produced, coupled with detailed physiological and metabolic analyses of plants subjected to a slow transition from well-watered to drought conditions. A total of 1815 drought-responsive differentially expressed genes were identified. The early changes in gene expression coincided with a drop in carbon assimilation, and only in the late stages with an increase in foliar abscisic acid content. To identify gene regulatory networks (GRNs) mediating the transition between the early and late stages of drought, we used Bayesian network modeling of differentially expressed transcription factor (TF) genes. This approach identified AGAMOUS-LIKE22 (AGL22) as a key hub gene in a TF GRN. It has previously been shown that AGL22 is involved in the transition from vegetative state to flowering, but here we show that AGL22 expression influences steady state photosynthetic rates and lifetime water use. This suggests that AGL22 uniquely regulates a transcriptional network during drought stress, linking changes in primary metabolism and the initiation of stress responses.


Subjects
Abscisic Acid/metabolism , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Growth Regulators/metabolism , Transcription Factors/metabolism , Arabidopsis/growth & development , Arabidopsis/physiology , Arabidopsis Proteins/genetics , Bayes Theorem , Cluster Analysis , Droughts , Gene Regulatory Networks , Mutation , Phenotype , Photosynthesis/physiology , Stress, Physiological , Transcription Factors/genetics
6.
Stat Appl Genet Mol Biol ; 15(1): 83-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26910751

ABSTRACT

The integration of multi-dimensional datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. However, the large amount of data adds burden to any inference task. Flexible Bayesian methods may reduce the necessity for strong modelling assumptions, but can also increase the computational burden. We present an improved implementation of a Bayesian correlated clustering algorithm that permits integrated clustering to be routinely performed across multiple datasets, each with tens of thousands of items. By exploiting GPU-based computation, we are able to improve runtime performance of the algorithm by almost four orders of magnitude. This permits analysis across genomic-scale data sets, greatly expanding the range of applications over those originally possible. MDI is available here: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.


Subjects
Computational Biology/methods , Genomics/methods , Algorithms , Cluster Analysis , Markov Chains , Monte Carlo Method , Software , Systems Biology/methods
7.
Plant Cell ; 27(11): 3038-64, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26566919

ABSTRACT

Transcriptional reprogramming is integral to effective plant defense. Pathogen effectors act transcriptionally and posttranscriptionally to suppress defense responses. A major challenge to understanding disease and defense responses is discriminating between transcriptional reprogramming associated with microbial-associated molecular pattern (MAMP)-triggered immunity (MTI) and that orchestrated by effectors. A high-resolution time course of genome-wide expression changes following challenge with Pseudomonas syringae pv tomato DC3000 and the nonpathogenic mutant strain DC3000hrpA- allowed us to establish causal links between the activities of pathogen effectors and suppression of MTI and infer with high confidence a range of processes specifically targeted by effectors. Analysis of this information-rich data set with a range of computational tools provided insights into the earliest transcriptional events triggered by effector delivery, regulatory mechanisms recruited, and biological processes targeted. We show that the majority of genes contributing to disease or defense are induced within 6 h postinfection, significantly before pathogen multiplication. Suppression of chloroplast-associated genes is a rapid MAMP-triggered defense response, and suppression of genes involved in chromatin assembly and induction of ubiquitin-related genes coincide with pathogen-induced abscisic acid accumulation. Specific combinations of promoter motifs are engaged in fine-tuning the MTI response and active transcriptional suppression at specific promoter configurations by P. syringae.


Subjects
Arabidopsis/immunology , Immunosuppression Therapy , Pathogen-Associated Molecular Pattern Molecules/metabolism , Plant Immunity/genetics , Plant Leaves/immunology , Pseudomonas syringae/physiology , Transcription, Genetic , Arabidopsis/genetics , Arabidopsis/microbiology , Base Sequence , Chromatin/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Ontology , Gene Regulatory Networks , Genes, Plant , Molecular Sequence Data , Nucleotide Motifs/genetics , Plant Diseases/genetics , Plant Diseases/immunology , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Pseudomonas syringae/growth & development , Transcription Factors/metabolism
8.
Stat Appl Genet Mol Biol ; 14(3): 307-10, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26030796

ABSTRACT

Here we introduce the causal structure identification (CSI) package, a Gaussian process-based approach to inferring gene regulatory networks (GRNs) from multiple time series data. The standard CSI approach infers a single GRN via joint learning from multiple time series datasets; the hierarchical approach (HCSI) infers a separate GRN for each dataset, albeit with the networks constrained to favor similar structures, allowing for the identification of context-specific networks. The software is implemented in MATLAB and includes a graphical user interface (GUI) for user-friendly inference. Finally, the GUI can be connected to high-performance computer clusters to facilitate analysis of large genomic datasets.


Subjects
Gene Expression Profiling/methods , Software , Bayes Theorem , Gene Expression Regulation , Gene Regulatory Networks
9.
Bioinformatics ; 31(12): i97-105, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072515

ABSTRACT

MOTIVATION: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant, counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. RESULTS: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase. AVAILABILITY AND IMPLEMENTATION: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/.


Subjects
Gene Expression Profiling , Gene Regulatory Networks , Algorithms , Bayes Theorem , Cell Cycle/genetics , Computer Simulation , Models, Genetic , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics , Software
10.
PLoS One ; 8(4): e59795, 2013.
Article in English | MEDLINE | ID: mdl-23565168

ABSTRACT

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from the following URL. https://sites.google.com/site/randomisedbhc/.


Subjects
Algorithms , Bayes Theorem , Cluster Analysis , Computational Biology/methods , Internet , Microarray Analysis , Models, Statistical , Time Factors
11.
Plant J ; 75(1): 26-39, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23578292

ABSTRACT

A model is presented describing the gene regulatory network surrounding three similar NAC transcription factors that have roles in Arabidopsis leaf senescence and stress responses. ANAC019, ANAC055 and ANAC072 belong to the same clade of NAC domain genes and have overlapping expression patterns. A combination of promoter DNA/protein interactions identified using yeast 1-hybrid analysis and modelling using gene expression time course data has been applied to predict the regulatory network upstream of these genes. Similarities and divergence in regulation during a variety of stress responses are predicted by different combinations of upstream transcription factors binding and also by the modelling. Mutant analysis with potential upstream genes was used to test and confirm some of the predicted interactions. Gene expression analysis in mutants of ANAC019 and ANAC055 at different times during leaf senescence has revealed a distinctly different role for each of these genes. Yeast 1-hybrid analysis is shown to be a valuable tool that can distinguish clades of binding proteins and be used to test and quantify protein binding to predicted promoter motifs.


Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant , Stress, Physiological , Arabidopsis/physiology , Arabidopsis Proteins/metabolism , Cellular Senescence , Gene Expression Profiling , Gene Regulatory Networks , Mutation , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/physiology , Plants, Genetically Modified , Promoter Regions, Genetic/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Two-Hybrid System Techniques
12.
Bioinformatics ; 29(5): 580-7, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23314126

ABSTRACT

MOTIVATION: The problem of ab initio protein folding is one of the most difficult in modern computational biology. The prediction of residue contacts within a protein provides a more tractable immediate step. Recently introduced maximum entropy-based correlated mutation measures (CMMs), such as direct information, have been successful in predicting residue contacts. However, most correlated mutation studies focus on proteins that have large good-quality multiple sequence alignments (MSA) because the power of correlated mutation analysis falls as the size of the MSA decreases. Nevertheless, even with small autogenerated MSAs, maximum entropy-based CMMs contain information. To make use of this information, in this article, we focus not on general residue contacts but on contacts between residues in β-sheets. The strong constraints and prior knowledge associated with β-contacts are ideally suited for prediction using a method that incorporates an often noisy CMM. RESULTS: Using contrastive divergence, a statistical machine learning technique, we have calculated a maximum entropy-based CMM. We have integrated this measure with a new probabilistic model for β-contact prediction, which is used to predict both residue- and strand-level contacts. Using our model on a standard non-redundant dataset, we significantly outperform a 2D recurrent neural network architecture, achieving a 5% improvement in true positives at the 5% false-positive rate at the residue level. At the strand level, our approach is competitive with the state-of-the-art single methods, achieving precision of 61.0% and recall of 55.4%, while not requiring residue solvent accessibility as an input. AVAILABILITY: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/


Subjects
Artificial Intelligence , Models, Statistical , Protein Structure, Secondary , Entropy , Models, Molecular , Mutation , Neural Networks, Computer , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein
13.
J Chem Theory Comput ; 9(12): 5718-5733, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24683370

ABSTRACT

Maximum Likelihood (ML) optimization schemes are widely used for parameter inference. They maximize the likelihood of some experimentally observed data with respect to the model parameters iteratively, following the gradient of the logarithm of the likelihood. Here, we employ an ML inference scheme to infer a generalizable, physics-based coarse-grained protein model (which includes Gō-like biasing terms to stabilize secondary structure elements in room-temperature simulations), using native conformations of a training set of proteins as the observed data. Contrastive divergence, a novel statistical machine learning technique, is used to efficiently approximate the direction of the gradient ascent, which enables the use of a large training set of proteins. Unlike previous work, the generalizability of the protein model allows the folding of peptides and a protein (protein G) which are not part of the training set. We compare the same force field with different van der Waals (vdW) potential forms: a hard cutoff model, and a Lennard-Jones (LJ) potential with vdW parameters inferred or adopted from the CHARMM or AMBER force fields. Simulations of peptides and protein G show that the LJ model with inferred parameters outperforms the hard cutoff potential, which is consistent with previous observations. Simulations using the LJ potential with inferred vdW parameters also outperform the protein models with adopted vdW parameter values, demonstrating that model parameters generally cannot be used with force fields with different energy functions. The software is available at https://sites.google.com/site/crankite/.
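
The gradient-following step described in this abstract can be illustrated on a toy one-parameter Boltzmann model. The sketch is not the paper's protein model. It assumes a hypothetical energy E(x) = theta * x over four discrete states, for which the gradient of the mean log-likelihood is the model average of x minus the data average of x. Because the state space is tiny, the model average is computed exactly here; contrastive divergence would instead estimate it from short Markov chains started at the data.

```python
import math

STATES = [0, 1, 2, 3]  # hypothetical discrete state space

def model_mean(theta):
    """Exact average of x under P(x) proportional to exp(-theta * x)."""
    weights = [math.exp(-theta * x) for x in STATES]
    z = sum(weights)
    return sum(x * w for x, w in zip(STATES, weights)) / z

def fit_theta(data, lr=0.5, steps=200):
    """Gradient ascent on the mean log-likelihood of the toy model."""
    data_mean = sum(data) / len(data)
    theta = 0.0
    for _ in range(steps):
        # d/dtheta of mean log-likelihood = <x>_model - <x>_data
        theta += lr * (model_mean(theta) - data_mean)
    return theta

observations = [0, 0, 1, 1, 2, 0, 3, 1]  # made-up data with mean 1.0
theta_hat = fit_theta(observations)
```

At convergence the model average matches the data average, which is the moment-matching condition for a maximum likelihood fit of this kind of model.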

14.
Plant Cell ; 24(9): 3530-57, 2012 Sep.
Article in English | MEDLINE | ID: mdl-23023172

ABSTRACT

Transcriptional reprogramming forms a major part of a plant's response to pathogen infection. Many individual components and pathways operating during plant defense have been identified, but our knowledge of how these different components interact is still rudimentary. We generated a high-resolution time series of gene expression profiles from a single Arabidopsis thaliana leaf during infection by the necrotrophic fungal pathogen Botrytis cinerea. Approximately one-third of the Arabidopsis genome is differentially expressed during the first 48 h after infection, with the majority of changes in gene expression occurring before significant lesion development. We used computational tools to obtain a detailed chronology of the defense response against B. cinerea, highlighting the times at which signaling and metabolic processes change, and identify transcription factor families operating at different times after infection. Motif enrichment and network inference predicted regulatory interactions, and testing of one such prediction identified a role for TGA3 in defense against necrotrophic pathogens. These data provide an unprecedented level of detail about transcriptional changes during a defense response and are suited to systems biology analyses to generate predictive models of the gene regulatory networks mediating the Arabidopsis response to B. cinerea.


Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant/genetics , Genome, Plant/genetics , Plant Diseases/immunology , Arabidopsis/immunology , Arabidopsis/metabolism , Arabidopsis/microbiology , Botrytis/growth & development , Gene Expression Profiling , Gene Regulatory Networks , Models, Genetic , Mutation , Nucleotide Motifs , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Immunity , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Signal Transduction , Time Factors , Transcription Factors/genetics , Transcriptome
15.
Bioinformatics ; 28(24): 3290-7, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23047558

ABSTRACT

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.


Subjects
Genomics/methods , Models, Statistical , Bayes Theorem , Chromatin Immunoprecipitation , Cluster Analysis , Gene Expression , Gene Expression Profiling/methods , Normal Distribution , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Systems Biology
16.
Bioinformatics ; 28(12): i233-41, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689766

ABSTRACT

MOTIVATION: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that all of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different, but overlapping, sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. RESULTS: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. AVAILABILITY: The methods outlined in this article have been implemented in MATLAB and are available on request.


Subjects
Bayes Theorem , Gene Regulatory Networks , Statistics, Nonparametric , Algorithms , Arabidopsis/genetics , Gene Expression Regulation , Models, Theoretical , Transcription Factors/genetics , Two-Hybrid System Techniques
17.
Biophys J ; 102(4): 878-86, 2012 Feb 22.
Article in English | MEDLINE | ID: mdl-22385859

ABSTRACT

Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Go-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used.


Subjects
Models, Molecular , Protein Folding , Bacterial Proteins/chemistry , Bayes Theorem , Peptides/chemistry , Protein Structure, Secondary , Thermodynamics
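
Illustrative note on record 17: the abstract states that nested sampling yields both posterior samples and an evidence estimate. The following is a minimal serial sketch of that idea on a toy one-dimensional problem, not the parallel protein-folding implementation the article describes. The Gaussian likelihood, the uniform prior on [0, 1], the rejection-sampling replacement step, and all function and parameter names (`log_likelihood`, `nested_sampling`, `n_live`, `n_iter`) are assumptions made for illustration only.

```python
import math
import random


def log_likelihood(x):
    """Toy log-likelihood (assumed): Gaussian with mean 0.5 and sd 0.1."""
    sigma = 0.1
    return -0.5 * ((x - 0.5) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_add(a, b):
    """Return log(exp(a) + exp(b)) in a numerically stable way."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def nested_sampling(n_live=200, n_iter=1400, seed=0):
    """Estimate log-evidence and weighted posterior samples for the toy model.

    The prior is uniform on [0, 1] (assumed). Returns (log_z, samples),
    where samples is a list of (x, log_weight) pairs.
    """
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]

    log_z = -math.inf
    samples = []
    log_x_prev = 0.0  # log of the prior volume still enclosed, starts at log(1)

    for i in range(1, n_iter + 1):
        # Remove the live point with the lowest likelihood.
        worst = min(range(n_live), key=live_logl.__getitem__)
        log_l_star = live_logl[worst]

        # Deterministic shrinkage: X_i = exp(-i / n_live).
        log_x = -i / n_live
        # Shell width X_{i-1} - X_i, computed in log space.
        log_width = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_wt = log_width + log_l_star
        log_z = log_add(log_z, log_wt)
        samples.append((live[worst], log_wt))
        log_x_prev = log_x

        # Replace it with a prior draw satisfying L > L*. Plain rejection
        # sampling is used here for clarity; it is inefficient in general.
        while True:
            candidate = rng.random()
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > log_l_star:
                break
        live[worst] = candidate
        live_logl[worst] = candidate_logl

    # Add the contribution of the remaining live points.
    log_w_live = log_x_prev - math.log(n_live)
    for x, logl in zip(live, live_logl):
        log_wt = log_w_live + logl
        log_z = log_add(log_z, log_wt)
        samples.append((x, log_wt))

    return log_z, samples


if __name__ == "__main__":
    log_z, samples = nested_sampling()
    posterior_mean = sum(x * math.exp(lw - log_z) for x, lw in samples)
    print("log-evidence estimate:", log_z)
    print("posterior mean estimate:", posterior_mean)
```

For this toy model the true evidence is very close to 1, so the log-evidence estimate should scatter around 0. Because each sample carries a weight, expectations at other temperatures can in principle be obtained by reweighting the same output, which is the post-processing step the abstract mentions.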
18.
BMC Bioinformatics ; 12: 399, 2011 Oct 13.
Article in English | MEDLINE | ID: mdl-21995452

ABSTRACT

BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques. RESULTS: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher-quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. CONCLUSIONS: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies.
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.


Subjects
Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling , Humans , Models, Biological , Normal Distribution , Saccharomyces cerevisiae
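
Illustrative note on record 18: the abstract says a mixture model likelihood lets a small proportion of the data be treated as outliers. The sketch below shows the general idea with a two-component mixture, a Gaussian for ordinary residuals and a broad uniform component for outliers. It is not the BHC package's code: the uniform outlier component, the use of plain residuals instead of a Gaussian process fit, and the names `gaussian_logpdf`, `mixture_loglik`, `outlier_prob` and `data_range` are all assumptions for illustration.

```python
import math


def gaussian_logpdf(residual, sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (residual / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def gaussian_loglik(residuals, sigma):
    """Log-likelihood when every residual must be explained by the Gaussian."""
    return sum(gaussian_logpdf(r, sigma) for r in residuals)


def mixture_loglik(residuals, sigma, outlier_prob, data_range):
    """Log-likelihood under a Gaussian/uniform mixture (assumed form).

    Each residual comes from the Gaussian with probability 1 - outlier_prob,
    or from a uniform density of width data_range with probability outlier_prob.
    """
    log_uniform = -math.log(data_range)
    total = 0.0
    for r in residuals:
        log_inlier = math.log1p(-outlier_prob) + gaussian_logpdf(r, sigma)
        log_outlier = math.log(outlier_prob) + log_uniform
        hi, lo = max(log_inlier, log_outlier), min(log_inlier, log_outlier)
        total += hi + math.log1p(math.exp(lo - hi))  # stable log-sum-exp
    return total


if __name__ == "__main__":
    clean = [0.1, -0.2, 0.05]
    with_outlier = clean + [5.0]  # one gross outlier
    sigma, outlier_prob, data_range = 0.2, 0.05, 10.0

    gaussian_cost = gaussian_loglik(with_outlier, sigma) - gaussian_loglik(clean, sigma)
    mixture_cost = (mixture_loglik(with_outlier, sigma, outlier_prob, data_range)
                    - mixture_loglik(clean, sigma, outlier_prob, data_range))
    print("log-likelihood change from the outlier, Gaussian only:", gaussian_cost)
    print("log-likelihood change from the outlier, mixture:", mixture_cost)
```

Under the Gaussian-only likelihood a single gross outlier dominates the total, whereas under the mixture its cost is bounded by the outlier component, which is why a noisy profile can still be grouped with otherwise similar ones.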
19.
Plant Cell ; 23(3): 873-94, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21447789

ABSTRACT

Leaf senescence is an essential developmental process that impacts dramatically on crop yields and involves altered regulation of thousands of genes and many metabolic and signaling pathways, resulting in major changes in the leaf. The regulation of senescence is complex, and although senescence regulatory genes have been characterized, there is little information on how these function in the global control of the process. We used microarray analysis to obtain a high-resolution time-course profile of gene expression during development of a single leaf over a 3-week period to senescence. A complex experimental design approach and a combination of methods were used to extract high-quality replicated data and to identify differentially expressed genes. The multiple time points enable the use of highly informative clustering to reveal distinct time points at which signaling and metabolic pathways change. Analysis of motif enrichment, as well as comparison of transcription factor (TF) families showing altered expression over the time course, identifies clear groups of TFs active at different stages of leaf development and senescence. These data enable connection of metabolic processes, signaling pathways, and specific TF activity, which will underpin the development of network models to elucidate the process of senescence.


Subjects
Arabidopsis Proteins/analysis , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Leaves/metabolism , Analysis of Variance , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Chlorophyll/analysis , Cluster Analysis , Gene Expression Profiling , Microarray Analysis/methods , Models, Biological , Multigene Family , Plant Growth Regulators/analysis , Plant Leaves/genetics , Plant Leaves/growth & development , Promoter Regions, Genetic , RNA, Plant/genetics , Transcription Factors/metabolism
20.
Interface Focus ; 1(6): 857-70, 2011 Dec 06.
Article in English | MEDLINE | ID: mdl-23226586

ABSTRACT

Inferring the topology of a gene-regulatory network (GRN) from genome-scale time-series measurements of transcriptional change has proved useful for disentangling complex biological processes. To address the challenges associated with this inference, a number of competing approaches have previously been used, including examples from information theory, Bayesian and dynamic Bayesian networks (DBNs), and ordinary differential equation (ODE) or stochastic differential equation models. The performance of these competing approaches has previously been assessed using a variety of in silico and in vivo datasets. Here, we revisit this work by assessing the performance of more recent network inference algorithms, including a novel non-parametric learning approach based upon nonlinear dynamical systems. For larger GRNs, containing hundreds of genes, these non-parametric approaches more accurately infer network structures than do traditional approaches, but at significant computational cost. For smaller systems, DBNs are competitive with the non-parametric approaches with respect to computational time and accuracy, and both of these approaches appear to be more accurate than Granger causality-based methods and those using simple ODE models.
