1.
Bioinformatics ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38913855

ABSTRACT

MOTIVATION: Gene Regulatory Networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. RESULTS: We address this issue for two regression-based GRN inference models, a weighted Random Forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis in order to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction. AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.

2.
Elife ; 12, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38780431

ABSTRACT

The elevation of atmospheric CO2 leads to a decline in plant mineral content, which might pose a significant threat to food security in coming decades. Although few genes have been identified for the negative effect of elevated CO2 on plant mineral composition, several studies suggest the existence of genetic factors. Here, we performed a large-scale study to explore genetic diversity of plant ionome responses to elevated CO2, using six hundred Arabidopsis thaliana accessions, representing geographical distributions ranging from worldwide to regional and local environments. We show that growth under elevated CO2 leads to a global decrease of ionome content, regardless of the geographic distribution of the population. We observed a wide range of genetic diversity, ranging from the most negative effect to resilience or even to a benefit in response to elevated CO2. Using genome-wide association mapping, we identified a large set of genes associated with this response, and we demonstrated that the function of one of these genes is involved in the negative effect of elevated CO2 on plant mineral composition. This resource will contribute to understanding the mechanisms underlying the effect of elevated CO2 on plant mineral nutrition, and could help towards the development of crops adapted to a high-CO2 world.


Subject(s)
Arabidopsis , Carbon Dioxide , Genetic Variation , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis/drug effects , Carbon Dioxide/metabolism , Genome-Wide Association Study
3.
New Phytol ; 239(3): 992-1004, 2023 08.
Article in English | MEDLINE | ID: mdl-36727308

ABSTRACT

The elevation of CO2 in the atmosphere increases plant biomass but decreases their mineral content. The genetic and molecular bases of these effects remain mostly unknown, in particular in the root system, which is responsible for plant nutrient uptake. To gain knowledge about the effect of elevated CO2 on plant growth and physiology, and to identify its regulators in the roots, we analyzed genome expression in Arabidopsis roots through a combinatorial design with contrasted levels of CO2, nitrate, and iron. We demonstrated that elevated CO2 has a modest effect on root genome expression under nutrient sufficiency, but by contrast leads to massive expression changes under nitrate or iron deficiencies. We demonstrated that elevated CO2 negatively targets nitrate and iron starvation modules at the transcriptional level, associated with a reduction in high-affinity nitrate uptake. Finally, we inferred a gene regulatory network governing the root response to elevated CO2. This network allowed us to identify candidate transcription factors including MYB15, WOX11, and EDF3 which we experimentally validated for their role in the stimulation of growth by elevated CO2. Our approach identified key features and regulators of the plant response to elevated CO2, with the objective of developing crops resilient to climate change.


Subject(s)
Arabidopsis , Arabidopsis/metabolism , Carbon Dioxide/metabolism , Nitrates/pharmacology , Nitrates/metabolism , Gene Regulatory Networks , Plants/metabolism , Iron/metabolism , Plant Roots/metabolism
4.
BMC Genomics ; 22(1): 387, 2021 May 26.
Article in English | MEDLINE | ID: mdl-34039282

ABSTRACT

BACKGROUND: High-throughput transcriptomic datasets are often examined to discover new actors and regulators of a biological response. To this end, graphical interfaces have been developed and allow a broad range of users to conduct standard analyses from RNA-seq data, even with little programming experience. Although existing solutions usually provide adequate procedures for normalization, exploration or differential expression, more advanced features, such as gene clustering or regulatory network inference, are often missing or do not reflect current state-of-the-art methodologies. RESULTS: We developed here a user interface called DIANE (Dashboard for the Inference and Analysis of Networks from Expression data) designed to harness the potential of multi-factorial expression datasets from any organism through a precise set of methods. DIANE's interactive workflow provides normalization, dimensionality reduction, differential expression and ontology enrichment. Gene clustering can be performed and explored via configurable Mixture Models, and Random Forests are used to infer gene regulatory networks. DIANE also includes a novel procedure to assess the statistical significance of regulator-target influence measures based on permutations for Random Forest importance metrics. All along the pipeline, session reports and results can be downloaded to ensure clear and reproducible analyses. CONCLUSIONS: We demonstrate the value and the benefits of DIANE using a recently published data set describing the transcriptional response of Arabidopsis thaliana under the combination of temperature, drought and salinity perturbations. We show that DIANE can intuitively carry out informative exploration and statistical procedures with RNA-Seq data, perform model-based clustering of gene expression profiles and go further into gene network reconstruction, providing relevant candidate genes or signalling pathways to explore.
DIANE is available as a web service ( https://diane.bpmp.inrae.fr ), or can be installed and locally launched as a complete R package.
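
The permutation-based significance test mentioned in the abstract can be illustrated with a minimal Python sketch. This is not DIANE's code: the influence measure here is the absolute Pearson correlation, swapped in for the Random Forest importance metric so that the example needs only the standard library, and the function names, the number of permutations and the toy data are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_pvalue(regulator, target, n_perm=1000, seed=0):
    """Empirical p-value of a regulator-target influence measure.

    The regulator profile is shuffled to build a null distribution of the
    measure (assumption: |Pearson r| stands in for an importance score).
    """
    rng = random.Random(seed)
    observed = abs(pearson(regulator, target))
    shuffled = list(regulator)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, target)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy expression profiles (hypothetical values, not from the paper).
regulator = [float(v) for v in range(12)]
target = [2.0 * v + 1.0 for v in regulator]
p_value = permutation_pvalue(regulator, target)
```

With an importance score from a tree ensemble, the same loop would refit or re-score the model on each shuffled profile, which is considerably more expensive than this correlation stand-in.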


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Cluster Analysis , Computational Biology , Software , Transcriptome
5.
PLoS Comput Biol ; 17(4): e1008909, 2021 04.
Article in English | MEDLINE | ID: mdl-33861755

ABSTRACT

Long regulatory elements (LREs), such as CpG islands, polydA:dT tracts or AU-rich elements, are thought to play key roles in gene regulation but, as opposed to conventional binding sites of transcription factors, few methods have been proposed to formally and automatically characterize them. We present here a computational approach named DExTER (Domain Exploration To Explain gene Regulation) dedicated to the identification of candidate LREs (cLREs) and apply it to the analysis of the genomes of P. falciparum and other eukaryotes. Our analyses show that all tested genomes contain several cLREs that are somewhat conserved along evolution, and that gene expression can be predicted with surprising accuracy on the basis of these long regions only. Regulation by cLREs exhibits very different behaviours depending on species and conditions. In P. falciparum and other Apicomplexan organisms as well as in Dictyostelium discoideum, the process appears highly dynamic, with different cLREs involved at different phases of the life cycle. For multicellular organisms, the same cLREs are involved in all tissues, but a dynamic behavior is observed along embryonic development stages. In P. falciparum, whose genome is known to be strongly depleted of transcription factors, cLREs are predictive of expression with an accuracy above 70%, and our analyses show that they are associated with both transcriptional and post-transcriptional regulation signals. Moreover, we assessed the biological relevance of one LRE discovered by DExTER in P. falciparum using an in vivo reporter assay. The source code (python) of DExTER is available at https://gite.lirmm.fr/menichelli/DExTER.


Subject(s)
Genome, Protozoan , Plasmodium falciparum/genetics , Regulatory Sequences, Nucleic Acid , Eukaryota/genetics , Gene Expression Regulation , Gene Ontology , Genes, Reporter , Histones/metabolism , RNA Processing, Post-Transcriptional , RNA, Antisense/genetics , RNA, Messenger/genetics , Transcription, Genetic
6.
Nucleic Acids Res ; 49(5): 2488-2508, 2021 03 18.
Article in English | MEDLINE | ID: mdl-33533919

ABSTRACT

The ubiquitous family of dimeric transcription factors AP-1 is made up of Fos and Jun family proteins. It has long been thought to operate principally at gene promoters and how it controls transcription is still poorly understood. The Fos family protein Fra-1 is overexpressed in triple negative breast cancers (TNBCs) where it contributes to tumor aggressiveness. To address its transcriptional actions in TNBCs, we combined transcriptomics, ChIP-seqs, machine learning and NG Capture-C. Additionally, we studied its Fos family kin Fra-2 also expressed in TNBCs, albeit much less. Consistent with their pleiotropic effects, Fra-1 and Fra-2 up- and downregulate individually, together or redundantly many genes associated with a wide range of biological processes. Target gene regulation is principally due to binding of Fra-1 and Fra-2 at regulatory elements located distantly from cognate promoters where Fra-1 modulates the recruitment of the transcriptional co-regulator p300/CBP and where differences in AP-1 variant motif recognition can underlie preferential Fra-1- or Fra-2 bindings. Our work also shows no major role for Fra-1 in chromatin architecture control at target gene loci, but suggests collaboration between Fra-1-bound and -unbound enhancers within chromatin hubs sometimes including promoters for other Fra-1-regulated genes. Our work impacts our view of AP-1.


Subject(s)
Enhancer Elements, Genetic , Gene Expression Regulation, Neoplastic , Proto-Oncogene Proteins c-fos/metabolism , Triple Negative Breast Neoplasms/genetics , Binding Sites , Cell Line, Tumor , Chromatin/chemistry , Chromatin/metabolism , Epigenesis, Genetic , Fos-Related Antigen-2/metabolism , Humans , Nucleotide Motifs , Promoter Regions, Genetic , Proto-Oncogene Proteins c-fos/physiology , Transcription Factor AP-1/metabolism , Triple Negative Breast Neoplasms/metabolism , p300-CBP Transcription Factors/metabolism
7.
BMC Genomics ; 20(1): 103, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30709337

ABSTRACT

BACKGROUND: In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types. RESULTS: We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type. TFcoop aims to predict the binding sites of the target TF from the nucleotide content of the sequences and the binding affinity of all identified cooperating TFs. The set of cooperating TFs and model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 TFs on 41 cell types and in four regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. We first show that TFcoop is accurate and outperforms simple PWM methods for predicting TF binding sites. Next, analysis of the learned models sheds light on important properties of TF combinations in different promoter classes and in enhancers. First, we show that combinations governing TF binding on enhancers are more cell-type specific than those governing binding in promoters. Second, for a given TF and cell type, we observe that TF combinations are different between promoters and enhancers, but similar for promoters of mRNAs, lncRNAs and pri-miRNAs. Analysis of the TFs cooperating with the different targets shows over-representation of pioneer TFs and a clear preference for TFs with binding motif composition similar to that of the target. Lastly, our models accurately distinguish promoters associated with specific biological processes. CONCLUSIONS: TFcoop appears to be an accurate approach for studying TF combinations.
Its use on ENCODE and FANTOM data allowed us to discover important properties of human TF combinations in different promoter classes and in enhancers. The R code for learning a TFcoop model and for reproducing the main experiments described in the paper is available in an R Markdown file at https://gite.lirmm.fr/brehelin/TFcoop.
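
For readers unfamiliar with the "simple PWM methods" used as a baseline in this abstract, the following Python sketch shows one common way to score a sequence with a position weight matrix. It is not TFcoop code: the count matrix, the pseudocount, the uniform background and the function names are all hypothetical choices made for illustration.

```python
import math

# Hypothetical nucleotide counts for a 3-position motif (not from the paper).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
]


def log_odds(counts, pseudo=1.0, background=0.25):
    """Turn per-position counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudo
        pwm.append({
            base: math.log2(((column[base] + pseudo) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(sequence, pwm):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


PWM = log_odds(COUNTS)
score, start = best_hit("TTACGTT", PWM)
```

A model in the spirit of the abstract would use such scores, computed for many TFs, as input features of a classifier rather than thresholding the score of the target TF alone.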


Subject(s)
Computational Biology/methods , Enhancer Elements, Genetic , Gene Expression Regulation , Promoter Regions, Genetic , Transcription Factors/metabolism , Binding Sites , Humans , Transcription Factors/genetics
8.
PLoS Comput Biol ; 14(1): e1005921, 2018 01.
Article in English | MEDLINE | ID: mdl-29293496

ABSTRACT

Gene expression is orchestrated by distinct regulatory regions to ensure a wide variety of cell types and functions. A challenge is to identify which regulatory regions are active, what are their associated features and how they work together in each cell type. Several approaches have tackled this problem by modeling gene expression based on epigenetic marks, with the ultimate goal of identifying driving regions and associated genomic variations that are clinically relevant in particular in precision medicine. However, these models rely on experimental data, which are limited to specific samples (even often to cell lines) and cannot be generated for all regulators and all patients. In addition, we show here that, although these approaches are accurate in predicting gene expression, inference of TF combinations from this type of models is not straightforward. Furthermore these methods are not designed to capture regulation instructions present at the sequence level, before the binding of regulators or the opening of the chromatin. Here, we probe sequence-level instructions for gene expression and develop a method to explain mRNA levels based solely on nucleotide features. Our method positions nucleotide composition as a critical component of gene expression. Moreover, our approach, able to rank regulatory regions according to their contribution, unveils a strong influence of the gene body sequence, in particular introns. We further provide evidence that the contribution of nucleotide content can be linked to co-regulations associated with genome 3D architecture and to associations of genes within topologically associated domains.


Subject(s)
Base Composition , Gene Expression Regulation , Regulatory Sequences, Nucleic Acid , Computational Biology , DNA Copy Number Variations , Enhancer Elements, Genetic , Genome, Human , Humans , Models, Genetic , Neoplasms/genetics , Neoplasms/metabolism , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Quantitative Trait Loci , RNA, Messenger/chemistry , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
9.
J Theor Biol ; 415: 90-101, 2017 02 21.
Article in English | MEDLINE | ID: mdl-27737786

ABSTRACT

Overlapping genes exist in all domains of life and are much more abundant than expected upon their first discovery in the late 1970s. Assuming that the reference gene is read in frame +0, an overlapping gene can be encoded in two reading frames in the sense strand, denoted by +1 and +2, and in three reading frames in the opposite strand, denoted by -0, -1, and -2. This motivated numerous researchers to study the constraints induced by the genetic code on the various overlapping frames, mostly based on information theory. Our focus in this paper is on the constraints induced on two overlapping genes in terms of amino acids, as well as polypeptides. We show that simple linear constraints bind the amino-acid composition of two proteins encoded by overlapping genes. Novel constraints are revealed when polypeptides are considered, and not just single amino acids. For example, in double-coding sequences with an overlapping reading frame -2, each Tyrosine (denoted as Tyr or Y) in the overlapping frame overlaps a Tyrosine in the reference frame +0 (and reciprocally), whereas specific words (e.g. YY) never occur. We thus distinguish between null constraints (YY = 0 in frame -2) and non-null constraints (Y in frame +0 ⇔ Y in frame -2). Our equivalence-based constraints are symmetrical and thus enable the characterization of the joint composition of overlapping proteins. We describe several formal frameworks and a graph algorithm to characterize and compute these constraints. As expected, the degrees of freedom left by these constraints vary drastically among the different overlapping frames. Interestingly, the biological meaning of constraints induced on two overlapping proteins (hydropathy, forbidden di-peptides, expected overlap length …) is also specific to the reading frame. 
We study the combinatorics of these constraints for overlapping polypeptides of length n, pointing out that, (i) except for frame -2, non-null constraints are deduced from the amino-acid (length = 1) constraints and (ii) null constraints are deduced from the di-peptide (length = 2) constraints. These results yield support for understanding the mechanisms and evolution of overlapping genes, and for developing novel overlapping gene detection methods.
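
The Tyrosine example can be checked by brute force. The Python sketch below assumes that the frame the abstract calls -2 is the antisense frame whose codon pairs with the last base of the preceding reference codon plus the first two bases of the current one; this offset is an assumption about the paper's convention, and the helper names are hypothetical. Under that assumption, an antisense codon facing a reference Tyr (TAT or TAC) is always TA followed by one more base, hence Tyr or a stop codon, and in a double-coding sequence stops are excluded.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TYR = {"TAT", "TAC"}
STOP = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def antisense_codon(prev_base, codon):
    """Antisense codon overlapping prev_base and the first two codon bases.

    Assumption: this offset is the one labelled frame -2 in the abstract.
    """
    return revcomp(prev_base + codon[:2])


# Reference Tyr preceded by any base: the antisense codon is Tyr or stop.
facing_tyr = {
    (prev, codon): antisense_codon(prev, codon)
    for codon in sorted(TYR)
    for prev in "ACGT"
}

# Reference Tyr-Tyr: the preceding base is T or C, so the antisense codon
# is TAA or TAG, which is why the word YY cannot occur in a coding overlap.
facing_tyr_tyr = {
    (first, second): antisense_codon(first[-1], second)
    for first in sorted(TYR)
    for second in sorted(TYR)
}

# Converse: an antisense Tyr forces the reference codon to start with TA.
reference_prefixes = {revcomp(codon[:2]) for codon in TYR}
```

The last set contains only "TA", so a reference codon facing an antisense Tyr is itself Tyr or a stop, which gives the equivalence in both directions.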


Subject(s)
Amino Acid Sequence/genetics , Genes, Overlapping , Open Reading Frames , Proteins/genetics , Algorithms , Animals , Biological Evolution , Humans
10.
Biosystems ; 135: 15-34, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26135206

ABSTRACT

We propose here the GETEC (Genome Evolution by Transformation, Expansion and Contraction) model of gene evolution based on substitution, insertion and deletion of genetic motifs. The GETEC model unifies two classes of evolution models: models of substitution, insertion and deletion of nucleotides as function of time (Lèbre and Michel, 2010) and sequence length (Lèbre and Michel, 2012), and models of symmetric substitution of genetic motifs as function of time (Benard and Michel, 2011). Evolution of genetic motifs based on substitution, insertion and deletion is modeled by a differential equation whose analytical solutions give an expression of the genetic motif occurrence probabilities as a function of time or sequence length, as well as in direct time direction (past-present) or inverse time direction (present-past). Evolution models with "substitution only", i.e. without insertion and deletion, and with "insertion and deletion only", i.e. without substitution, are particular cases of the GETEC model. We have also developed research software for computing the analytical solutions of the GETEC model. It is freely accessible at http://icube-bioinfo.u-strasbg.fr/webMathematica/GETEC/ or via the web site http://dpt-info.u-strasbg.fr/~michel/.


Subject(s)
Evolution, Molecular , Models, Genetic , Mutation , Computational Biology , Humans , Software
11.
Math Biosci ; 245(2): 137-47, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23770433

ABSTRACT

We recently introduced a new molecular evolution model called the IDIS model for Insertion Deletion Independent of Substitution [13,14]. In the IDIS model, the three independent processes of substitution, insertion and deletion of residues have constant rates. In order to control the genome expansion during evolution, we generalize here the IDIS model by introducing an insertion rate which decreases when the sequence grows and tends to 0 for a maximum sequence length nmax. This new model, called LIIS for Limited Insertion Independent of Substitution, defines a matrix differential equation satisfied by a vector P(t) describing the sequence content in each residue at evolution time t. An analytical solution is obtained for any diagonalizable substitution matrix M. Thus, the LIIS model gives an expression of the sequence content vector P(t) in each residue under evolution time t as a function of the eigenvalues and the eigenvectors of matrix M, the residue insertion rate vector R, the total insertion rate r, the initial and maximum sequence lengths n0 and nmax, respectively, and the sequence content vector P(t0) at initial time t0. The derivation of the analytical solution is much more technical, compared to the IDIS model, as it involves Gauss hypergeometric functions. Several propositions of the LIIS model are derived: proof that the IDIS model is a particular case of the LIIS model when the maximum sequence length nmax tends to infinity, fixed point, time scale, time step and time inversion. Using a relation between the sequence length l and the evolution time t, an expression of the LIIS model as a function of the sequence length l=n(t) is obtained. Formulas for 'insertion only', i.e. when the substitution rates are all equal to 0, are derived at evolution time t and sequence length l. 
Analytical solutions of the LIIS model are explicitly derived, as a function of either evolution time t or sequence length l, for two classical substitution matrices: the 3-parameter symmetric substitution matrix [12] (LIIS-SYM3) and the HKY asymmetric substitution matrix [9] (LIIS-HKY). An evaluation of the LIIS model (precisely, LIIS-HKY) based on four statistical analyses of the GC content in complete genomes of four prokaryotic taxonomic groups, namely Chlamydiae, Crenarchaeota, Spirochaetes and Thermotogae, shows the expected improvement from the theory of the LIIS model compared to the IDIS model.


Subject(s)
Evolution, Molecular , Models, Genetic , Base Composition , Chlamydiaceae/genetics , Computational Biology , Crenarchaeota/genetics , Gram-Negative Anaerobic Straight, Curved, and Helical Rods/genetics , Mathematical Concepts , Mutagenesis, Insertional , Sequence Deletion , Spirochaetales/genetics , Stochastic Processes , Time Factors
12.
Bull Math Biol ; 74(8): 1764-88, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22644340

ABSTRACT

We introduce here a gene evolution model which is an extension of the time-continuous stochastic IDIS model (Lèbre and Michel in J. Comput. Biol. Chem. 34:259-267, 2010) to sequence length. This new IDISL (Insertion Deletion Independent of Substitution based on sequence Length) model gives an analytical expression of the residue occurrence probability p(l) at sequence length l depending on stochastically independent processes of substitution, insertion, and deletion. Furthermore, in contrast to all mathematical models in this research field, the substitution, insertion, and deletion parameters of the IDISL model are independent of each other. For any diagonalizable substitution matrix M, the residue occurrence probability p(l) is given as a function of the eigenvalues of M, the eigenvector matrix of M, a vector r of the residue insertion rates, a deletion rate d (unlike our previous IDIS model), and a vector of the initial residue occurrence probability p(l(0)) at sequence length l(0). As another difference with the classical evolution approaches which mainly focus on sequence alignment, the IDIS class of models allows a mathematical analysis of the behavior of the residue occurrence probability according to either evolution time or sequence length. The length parameter can be associated with any nucleotide regions: genes, genomes, introns, repeats, 5' and 3' regions, etc. Three properties of the IDISL model are given in relation with the sequence length l: parameter scale, inverse evolution, and residue equilibrium distribution. Nucleotide occurrence probabilities are given in the particular case of the IDISL-HKY model, i.e. the IDISL model associated with the HKY asymmetric substitution matrix (Hasegawa et al. in J. Mol. Evol. 22:160-174, 1985). An application of the IDISL model is developed for a massive statistical analysis of GC content in all complete bacterial genomes available to date (894 non-anaerobic and anaerobic genomes).
The IDISL-HKY model confirms the increase of the GC content with the genome length for two non-anaerobic taxonomic groups of bacterial genomes. Moreover, the non-linear modelling proposed by the IDISL model outperforms the most recent modelling of GC content in these bacterial genomes (Wang et al. in Biochem. Biophys. Res. Commun. 342:681-684, 2006; Musto et al. in Biochem. Biophys. Res. Commun. 347:1-3, 2006).


Subject(s)
Base Composition , Evolution, Molecular , Genome, Bacterial , Models, Genetic , Sequence Deletion , Computer Simulation , Nucleotides/genetics
13.
Methods Mol Biol ; 802: 199-213, 2012.
Article in English | MEDLINE | ID: mdl-22130882

ABSTRACT

Dynamic Bayesian networks (DBNs) have received increasing attention from the computational biology community as models of gene regulatory networks. However, conventional DBNs are based on the homogeneous Markov assumption and cannot deal with inhomogeneity and nonstationarity in temporal processes. The present chapter provides a detailed discussion of how the homogeneity assumption can be relaxed. The improved method is evaluated on simulated data, where the network structure is allowed to change with time, and on gene expression time series during morphogenesis in Drosophila melanogaster.


Subject(s)
Computational Biology/methods , Systems Biology/methods , Animals , Bayes Theorem , Computer Simulation , Drosophila melanogaster/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks , Models, Statistical , Morphogenesis/genetics
14.
Comput Biol Chem ; 34(5-6): 259-67, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20952258

ABSTRACT

We develop here a new class of stochastic models of gene evolution based on residue Insertion-Deletion Independent from Substitution (IDIS). Indeed, in contrast to all existing evolution models, insertions and deletions are modeled here by a concept in population dynamics. Therefore, they are not only independent from each other, but also independent from the substitution process. After a separate stochastic analysis of the substitution and the insertion-deletion processes, we obtain a matrix differential equation combining these two processes defining the IDIS model. By deriving a general solution, we give an analytical expression of the residue occurrence probability at evolution time t as a function of a substitution rate matrix, an insertion rate vector, a deletion rate and an initial residue probability vector. Various mathematical properties of the IDIS model in relation with time t are derived: time scale, time step, time inversion and sequence length. Particular expressions of the nucleotide occurrence probability at time t are given for classical substitution rate matrices in various biological contexts: equal insertion rate, insertion-deletion only and substitution only. All these expressions can be directly used for biological evolutionary applications. The IDIS model shows a strongly different stochastic behavior from the classical substitution only model when compared on a gene dataset. Indeed, by considering three processes of residue insertion, deletion and substitution independently from each other, it allows a more realistic representation of gene evolution and opens new directions and applications in this research field.


Subject(s)
Evolution, Molecular , INDEL Mutation , Models, Genetic , Models, Statistical , Computer Simulation , Nucleotides/genetics , Nucleotides/metabolism , Probability , Stochastic Processes
15.
BMC Syst Biol ; 4: 130, 2010 Sep 22.
Article in English | MEDLINE | ID: mdl-20860793

ABSTRACT

BACKGROUND: Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. METHODS: To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). RESULTS: We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning. CONCLUSIONS: ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provides a natural starting point to learn and investigate their dynamics in greater detail.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Algorithms , Animals , Benomyl/poisoning , Drosophila melanogaster/genetics , Drosophila melanogaster/growth & development , Markov Chains , Monte Carlo Method , Regression Analysis , Reproducibility of Results , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/genetics , Transcription Factors/metabolism
16.
Am J Ind Med ; 52(12): 916-25, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19937949

ABSTRACT

BACKGROUND: Nuclear workers from French contracting companies have received higher doses than workers from Electricité de France (EDF) or Commissariat à l'Energie Atomique (CEA). METHODS: A cohort of 9,815 workers in 11 contracting companies, monitored for exposure to ionizing radiation between 1967 and 2000, was followed up for a median duration of 12.5 years. Standardized mortality ratios (SMRs) were computed. RESULTS: Between 1968 and 2002, 250 deaths occurred. Our study demonstrated a clear healthy worker effect (HWE) with mortality attaining half that expected from national mortality statistics (SMR = 0.54, 95% CI = [0.47-0.61]). The HWE was lower for all cancers (SMR = 0.65) than for non-cancer deaths (SMR = 0.46). The analysis by cancer site showed no excess compared with the general population. Significant trends were observed according to the level of exposure to ionizing radiation for deaths from cancer, deaths from digestive cancer and deaths from respiratory cancer. CONCLUSIONS: The mortality of nuclear workers from contracting companies is very low compared to French national mortality.
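
The figures in this abstract can be related through the usual definition SMR = observed deaths / expected deaths. The Python sketch below back-computes the expected number of deaths implied by the reported values and derives an approximate 95% interval. The log-normal approximation used for the interval is an assumption made here for illustration; the abstract does not say how its interval was obtained, and the function name is hypothetical.

```python
import math


def smr_summary(observed, smr, z=1.96):
    """Implied expected deaths and an approximate confidence interval.

    Assumption: the interval uses SMR * exp(+/- z / sqrt(observed)),
    a common approximation, not necessarily the study's method.
    """
    expected = observed / smr
    half_width = z / math.sqrt(observed)
    lower = smr * math.exp(-half_width)
    upper = smr * math.exp(half_width)
    return expected, lower, upper


# Values reported in the abstract: 250 deaths, SMR = 0.54.
expected, lower, upper = smr_summary(observed=250, smr=0.54)
```

Under these assumptions the implied expected count is about 463 deaths, and the approximate interval is close to the reported [0.47-0.61].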


Subject(s)
Contract Services , Neoplasms, Radiation-Induced/mortality , Nuclear Power Plants , Occupational Diseases/mortality , Adolescent , Adult , Aged , Aged, 80 and over , Cause of Death , Cohort Studies , Female , France , Healthy Worker Effect , Humans , Male , Middle Aged , Photons , Radiometry , Reference Values , Retrospective Studies , Survival Rate , Young Adult
17.
Stat Appl Genet Mol Biol ; 8: Article 9, 2009.
Article in English | MEDLINE | ID: mdl-19222392

ABSTRACT

In this paper, we introduce a novel inference method for dynamic genetic networks which makes it possible to handle a number of time measurements n that is much smaller than the number of genes p. The approach is based on the concept of a low order conditional dependence graph that we extend here to the case of dynamic Bayesian networks. Most of our results are based on the theory of graphical models associated with the directed acyclic graphs (DAGs). In this way, we define a minimal DAG G which describes exactly the full order conditional dependencies given in the past of the process. Then, to cope with the large p and small n estimation case, we propose to approximate DAG G by considering low order conditional independencies. We introduce partial qth order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G but still reflect relevant dependence facts for sparse networks such as genetic networks. By using this approximation, we set out a non-Bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data analysis. The inference procedure is implemented in the R package 'G1DBN' freely available from the R archive (CRAN).


Subject(s)
Gene Regulatory Networks , Models, Genetic , Algorithms , Bayes Theorem , Computer Simulation , Oligonucleotide Array Sequence Analysis , Reproducibility of Results , Saccharomyces cerevisiae , Time Factors