1.
Nucleic Acids Res ; 51(2): 712-727, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36537210

ABSTRACT

Various genetic diseases associated with microcephaly and developmental defects are due to pathogenic variants in the U4atac small nuclear RNA (snRNA), a component of the minor spliceosome essential for the removal of U12-type introns from eukaryotic mRNAs. While it has been shown that a few RNU4ATAC mutations result in impaired binding of essential protein components, the molecular defects of the vast majority of variants are still unknown. Here, we used lymphoblastoid cells derived from RNU4ATAC compound heterozygous (g.108_126del;g.111G>A) twin patients with MOPD1 phenotypes to analyze the molecular consequences of the mutations on small nuclear ribonucleoproteins (snRNPs) formation and on splicing. We found that the U4atac108_126del mutant is unstable and that the U4atac111G>A mutant as well as the minor di- and tri-snRNPs are present at reduced levels. Our results also reveal the existence of 3'-extended snRNA transcripts in patients' cells. Moreover, we show that the mutant cells have alterations in splicing of INTS7 and INTS10 minor introns, contain lower levels of the INTS7 and INTS10 proteins and display changes in the assembly of Integrator subunits. Altogether, our results show that compound heterozygous g.108_126del;g.111G>A mutations induce splicing defects and affect the homeostasis and function of the Integrator complex.


Subject(s)
Ribonucleoproteins, Small Nuclear , Spliceosomes , Spliceosomes/genetics , Spliceosomes/metabolism , Ribonucleoproteins, Small Nuclear/genetics , Mutation , Introns/genetics , RNA Splicing/genetics , RNA, Small Nuclear/metabolism , Homeostasis/genetics
3.
PLoS One ; 15(7): e0235655, 2020.
Article in English | MEDLINE | ID: mdl-32628740

ABSTRACT

Biallelic variants in RNU4ATAC, a non-coding gene transcribed into the minor spliceosome component U4atac snRNA, are responsible for three rare recessive developmental diseases, namely Taybi-Linder/MOPD1, Roifman and Lowry-Wood syndromes. Next-generation sequencing of clinically heterogeneous cohorts (children with either a suspected genetic disorder or congenital microcephaly) recently identified mutations in this gene, illustrating how profoundly these technologies are modifying genetic testing and assessment. As RNU4ATAC has a single non-coding exon, the bioinformatic prediction algorithms assessing the effect of sequence variants on splicing or protein function are irrelevant, which makes variant interpretation challenging for molecular diagnostic laboratories. In order to facilitate and improve clinical diagnostic assessment and genetic counseling, we present i) an update of the previously reported RNU4ATAC mutations and an analysis of the genetic variations affecting this gene using the Genome Aggregation Database (gnomAD) resource; ii) the pathogenicity prediction performances of scores computed based on an RNA structure prediction tool and of those produced by the Combined Annotation Dependent Depletion tool for the 285 RNU4ATAC variants identified in patients or in large-scale sequencing projects; iii) a method, based on a cellular assay, that measures the effect of RNU4ATAC variants on the splicing efficiency of a minor (U12-type) reporter intron. Lastly, the concordance of bioinformatic predictions and cellular assay results was investigated.


Subject(s)
RNA, Small Nuclear/metabolism , Spliceosomes/metabolism , Child , Databases, Genetic , Dwarfism/genetics , Dwarfism/pathology , Fetal Growth Retardation/genetics , Fetal Growth Retardation/pathology , Fibroblasts/cytology , Fibroblasts/metabolism , Genetic Variation , Humans , Microcephaly/genetics , Microcephaly/pathology , Nucleic Acid Conformation , Osteochondrodysplasias/genetics , Osteochondrodysplasias/pathology , RNA Splicing , RNA, Small Nuclear/chemistry , RNA, Small Nuclear/genetics
4.
NAR Genom Bioinform ; 2(4): lqaa095, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575639

ABSTRACT

Influenza A viruses (IAVs) use diverse mechanisms to interfere with cellular gene expression. Although many RNA-seq studies have documented IAV-induced changes in host mRNA abundance, few were designed to allow an accurate quantification of changes in host mRNA splicing. Here, we show that IAV infection of human lung cells induces widespread alterations of cellular splicing, with an overall increase in exon inclusion and decrease in intron retention. Over half of the mRNAs that show differential splicing undergo no significant changes in abundance or in their 3' end termination site, suggesting that IAVs can specifically manipulate cellular splicing. Among a randomly selected subset of 21 IAV-sensitive alternative splicing events, most are specific to IAV infection as they are not observed upon infection with VSV, induction of interferon expression or induction of an osmotic stress. Finally, the analysis of splicing changes in RED-depleted cells reveals a limited but significant overlap with the splicing changes in IAV-infected cells. This observation suggests that hijacking of RED by IAVs to promote splicing of the abundant viral NS1 mRNAs could partially divert RED from its target mRNAs. All our RNA-seq datasets and analyses are made accessible for browsing through a user-friendly Shiny interface (http://virhostnet.prabi.fr:3838/shinyapps/flu-splicing or https://github.com/cbenoitp/flu-splicing).

5.
Sci Rep ; 9(1): 14908, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31624302

ABSTRACT

Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally, gene expression levels are well captured using these technologies, but caveats remain due to the limited read length and the fact that RNA molecules had to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and, most importantly, RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T's. This bias is marked for runs of at least 15 T's, but is already detectable for runs of at least 9 T's and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline that bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Nanopore Sequencing/methods , RNA-Seq/methods , Sequence Analysis, DNA/methods , Transcriptome/genetics , Animals , Brain , DNA, Complementary/genetics , DNA, Complementary/isolation & purification , Datasets as Topic , Gene Library , High-Throughput Nucleotide Sequencing/instrumentation , Liver , Mice , Nanopore Sequencing/instrumentation , RNA/genetics , RNA/isolation & purification , RNA-Seq/instrumentation , Sequence Analysis, DNA/instrumentation
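
The truncation bias described in the abstract above depends on runs of T's of at least 9 or 15 bases. As a rough illustration only (not the authors' pipeline), the sketch below flags transcripts whose longest T run reaches a chosen threshold. The function names, the `min_run` parameter and the toy sequences are all hypothetical, and the sketch does not check whether a run is internal to the transcript.

```python
import re


def longest_t_run(transcript: str) -> int:
    """Return the length of the longest run of consecutive T's (0 if none)."""
    runs = re.findall(r"T+", transcript.upper())
    return max((len(run) for run in runs), default=0)


def flag_transcripts(transcripts: dict[str, str], min_run: int = 9) -> list[str]:
    """Return the sorted names of transcripts with a T run of at least min_run."""
    return sorted(
        name for name, seq in transcripts.items() if longest_t_run(seq) >= min_run
    )


# Toy sequences (hypothetical): runs of 15, 9 and 2 T's respectively.
example = {
    "tx_a": "ACG" + "T" * 15 + "GCA",
    "tx_b": "ACG" + "T" * 9 + "GCA",
    "tx_c": "ACGTTGCA",
}
print(flag_transcripts(example, min_run=9))
print(flag_transcripts(example, min_run=15))
```

Running the same function with both thresholds separates transcripts in the "detectable" range from those in the "marked" range.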
6.
RNA ; 25(9): 1130-1149, 2019 09.
Article in English | MEDLINE | ID: mdl-31175170

ABSTRACT

Minor intron splicing plays a central role in human embryonic development and survival. Indeed, biallelic mutations in RNU4ATAC, transcribed into the minor spliceosomal U4atac snRNA, are responsible for three rare autosomal recessive multimalformation disorders named Taybi-Linder (TALS/MOPD1), Roifman (RFMN), and Lowry-Wood (LWS) syndromes, which associate numerous overlapping signs of varying severity. Although RNA-seq experiments have been conducted on a few RFMN patient cells, none have been performed in TALS, and more generally no in-depth transcriptomic analysis of the ∼700 human genes containing a minor (U12-type) intron had been published as yet. We thus sequenced RNA from cells derived from five skin, three amniotic fluid, and one blood biosamples obtained from seven unrelated TALS cases and from age- and sex-matched controls. This allowed us to describe for the first time the mRNA expression and splicing profile of genes containing U12-type introns, in the context of a functional minor spliceosome. Concerning RNU4ATAC-mutated patients, we show that as expected, they display distinct U12-type intron splicing profiles compared to controls, but that rather unexpectedly mRNA expression levels are mostly unchanged. Furthermore, although U12-type intron missplicing concerns most of the expressed U12 genes, the level of U12-type intron retention is surprisingly low in fibroblasts and amniocytes, and much more pronounced in blood cells. Interestingly, we found several occurrences of introns that can be spliced using either U2, U12, or a combination of both types of splice site consensus sequences, with a shift towards splicing using preferentially U2 sites in TALS patients' cells compared to controls.


Subject(s)
Dwarfism/genetics , Fetal Growth Retardation/genetics , Microcephaly/genetics , Osteochondrodysplasias/genetics , RNA Splicing/genetics , Transcriptome/genetics , Adult , Aged , Base Sequence/genetics , Child, Preschool , Consensus Sequence/genetics , Female , Gene Expression Profiling/methods , Humans , Infant , Introns/genetics , Male , Middle Aged , RNA/genetics , RNA, Messenger/genetics , RNA, Small Nuclear/genetics , Spliceosomes/genetics , Young Adult
7.
Trends Microbiol ; 27(3): 268-281, 2019 03.
Article in English | MEDLINE | ID: mdl-30577974

ABSTRACT

Alteration of host cell splicing is a common feature of many viral infections which is underappreciated because of the complexity and technical difficulty of studying alternative splicing (AS) regulation. Recent advances in RNA sequencing technologies revealed that up to several hundreds of host genes can show altered mRNA splicing upon viral infection. The observed changes in AS events can be either a direct consequence of viral manipulation of the host splicing machinery or result indirectly from the virus-induced innate immune response or cellular damage. Analysis at a higher resolution with single-cell RNAseq, and at a higher scale with the integration of multiple omics data sets in a systems biology perspective, will be needed to further comprehend this complex facet of virus-host interactions.


Subject(s)
Alternative Splicing/genetics , Host Microbial Interactions/genetics , Immunity, Innate , Viruses/genetics , Host Microbial Interactions/immunology , Humans , Viruses/immunology , Viruses/pathogenicity
8.
PLoS Genet ; 14(11): e1007758, 2018 11.
Article in English | MEDLINE | ID: mdl-30419019

ABSTRACT

Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took one hour and a half on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis, and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at https://gitlab.com/leoisl/dbgwas.


Subject(s)
Genome, Bacterial , Genome-Wide Association Study/methods , Computer Graphics , DNA, Bacterial/genetics , Databases, Genetic , Drug Resistance, Bacterial/genetics , Genetic Variation , Genome-Wide Association Study/statistics & numerical data , Interspersed Repetitive Sequences , Models, Genetic , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/genetics , Phenotype , Pseudomonas aeruginosa/drug effects , Pseudomonas aeruginosa/genetics , Sequence Analysis, DNA , Software , Staphylococcus aureus/drug effects , Staphylococcus aureus/genetics
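
To make the notion of a compacted De Bruijn graph in the abstract above more concrete, here is a minimal sketch that builds a node-centric De Bruijn graph from a few sequences and merges each non-branching path into a single unitig. It is not the DBGWAS implementation: the function names are hypothetical, reverse complements are ignored, and isolated cycles are skipped, all of which are simplifying assumptions.

```python
def build_dbg(seqs, k):
    """Build a node-centric De Bruijn graph: nodes are k-mers, and an edge
    links two k-mers that overlap by k-1 bases (forward strand only)."""
    nodes = set()
    for seq in seqs:
        nodes.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    succ = {n: [n[1:] + b for b in "ACGT" if n[1:] + b in nodes] for n in nodes}
    pred = {n: [b + n[:-1] for b in "ACGT" if b + n[:-1] in nodes] for n in nodes}
    return succ, pred


def compact(succ, pred):
    """Merge maximal non-branching paths into unitigs.
    Simplification: cycles without any branching node are not reported."""
    def starts_unitig(n):
        # A node starts a unitig unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        return len(pred[n]) != 1 or len(succ[pred[n][0]]) != 1

    unitigs, visited = [], set()
    for node in sorted(succ):
        if not starts_unitig(node):
            continue
        unitig, current = node, node
        visited.add(current)
        while len(succ[current]) == 1:
            nxt = succ[current][0]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            unitig += nxt[-1]  # extend the unitig by one base
            visited.add(nxt)
            current = nxt
        unitigs.append(unitig)
    return sorted(unitigs)


# Two toy sequences (hypothetical) that differ by a single base.
succ, pred = build_dbg(["ATCGAGCCT", "ATCGTGCCT"], k=3)
print(compact(succ, pred))
```

On this toy input the shared flanks and the two alleles each collapse into their own unitig, which is the kind of local polymorphism pattern the abstract refers to.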
9.
Sci Rep ; 8(1): 4307, 2018 03 09.
Article in English | MEDLINE | ID: mdl-29523794

ABSTRACT

Genome-wide analyses estimate that more than 90% of multi-exonic human genes produce at least two transcripts through alternative splicing (AS). Various bioinformatics methods are available to analyze AS from RNAseq data. Most methods start by mapping the reads to an annotated reference genome, but some start with a de novo assembly of the reads. In this paper, we present a systematic comparison of a mapping-first approach (FARLINE) and an assembly-first approach (KISSPLICE). We applied these methods to two independent RNAseq datasets and found that the predictions of the two pipelines overlapped (70% of exon skipping events were common), but with noticeable differences. The assembly-first approach found more novel variants, including novel unannotated exons and splice sites. It also predicted AS in recently duplicated genes. The mapping-first approach found more lowly expressed splicing variants, as well as splice variants overlapping repeats. This work demonstrates that annotating AS with a single approach misses a large number of candidates, many of which are differentially regulated across conditions and can be validated experimentally. We therefore advocate for the combined use of both mapping-first and assembly-first approaches for the annotation and differential analysis of AS from RNAseq datasets.


Subject(s)
Alternative Splicing , Sequence Analysis, RNA/methods , Software , Humans , RNA Splice Sites , Sequence Analysis, RNA/standards
10.
Algorithms Mol Biol ; 12: 2, 2017.
Article in English | MEDLINE | ID: mdl-28250805

ABSTRACT

BACKGROUND: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. RESULTS: The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. 
The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and neither read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134-1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.

11.
Sci Rep ; 7: 40618, 2017 01 16.
Article in English | MEDLINE | ID: mdl-28091568

ABSTRACT

Crosses between close species can lead to genomic disorders, often considered to be the cause of hybrid incompatibility, one of the initial steps in the speciation process. How these incompatibilities are established and what their causes are remain unclear. To understand the initiation of hybrid incompatibility, we performed reciprocal crosses between two species of Drosophila (D. mojavensis and D. arizonae) that diverged less than 1 Mya. We performed a genome-wide transcriptomic analysis on ovaries from parental lines and on hybrids from reciprocal crosses. Using an innovative procedure of co-assembling transcriptomes, we show that parental lines differ in the expression of their genes and transposable elements. Reciprocal hybrids presented specific gene categories and few transposable element families misexpressed relative to the parental lines. Because TEs are mainly silenced by piwi-interacting RNAs (piRNAs), we hypothesize that in hybrids the deregulation of specific TE families is due to the absence of such small RNAs. Small RNA sequencing confirmed our hypothesis and we therefore propose that TEs can indeed be major players of genome differentiation and be implicated in the first steps of genomic incompatibilities through small RNA regulation.


Subject(s)
DNA Transposable Elements/genetics , Drosophila/genetics , Gene Expression Regulation , Hybridization, Genetic , Animals , Conserved Sequence/genetics , Female , Gene Ontology , Genes, Insect , Geography , Inheritance Patterns/genetics , Male , Mexico , RNA, Small Interfering/metabolism , Species Specificity , Transcriptome/genetics , United States
12.
Nucleic Acids Res ; 44(19): e148, 2016 Nov 02.
Article in English | MEDLINE | ID: mdl-27458203

ABSTRACT

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already makes it possible to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing, if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then validate experimentally the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further make it possible to test for the association of the identified SNPs with a phenotype of interest.


Subject(s)
Base Sequence , Genome , Polymorphism, Single Nucleotide , Sequence Analysis, RNA , Algorithms , Amino Acid Sequence , Animals , Computational Biology/methods , Genetic Markers , Genomics/methods , Genotype , Humans , Phenotype , Reproducibility of Results , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Transcriptome
13.
Nat Commun ; 7: 11067, 2016 Apr 11.
Article in English | MEDLINE | ID: mdl-27063795

ABSTRACT

Myotonic dystrophy (DM) is caused by the expression of mutant RNAs containing expanded CUG repeats that sequester muscleblind-like (MBNL) proteins, leading to alternative splicing changes. Cardiac alterations, characterized by conduction delays and arrhythmia, are the second most common cause of death in DM. Using RNA sequencing, here we identify novel splicing alterations in DM heart samples, including a switch from adult exon 6B towards fetal exon 6A in the cardiac sodium channel, SCN5A. We find that MBNL1 regulates alternative splicing of SCN5A mRNA and that the splicing variant of SCN5A produced in DM presents a reduced excitability compared with the control adult isoform. Importantly, reproducing splicing alteration of Scn5a in mice is sufficient to promote heart arrhythmia and cardiac-conduction delay, two predominant features of myotonic dystrophy. In conclusion, misregulation of the alternative splicing of SCN5A may contribute to a subset of the cardiac dysfunctions observed in myotonic dystrophy.


Subject(s)
Alternative Splicing/genetics , Arrhythmias, Cardiac/complications , Arrhythmias, Cardiac/genetics , Heart Conduction System/physiopathology , Myotonic Dystrophy/complications , Myotonic Dystrophy/genetics , NAV1.5 Voltage-Gated Sodium Channel/genetics , Adult , Aged , Animals , Base Sequence , Binding Sites , Computer Simulation , Electrophysiological Phenomena , Exons/genetics , Female , HEK293 Cells , Heart Conduction System/pathology , Humans , Male , Middle Aged , Molecular Sequence Data , NAV1.5 Voltage-Gated Sodium Channel/metabolism , Nucleotide Motifs/genetics , RNA-Binding Proteins/metabolism , Sodium Channels/metabolism , Xenopus
14.
Gigascience ; 5: 9, 2016.
Article in English | MEDLINE | ID: mdl-26870323

ABSTRACT

BACKGROUND: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools. FINDINGS: Dedicated to 'whole-genome assembly-free' treatments, the Colib'read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and a Bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tools dissemination, we developed Galaxy tools and tool shed repositories. CONCLUSIONS: With the Colib'read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Information Storage and Retrieval/methods , Software , Base Sequence , Cluster Analysis , Genome/genetics , Genomics/methods , Molecular Sequence Data , Reproducibility of Results
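
The abstract above credits part of the low memory use to a Bloom filter. The sketch below is a generic, minimal Bloom filter for k-mers, given only to illustrate the data structure; it is not the Colib'read code, and the class name, the default sizes and the SHA-256 based hashing scheme are assumptions chosen for simplicity.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: membership queries may return false positives,
    but never false negatives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting the item with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Index every 4-mer of a toy read (hypothetical sequence).
read, k = "ATCGAGCCTA", 4
kmer_filter = BloomFilter()
for i in range(len(read) - k + 1):
    kmer_filter.add(read[i:i + k])
print("TCGA" in kmer_filter)
```

The memory cost is fixed by `n_bits` regardless of how many k-mers are inserted; the price is a false-positive rate that grows as the filter fills up.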
15.
Algorithms Mol Biol ; 10: 20, 2015.
Article in English | MEDLINE | ID: mdl-26120359

ABSTRACT

BACKGROUND: The problem of enumerating bubbles with length constraints in directed graphs arises in transcriptomics where the question is to identify all alternative splicing events present in a sample of mRNAs sequenced by RNA-seq. RESULTS: We present a new algorithm for enumerating bubbles with length constraints in weighted directed graphs. This is the first polynomial delay algorithm for this problem and we show that in practice, it is faster than previous approaches. CONCLUSION: This settles one of the main open questions from Sacomoto et al. (BMC Bioinform 13:5, 2012). Moreover, the new algorithm allows us to deal with larger instances and possibly detect longer alternative splicing events.
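
To illustrate the problem stated in the abstract above, the following sketch enumerates bubbles, taken here to mean pairs of internally vertex-disjoint paths sharing a source and a target, under two length bounds in a weighted directed graph. It is a naive exhaustive search with exponential worst-case cost, not the polynomial delay algorithm the abstract announces; the graph encoding, the function names and the bound parameters are hypothetical.

```python
from itertools import combinations


def bounded_paths(graph, source, max_len):
    """Yield (path, length) for every simple path that leaves source and
    whose total edge weight does not exceed max_len."""
    stack = [((source,), 0)]
    while stack:
        path, length = stack.pop()
        if len(path) > 1:
            yield path, length
        for nxt, weight in graph.get(path[-1], {}).items():
            if nxt not in path and length + weight <= max_len:
                stack.append((path + (nxt,), length + weight))


def enumerate_bubbles(graph, max_len_1, max_len_2):
    """Return all bubbles such that one path weighs at most max_len_1
    and the other at most max_len_2."""
    bubbles = set()
    bound = max(max_len_1, max_len_2)
    for source in graph:
        by_target = {}
        for path, length in bounded_paths(graph, source, bound):
            by_target.setdefault(path[-1], []).append((path, length))
        for paths in by_target.values():
            for (p, lp), (q, lq) in combinations(paths, 2):
                disjoint = set(p[1:-1]).isdisjoint(q[1:-1])
                fits = (lp <= max_len_1 and lq <= max_len_2) or (
                    lq <= max_len_1 and lp <= max_len_2
                )
                if disjoint and fits:
                    bubbles.add(frozenset((p, q)))
    return bubbles


# Toy graph (hypothetical): adjacency dict mapping node -> {successor: weight}.
graph = {
    "s": {"a": 1, "b": 1},
    "a": {"t": 1},
    "b": {"c": 1},
    "c": {"t": 1},
    "t": {},
}
print(enumerate_bubbles(graph, max_len_1=2, max_len_2=3))
```

In the splicing setting, the two bounds would typically reflect that one path of the bubble (for example, the one skipping an exon) is short while the other may be longer.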

16.
Nucleic Acids Res ; 43(2): e11, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25404127

ABSTRACT

Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism.


Subject(s)
Genotyping Techniques/methods , Polymorphism, Single Nucleotide , Algorithms , Animals , Chromosomes, Human, Pair 1 , Escherichia coli/genetics , Genomics/methods , Humans , Ixodes/genetics , Mice , Mice, Inbred C57BL , Saccharomyces cerevisiae/genetics
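
As a rough intuition for reference-free detection of isolated SNPs as described in the abstract above: the k k-mers that overlap an isolated SNP span 2k-1 nucleotides, so two alleles appear as two windows of that length that differ only at the central base. The brute-force sketch below looks for such window pairs directly in the reads. It is not the discoSnp algorithm: it ignores sequencing errors, coverage and reverse complements, and the function name and output notation are hypothetical.

```python
from collections import defaultdict


def isolated_snp_candidates(reads, k):
    """Group windows of length 2k-1 by their flanks (central base masked)
    and report flank pairs seen with more than one central base."""
    span = 2 * k - 1
    alleles_by_flanks = defaultdict(set)
    for read in reads:
        for i in range(len(read) - span + 1):
            window = read[i:i + span]
            flanks = (window[:k - 1], window[k:])
            alleles_by_flanks[flanks].add(window[k - 1])
    return {
        left + "[" + "/".join(sorted(alleles)) + "]" + right
        for (left, right), alleles in alleles_by_flanks.items()
        if len(alleles) > 1
    }


# Two toy reads (hypothetical) that differ at a single position.
reads = ["ATCGAGCCT", "ATCGTGCCT"]
print(isolated_snp_candidates(reads, k=3))
```

Requiring identical flanks of k-1 bases on both sides is what makes the candidate "isolated": a second variant within that window would break the match.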
17.
Bioinformatics ; 30(1): 61-70, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24167155

ABSTRACT

MOTIVATION: The increasing availability of metabolomics data makes it possible to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected. RESULTS: We present a method that organizes the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Notably, our method requires simple input data and could be used in a wide variety of applications. 
AVAILABILITY AND IMPLEMENTATION: The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr.


Subject(s)
Cadmium/pharmacology , Metabolomics/methods , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Enzyme Activation , Glutathione/biosynthesis
18.
BMC Genomics ; 14: 309, 2013 May 08.
Article in English | MEDLINE | ID: mdl-23651581

ABSTRACT

BACKGROUND: Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often helps characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. RESULTS: We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes whose gene order diverges from that of their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. CONCLUSION: In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. 
By studying species with multiple sequenced genomes available, we were able to explore genome organization stability at different time-scales and to find significant differences for pathogen and non-pathogen species. The output of our framework also makes it possible to identify the conserved gene clusters and/or partial occurrences thereof, and thus to explore how gene clusters assembled during evolution.


Subject(s)
Genome, Archaeal/genetics , Genome, Bacterial/genetics , Genomic Instability , Models, Genetic , Species Specificity
19.
Nucleic Acids Res ; 41(Database issue): D142-51, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23143107

ABSTRACT

Chimeric RNAs that comprise two or more different transcripts have been identified in many cancers and among the Expressed Sequence Tags (ESTs) isolated from different organisms; they might represent functional proteins and produce different disease phenotypes. The ChiTaRS database of Chimeric Transcripts and RNA-Sequencing data (http://chitars.bioinfo.cnio.es/) collects more than 16 000 chimeric RNAs from humans, mice and fruit flies, 233 chimeras confirmed by RNA-seq reads and ∼2000 cancer breakpoints. The database indicates the expression and tissue specificity of these chimeras, as confirmed by RNA-seq data, and it includes mass spectrometry results for some human entries at their junctions. Moreover, the database has advanced features to analyze junction consistency and to rank chimeras based on the evidence of repeated junction sites. Finally, 'Junction Search' screens through the RNA-seq reads found at the chimeras' junction sites to identify putative junctions in novel sequences entered by users. Thus, ChiTaRS is an extensive catalog of human, mouse and fruit fly chimeras that will extend our understanding of the evolution of chimeric transcripts in eukaryotes and can be advantageous in the analysis of human cancer breakpoints.


Subject(s)
Databases, Genetic , Mutant Chimeric Proteins/genetics , RNA/chemistry , Animals , Chromosome Breakpoints , Computer Graphics , Drosophila/genetics , Gene Fusion , Humans , Internet , Mice , Mutant Chimeric Proteins/metabolism , Neoplasms/genetics , RNA/metabolism , Sequence Analysis, RNA
20.
BMC Genomics ; 13: 438, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22938206

ABSTRACT

BACKGROUND: A large number of genome-scale metabolic networks are now available for many organisms, mostly bacteria. Previous work on minimal gene sets, when analysing host-dependent bacteria, found small common sets of metabolic genes. When such analyses are restricted to bacteria with similar lifestyles, larger portions of metabolism are expected to be shared and their composition is worth investigating. Here we report a comparative analysis of the small molecule metabolism of symbiotic bacteria, exploring common and variable portions as well as the contribution of different lifestyle groups to the reduction of a common set of metabolic capabilities. RESULTS: We found no reaction shared by all the bacteria analysed. Disregarding those with the smallest genomes, we still found no reaction core; however, we did find a core of biochemical capabilities. While obligate intracellular symbionts have no core of reactions within their group, extracellular and cell-associated symbionts do have a small core composed of disconnected fragments. In agreement with previous findings in Escherichia coli, their cores are enriched in biosynthetic processes whereas the variable metabolisms have similar ratios of biosynthetic and degradation reactions. Conversely, the variable metabolism of obligate intracellular symbionts is enriched in anabolism. CONCLUSION: Even when removing the symbionts with the most reduced genomes, there is no core of reactions common to the analysed symbiotic bacteria. The main reason is the very high specialisation of obligate intracellular symbionts; however, host-dependence alone does not explain this absence. The composition of the metabolism of cell-associated and extracellular bacteria shows that while they have similar needs in terms of the building blocks of their cells, they have to adapt to very distinct environments. On the other hand, in obligate intracellular bacteria, catabolism has largely disappeared, whereas synthetic routes appear to have been selected for depending on the nature of the symbiosis. As more genomes are added, we expect, based on our simulations, that the core of cell-associated and extracellular bacteria will continue to diminish, converging to approximately 60 reactions.
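The notion of a reaction core, and of its shrinkage as genomes are added, can be sketched as a set intersection. This is only an illustration under stated assumptions: the abstract does not describe the authors' pipeline or simulations, and the organism names, reaction identifiers and function names below are hypothetical toy values.

```python
def reaction_core(networks):
    """Reactions present in every metabolic network (empty if none are given)."""
    reaction_sets = [set(reactions) for reactions in networks.values()]
    return set.intersection(*reaction_sets) if reaction_sets else set()


def core_size_curve(networks, order):
    """Size of the shared core after adding each organism in the given order."""
    core = None
    sizes = []
    for name in order:
        reactions = set(networks[name])
        core = reactions if core is None else core & reactions
        sizes.append(len(core))
    return sizes


# Hypothetical toy networks: organism name -> set of reaction identifiers.
toy_networks = {
    "extracellular_1": {"r1", "r2", "r3", "r4"},
    "cell_associated_1": {"r2", "r3", "r4", "r5"},
    "intracellular_1": {"r3", "r6"},
}

print(reaction_core(toy_networks))  # {'r3'}
print(core_size_curve(
    toy_networks,
    ["extracellular_1", "cell_associated_1", "intracellular_1"],
))  # [4, 3, 1]
```

Restricting `toy_networks` to one lifestyle group before calling `reaction_core` gives a per-group core, which is the kind of comparison the abstract reports between obligate intracellular and other symbionts.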


Subject(s)
Bacteria/genetics , Bacteria/metabolism , Evolution, Molecular , Genome, Bacterial/genetics , Metabolic Networks and Pathways/genetics , Symbiosis/genetics , Models, Genetic , Species Specificity