Results 1 - 20 of 32
1.
Methods Mol Biol ; 1097: 125-41, 2014.
Article in English | MEDLINE | ID: mdl-24639158

ABSTRACT

Many biologically important RNA structures are conserved in evolution, leading to characteristic mutational patterns. RNAalifold is a widely used program to predict consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models, but also makes it possible to study the structural conservation of functional RNAs.


Subject(s)
Computational Biology/methods , Consensus Sequence , Nucleic Acid Conformation , RNA/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Algorithms , Internet , RNA/genetics , RNA Folding , Software , Thermodynamics , Web Browser
2.
Genome Res ; 24(4): 616-28, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24429298

ABSTRACT

Long intergenic noncoding RNAs (lincRNAs) play diverse regulatory roles in human development and disease, but little is known about their evolutionary history and constraint. Here, we characterize human lincRNA expression patterns in nine tissues across six mammalian species and multiple individuals. Of the 1898 human lincRNAs expressed in these tissues, we find orthologous transcripts for 80% in chimpanzee, 63% in rhesus, 39% in cow, 38% in mouse, and 35% in rat. Mammalian-expressed lincRNAs show remarkably strong conservation of tissue specificity, suggesting that it is selectively maintained. In contrast, abundant splice-site turnover suggests that exact splice sites are not critical. Relative to evolutionarily young lincRNAs, mammalian-expressed lincRNAs show higher primary sequence conservation in their promoters and exons, increased proximity to protein-coding genes enriched for tissue-specific functions, fewer repeat elements, and more frequent single-exon transcripts. Remarkably, we find that ∼20% of human lincRNAs are not expressed beyond chimpanzee and are undetectable even in rhesus. These hominid-specific lincRNAs are more tissue specific, enriched for testis, and faster evolving within the human lineage.


Subject(s)
Conserved Sequence/genetics , Evolution, Molecular , Promoter Regions, Genetic , RNA, Long Noncoding/genetics , Animals , Cattle , Exons , Humans , Mice , Organ Specificity , Rats
3.
Nature ; 505(7485): 701-5, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24336214

ABSTRACT

RNA has a dual role as an informational molecule and a direct effector of biological tasks. The latter function is enabled by RNA's ability to adopt complex secondary and tertiary folds and thus has motivated extensive computational and experimental efforts for determining RNA structures. Existing approaches for evaluating RNA structure have been largely limited to in vitro systems, yet the thermodynamic forces which drive RNA folding in vitro may not be sufficient to predict stable RNA structures in vivo. Indeed, the presence of RNA-binding proteins and ATP-dependent helicases can influence which structures are present inside cells. Here we present an approach for globally monitoring RNA structure in native conditions in vivo with single-nucleotide precision. This method is based on in vivo modification with dimethyl sulphate (DMS), which reacts with unpaired adenine and cytosine residues, followed by deep sequencing to monitor modifications. Our data from yeast and mammalian cells are in excellent agreement with known messenger RNA structures and with the high-resolution crystal structure of the Saccharomyces cerevisiae ribosome. Comparison between in vivo and in vitro data reveals that in rapidly dividing cells there are vastly fewer structured mRNA regions in vivo than in vitro. Even thermostable RNA structures are often denatured in cells, highlighting the importance of cellular processes in regulating RNA structure. Indeed, analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells. Our studies broadly enable the functional analysis of physiological RNA structures and reveal that, in contrast to the Anfinsen view of protein folding whereby the structure formed is the most thermodynamically favourable, thermodynamics have an incomplete role in determining mRNA structure in vivo.
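
The chemical readout described in this abstract can be illustrated with a minimal sketch. Assuming a sequence and a dot-bracket secondary structure as inputs (both hypothetical, not taken from the paper), the function below lists the positions where DMS modification would be expected, namely unpaired adenine and cytosine residues. The function name and input format are illustrative assumptions and do not reproduce the authors' sequencing-based analysis.

```python
def dms_reactive_positions(sequence: str, structure: str) -> list[int]:
    """Return 0-based positions of unpaired A or C residues.

    `structure` uses dot-bracket notation, where '.' marks an unpaired base.
    Illustrative only: a real DMS-seq analysis works from read counts,
    not from a known structure.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return [
        i
        for i, (base, state) in enumerate(zip(sequence.upper(), structure))
        if state == "." and base in "AC"
    ]


# Hypothetical toy hairpin: only the four loop bases are unpaired.
print(dms_reactive_positions("GACUAC", "(....)"))
```

In practice the logic runs in the opposite direction: observed modification at an A or C is taken as evidence that the base is unpaired in the cell.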


Subject(s)
Genome, Fungal/genetics , Nucleic Acid Conformation , RNA Folding , RNA Stability , RNA, Messenger/chemistry , RNA, Messenger/genetics , Saccharomyces cerevisiae/genetics , Fibroblasts , High-Throughput Nucleotide Sequencing , Humans , K562 Cells , Nucleic Acid Denaturation , RNA Folding/genetics , RNA Stability/genetics , RNA, Fungal/chemistry , RNA, Fungal/genetics , RNA, Fungal/metabolism , RNA, Messenger/metabolism , Sulfuric Acid Esters/chemistry , Thermodynamics
4.
Wiley Interdiscip Rev RNA ; 3(6): 759-78, 2012.
Article in English | MEDLINE | ID: mdl-22991327

ABSTRACT

Noncoding RNAs have emerged as key players in the cell. Understanding their surprisingly diverse range of functions is challenging for experimental and computational biology. Here, we review computational methods to analyze noncoding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources.


Subject(s)
Computational Biology/methods , RNA, Untranslated/chemistry , Animals , Data Mining , Databases, Nucleic Acid , Genomics/methods , Humans , MicroRNAs/chemistry , Molecular Sequence Annotation , Nucleic Acid Conformation , RNA Folding
5.
Nucleic Acids Res ; 40(10): 4261-72, 2012 May.
Article in English | MEDLINE | ID: mdl-22287623

ABSTRACT

Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using 'pseudo-energies' to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered, which allows mixtures of suboptimal structures to be reconstructed from seemingly contradictory data. (iv) The noise of the energy model and the experimental data is explicitly modeled, leading to an intuitive weighting factor through which the problem can be seen as folding with 'soft' constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding such as chemical nucleotide modifications and protein binding.


Subject(s)
Algorithms , RNA Folding , Thermodynamics , Base Sequence , Nucleotides/chemistry , RNA/chemistry , RNA, Transfer/chemistry , RNA-Binding Proteins/chemistry
6.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
7.
Genome Res ; 21(11): 1929-43, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994249

ABSTRACT

Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.


Subject(s)
Genome , Genomics , RNA, Untranslated/chemistry , Regulatory Sequences, Ribonucleic Acid , Vertebrates/genetics , 3' Untranslated Regions , Animals , Base Sequence , Conserved Sequence , Gene Expression Regulation , Humans , Immunity/genetics , Methionine Adenosyltransferase/genetics , Molecular Sequence Data , Nucleic Acid Conformation , Phylogeny , Protein Biosynthesis , RNA Editing , RNA Precursors/metabolism , RNA Processing, Post-Transcriptional , RNA Stability , RNA, Messenger/metabolism , RNA, Transfer/chemistry , RNA, Transfer/metabolism , RNA, Untranslated/genetics , Sequence Alignment
8.
Genome Res ; 21(11): 1916-28, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994248

ABSTRACT

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.


Subject(s)
Genome , Mammals/genetics , Open Reading Frames/genetics , Selection, Genetic , Animals , Base Composition , Base Sequence , Codon , Codon, Initiator , Computational Biology , Conserved Sequence , Enhancer Elements, Genetic , Exons , Gene Order , Genes, BRCA1 , Homeodomain Proteins/genetics , Humans , MicroRNAs/metabolism , Molecular Sequence Data , Mutation Rate , Nucleic Acid Conformation , Nucleosomes/metabolism , Peptide Chain Initiation, Translational , RNA Splicing , Sequence Alignment , Transcription, Genetic
9.
RNA ; 17(4): 578-94, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21357752

ABSTRACT

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.


Subject(s)
Genetic Code , RNA, Messenger/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Algorithms , Animals , Base Pairing , Drosophila melanogaster/genetics , Escherichia coli/genetics , Mass Spectrometry , Molecular Sequence Annotation , Molecular Sequence Data , Open Reading Frames , Peptides/genetics , RNA, Untranslated/genetics
10.
Science ; 330(6012): 1787-97, 2010 Dec 24.
Article in English | MEDLINE | ID: mdl-21177974

ABSTRACT

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.


Subject(s)
Chromatin , Drosophila melanogaster/genetics , Gene Regulatory Networks , Genome, Insect , Molecular Sequence Annotation , Animals , Binding Sites , Chromatin/genetics , Chromatin/metabolism , Computational Biology/methods , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/growth & development , Drosophila melanogaster/metabolism , Epigenesis, Genetic , Gene Expression Regulation , Genes, Insect , Genomics/methods , Histones/metabolism , Nucleosomes/genetics , Nucleosomes/metabolism , Promoter Regions, Genetic , RNA, Small Untranslated/genetics , RNA, Small Untranslated/metabolism , Transcription Factors/metabolism , Transcription, Genetic
11.
Anal Bioanal Chem ; 398(7-8): 2867-81, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20803007

ABSTRACT

Proteins with molecular weights of <25 kDa are involved in major biological processes such as ribosome formation, stress adaptation (e.g., temperature reduction) and cell cycle control. Despite their importance, the coverage of smaller proteins in standard proteome studies is rather sparse. Here we investigated biochemical and mass spectrometric parameters that influence coverage and validity of identification. The underrepresentation of low molecular weight (LMW) proteins may be attributed to the low numbers of proteolytic peptides formed by tryptic digestion as well as their tendency to be lost in protein separation and concentration/desalting procedures. In a systematic investigation of the LMW proteome of Escherichia coli, a total of 455 LMW proteins (27% of the 1672 listed in the SwissProt protein database) were identified, corresponding to a coverage of 62% of the known cytosolic LMW proteins. Of these proteins, 93 had not yet been functionally classified, and five had not previously been confirmed at the protein level. In this study, the influences of protein extraction (either urea or TFA), proteolytic digestion (trypsin alone, or trypsin combined with AspN as endoproteases) and protein separation (gel- or non-gel-based) were investigated. Compared to the standard procedure based solely on the use of urea lysis buffer, in-gel separation and tryptic digestion, the complementary use of TFA for extraction or endoprotease AspN for proteolysis permits the identification of an extra 72 (32%) and 51 proteins (23%), respectively. Regarding mass spectrometry analysis with an LTQ Orbitrap mass spectrometer, collision-induced fragmentation (CID and HCD) and electron transfer dissociation using the linear ion trap (IT) or the Orbitrap as the analyzer were compared. IT-CID was found to yield the best identification rate, whereas IT-ETD provided almost comparable results in terms of LMW proteome coverage. The high overlap between the proteins identified with IT-CID and IT-ETD allowed the validation of 75% of the identified proteins using this orthogonal fragmentation technique. Furthermore, a new approach to evaluating and improving the completeness of protein databases that utilizes the program RNAcode was introduced and examined.


Subject(s)
Chromatography, Liquid/methods , Escherichia coli K12/chemistry , Escherichia coli Proteins/isolation & purification , Spectrometry, Mass, Electrospray Ionization/methods , Tandem Mass Spectrometry/methods , Escherichia coli Proteins/analysis , Molecular Weight
12.
Methods Mol Biol ; 609: 3-15, 2010.
Article in English | MEDLINE | ID: mdl-20221910

ABSTRACT

This chapter gives an overview of the most commonly used biological databases of nucleic acid sequences and their structures. We cover general sequence databases, databases for specific DNA features, noncoding RNA sequences, and RNA secondary and tertiary structures.


Subject(s)
DNA/chemistry , Databases, Genetic , RNA/chemistry , Animals , Base Sequence , Humans , Internet , Molecular Sequence Data , Nucleic Acid Conformation , RNA, Untranslated/chemistry
13.
Methods Mol Biol ; 609: 285-306, 2010.
Article in English | MEDLINE | ID: mdl-20221926

ABSTRACT

Noncoding RNAs (ncRNAs) are increasingly recognized as important functional molecules in the cell. Here we give a short overview of fundamental computational techniques to analyze ncRNAs that can help us better understand their function. Topics covered include prediction of secondary structure from the primary sequence, prediction of consensus structures for homologous sequences, search for homologous sequences in databases using sequence and structure comparisons, annotation of tRNAs, rRNAs, snoRNAs, and microRNAs, de novo prediction of novel ncRNAs, and prediction of RNA/RNA interactions including miRNA target prediction.


Subject(s)
Computational Biology , Data Mining , Databases, Genetic , RNA, Untranslated/chemistry , Sequence Analysis, RNA , Algorithms , Animals , Base Sequence , Humans , Molecular Sequence Data , Nucleic Acid Conformation , Sequence Alignment , Sequence Homology, Nucleic Acid , Software
14.
Pac Symp Biocomput ; : 69-79, 2010.
Article in English | MEDLINE | ID: mdl-19908359

ABSTRACT

RNAz is a widely used software package for de novo detection of structured noncoding RNAs in comparative genomics data. Four years of experience have not only demonstrated the applicability of the approach, but also helped us to identify limitations of the current implementation. RNAz 2.0 provides significant improvements in two respects: (1) The accuracy is increased by the systematic use of dinucleotide models. (2) Technical limitations of the previous version, such as the inability to handle alignments with more than six sequences, are overcome by increased training data and the usage of an entropy measure to represent sequence similarities. RNAz 2.0 shows a significantly lower false discovery rate on a dinucleotide background model than the previous version. Separate models for structural alignments provide an additional way to increase the predictive power. RNAz is open source software and can be obtained free of charge at: http://www.tbi.univie.ac.at/~wash/RNAz/
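
The abstract mentions an entropy measure that represents sequence similarity within an alignment but does not give its formula. As a hedged illustration only, the sketch below computes the mean per-column Shannon entropy of an alignment, which is one common way to define such a measure; the function name, the treatment of gaps as ordinary symbols, and the absence of any normalization are assumptions and may differ from what RNAz 2.0 actually does.

```python
import math
from collections import Counter


def mean_column_entropy(alignment: list[str]) -> float:
    """Mean Shannon entropy (in bits) over the columns of an alignment.

    Assumption: gap characters are counted like any other symbol.
    A value of 0.0 means every column is perfectly conserved.
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    total = 0.0
    for column in zip(*alignment):
        n = len(column)
        counts = Counter(column)
        # Shannon entropy of the symbol distribution in this column.
        total -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return total / length


# Hypothetical two-sequence alignment: column 1 conserved, column 2 variable.
print(mean_column_entropy(["AC", "AG"]))
```

Higher values indicate more divergent alignments, so a single number of this kind can stand in for pairwise identity regardless of how many sequences the alignment contains.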


Subject(s)
RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Software , Algorithms , Base Sequence , Computational Biology , Models, Genetic , Nucleic Acid Conformation , RNA Stability , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis, RNA , Thermodynamics
15.
Trends Genet ; 24(12): 583-7, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18951646

ABSTRACT

Using genome-wide maps of nucleosome positions in yeast, we have analyzed the influence of chromatin structure on the molecular evolution of genomic DNA. We have observed, on average, 10-15% lower substitution rates in linker regions than in nucleosomal DNA. This widespread local rate heterogeneity represents an evolutionary footprint of nucleosome positions and reveals that nucleosome organization is a genomic feature conserved over evolutionary timescales.


Subject(s)
Evolution, Molecular , Nucleosomes/genetics , Saccharomyces cerevisiae/genetics , Base Composition , Conserved Sequence , DNA, Intergenic/genetics , Mutation/genetics , Open Reading Frames/genetics
16.
BMC Bioinformatics ; 9: 248, 2008 May 27.
Article in English | MEDLINE | ID: mdl-18505553

ABSTRACT

BACKGROUND: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. RESULTS: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. CONCLUSION: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. AVAILABILITY: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
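
To make concrete what "dinucleotide content" refers to in this abstract, the following minimal sketch computes the relative frequency of each overlapping dinucleotide in a single sequence. It only illustrates the quantity that a dinucleotide-preserving null model keeps fixed on average; it is not the SISSIz simulation algorithm, and the function name is a hypothetical choice.

```python
from collections import Counter


def dinucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Relative frequencies of overlapping dinucleotides in a sequence.

    Returns an empty dict for sequences shorter than two bases.
    """
    seq = sequence.upper()
    # Overlapping windows of width two: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    return {dinuc: count / total for dinuc, count in counts.items()}


# Hypothetical toy sequence with five overlapping dinucleotides.
print(dinucleotide_frequencies("ACGACG"))
```

Comparing these frequencies between a native alignment and its randomized controls is one simple way to check that a shuffling procedure has not altered the stacking-relevant composition.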


Subject(s)
Computational Biology/methods , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , Algorithms , Animals , Base Composition , Humans , Markov Chains , Sequence Alignment , Sequence Homology, Nucleic Acid , Software
17.
BMC Bioinformatics ; 9: 122, 2008 Feb 26.
Article in English | MEDLINE | ID: mdl-18302738

ABSTRACT

BACKGROUND: Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential. RESULTS: We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies that are based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding energy based SCI score used in the RNAz program and a simple base-pair distance metric are by far the most accurate. The use of more complex metrics, such as tree editing, does not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when only little evolutionary information is available. Surprisingly, ensemble based methods that, in principle, could benefit from the additional information contained in sub-optimal structures, perform particularly poorly. As a general trend, we observed that methods that include a consensus structure prediction outperformed equivalent methods that only consider pairwise comparisons. CONCLUSION: Structural conservation can be measured accurately with relatively simple and intuitive metrics. They have the potential to form the basis of future RNA gene finders that face new challenges like finding lineage specific structures or detecting mis-aligned sequences.
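
The "simple base-pair distance metric" named in this abstract can be sketched as follows, under the usual definition: the number of base pairs present in one structure but not in the other. The abstract does not spell out how the authors aggregate this over an alignment, so the code below covers only the pairwise case for two equal-length dot-bracket strings; the function names are illustrative assumptions.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string into a set of (i, j) base pairs."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Count the base pairs found in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return len(base_pairs(a) ^ base_pairs(b))


# Hypothetical structures: the second lacks the inner pair of the first.
print(base_pair_distance("((..))", "(....)"))
```

A distance of zero means the two structures are identical, and each pair that must be opened or closed to convert one into the other adds one to the total.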


Subject(s)
Algorithms , Conserved Sequence/genetics , Evolution, Molecular , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Pair Mismatch , Base Sequence , Molecular Sequence Data , Sequence Homology, Nucleic Acid
18.
Methods Mol Biol ; 395: 503-26, 2007.
Article in English | MEDLINE | ID: mdl-17993695

ABSTRACT

The function of many noncoding RNAs (ncRNAs) depends on a defined secondary structure. RNAz detects evolutionarily conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments and, thus, efficiently filters for candidate ncRNAs. In this chapter, we provide a step-by-step guide on how to use RNAz. Starting with basic concepts, we also cover advanced analysis techniques and, as an example for a large scale application, demonstrate a complete screen of the Saccharomyces cerevisiae genome.


Subject(s)
Nucleic Acid Conformation , RNA, Untranslated/chemistry , Base Sequence , Sequence Homology, Nucleic Acid , Thermodynamics
19.
BMC Genomics ; 8: 406, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17996037

ABSTRACT

BACKGROUND: Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. RESULTS: We obtain 16 000 high quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79-89 (2007)], which are plausible candidates of Pol III transcripts, have been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. CONCLUSION: The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383-1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions. The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs compared to mammals.


Subject(s)
Drosophila melanogaster/genetics , RNA/genetics , Animals , Humans , Nucleic Acid Conformation , Phylogeny , RNA/chemistry , Sensitivity and Specificity
20.
Genome Res ; 17(6): 852-64, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17568003

ABSTRACT

Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).


Subject(s)
3' Untranslated Regions/genetics , GC Rich Sequence , Genome, Human , Quantitative Trait Loci , RNA, Untranslated/genetics , Transcription, Genetic , Base Sequence , Humans , Molecular Sequence Data , RNA, Messenger/genetics , Reverse Transcriptase Polymerase Chain Reaction