Results 1 - 15 of 15
1.
Nature ; 453(7196): 793-7, 2008 Jun 05.
Article in English | MEDLINE | ID: mdl-18463636

ABSTRACT

RNA silencing is a conserved mechanism in which small RNAs trigger various forms of sequence-specific gene silencing by guiding Argonaute complexes to target RNAs by means of base pairing. RNA silencing is thought to have evolved as a form of nucleic-acid-based immunity to inactivate viruses and transposable elements. Although the activity of transposable elements in animals has been thought largely to be restricted to the germ line, recent studies have shown that they may also actively transpose in somatic cells, creating somatic mosaicism in animals. In the Drosophila germ line, Piwi-interacting RNAs arise from repetitive intergenic elements including retrotransposons by a Dicer-independent pathway and function through the Piwi subfamily of Argonautes to ensure silencing of retrotransposons. Here we show that, in cultured Drosophila S2 cells, Argonaute 2 (AGO2), an AGO subfamily member of Argonautes, associates with endogenous small RNAs of 20-22 nucleotides in length, which we have collectively named endogenous short interfering RNAs (esiRNAs). esiRNAs can be divided into two groups: one that mainly corresponds to a subset of retrotransposons, and the other that arises from stem-loop structures. esiRNAs are produced in a Dicer-2-dependent manner from distinctive genomic loci, are modified at their 3' ends and can direct AGO2 to cleave target RNAs. Mutations in Dicer-2 caused an increase in retrotransposon transcripts. Together, our findings indicate that different types of small RNAs and Argonautes are used to repress retrotransposons in germline and somatic cells in Drosophila.


Subject(s)
Drosophila Proteins/metabolism , Drosophila melanogaster/cytology , Drosophila melanogaster/metabolism , RNA, Small Interfering/metabolism , RNA-Induced Silencing Complex/metabolism , Animals , Argonaute Proteins , Cell Line , Drosophila Proteins/genetics , Drosophila melanogaster/enzymology , Drosophila melanogaster/genetics , Eukaryotic Initiation Factors , Germ Cells/metabolism , Mosaicism , Polymerase Chain Reaction , Protein Binding , RNA Helicases/genetics , RNA Helicases/metabolism , RNA Interference , RNA, Small Interfering/genetics , Retroelements/genetics , Ribonuclease III
2.
BMC Bioinformatics ; 9: 33, 2008 Jan 23.
Article in English | MEDLINE | ID: mdl-18215258

ABSTRACT

BACKGROUND: Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Although many alignment methods for non-coding RNAs, including Sankoff's algorithm for strict structural alignments, have been proposed, they are either inaccurate or computationally too expensive. Faster methods with reasonable accuracies are required for genome-scale analyses. RESULTS: We propose a fast algorithm for multiple structural alignments of RNA sequences that is an extension of our pairwise structural alignment method (implemented in SCARNA). The accuracies of the implemented software, MXSCARNA, are at least as favorable as those of state-of-the-art algorithms that are computationally much more expensive in time and memory. CONCLUSION: The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses with accuracies at least comparable to those of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.


Subject(s)
Algorithms , Chromosome Mapping/methods , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity
3.
Bioinformatics ; 24(3): 367-73, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-18056736

ABSTRACT

MOTIVATION: Base pairing probability matrices have been frequently used for the analyses of structural RNA sequences. Recently, there has been a growing need for computing these probabilities for long DNA sequences by constraining the maximal span of base pairs to a limited value. However, none of the existing programs can exactly compute the base pairing probabilities associated with the energy model of secondary structures under such a constraint. RESULTS: We present an algorithm that exactly computes the base pairing probabilities associated with the energy model under the constraint on the maximal span W of base pairs. The complexity of our algorithm is given by O(NW^2) in time and O(N+W^2) in memory, where N is the sequence length. We show that our algorithm has a higher sensitivity to the true base pairs as compared to that of RNAplfold. We also present an algorithm that predicts a mutually consistent set of local secondary structures by maximizing the expected accuracy function. The comparison of the local secondary structure predictions with those of RNALfold indicates that our algorithm is more accurate. Our algorithms are implemented in the software named 'Rfold.' AVAILABILITY: The C++ source code of the Rfold software and the test dataset used in this study are available at http://www.ncrna.org/software/Rfold/.
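
As a rough illustration of the maximal-span constraint described in this abstract (and not of the Rfold algorithm itself), the following Python sketch enumerates the canonical base pairs (i, j) that satisfy j - i <= W. The function name `candidate_pairs` and the `min_loop` parameter are assumptions made for this example. Each position has at most W admissible partners, so the number of candidates grows as O(NW) rather than O(N^2).

```python
# Illustrative sketch only; this is not the Rfold implementation.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def candidate_pairs(seq, max_span, min_loop=3):
    """Return canonical pairs (i, j), 0-based, with min_loop < j - i <= max_span.

    max_span plays the role of the constraint W; min_loop is a hypothetical
    parameter giving the minimum number of unpaired bases between i and j.
    """
    pairs = []
    n = len(seq)
    for i in range(n):
        # The span constraint bounds j, so each i sees at most max_span partners.
        upper = min(n, i + max_span + 1)
        for j in range(i + min_loop + 1, upper):
            if (seq[i], seq[j]) in CANONICAL:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    for w in (5, 9):
        print(w, candidate_pairs("GGGAAAUCCC", max_span=w))
```

Raising `max_span` admits longer-range pairs such as (0, 9) in the toy sequence, at the cost of a larger candidate set.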


Subject(s)
Algorithms , Base Pairing/genetics , Models, Genetic , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Sequence Analysis, RNA/methods , Software , Base Sequence , Computer Simulation , Models, Chemical , Models, Statistical , Molecular Sequence Data
4.
RNA ; 13(12): 2081-90, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17959929

ABSTRACT

The identification of novel miRNAs has significant biological and clinical importance. However, none of the known miRNA features alone is sufficient for accurately detecting novel miRNAs. The aim of this paper is to integrate these features in a straightforward manner for detecting miRNAs with better accuracy. Since most miRNA regions are highly conserved among vertebrates for the ability to form stable hairpin structures, we implemented a hidden Markov model that outputs multidimensional feature vectors composed of both evolutionary features and secondary structural ones. The proposed method, called miRRim, outperformed existing ones in terms of detection/prediction performance: The total number of predictions was smaller than with existing methods when the number of miRNAs detected was adjusted to be the same. Moreover, there were several candidates predicted only by our method that are clustered with the known miRNAs, suggesting that our method is able to detect novel miRNAs. Genomic coordinates of predicted miRNA can be obtained from http://mirrim.ncrna.org/.


Subject(s)
MicroRNAs/analysis , MicroRNAs/genetics , Conserved Sequence , Genome, Human , Humans , Markov Chains , MicroRNAs/chemistry , Nucleic Acid Conformation , RNA Interference , Reproducibility of Results , Sensitivity and Specificity , Transcription, Genetic
5.
Bioinformatics ; 23(21): 2945-6, 2007 Nov 01.
Article in English | MEDLINE | ID: mdl-17893084

ABSTRACT

SUMMARY: We have launched a web server, which serves as a general-purpose idiogram rendering service, and allows users to generate high-quality idiograms with custom annotation according to their own genome-wide mapping/annotation data through an easy-to-use interface. The generated idiograms are suitable not only for visualizing summaries of genome-wide analysis but also for many types of presentation material including web pages, conference posters, oral presentations, etc. AVAILABILITY: Idiographica is freely available at http://www.ncrna.org/idiographica/


Subject(s)
Algorithms , Chromosome Banding/methods , Chromosome Mapping/methods , Computer Graphics , Internet , Software , User-Computer Interface , Animals , Humans , Mice , Rats
6.
Biochem Biophys Res Commun ; 357(3): 724-30, 2007 Jun 08.
Article in English | MEDLINE | ID: mdl-17445766

ABSTRACT

In the human HOXA locus a number of ncRNAs are transcribed from the intergenic regions in the opposite direction to HOXA mRNAs. We observed that the genomic organization of genes for the ncRNAs and HOXA proteins is highly conserved between human and mouse. We examined the expression profiles of these ncRNAs and HOXA mRNAs in various human tissues. The expression patterns of ncRNAs in human tissues coincide with those of the adjacent HOXA mRNAs that are collinearly expressed along the anteroposterior axis. This coordinated expression was observed even in transformed tumors and cancer cell lines, suggesting that the expression of ncRNAs is prerequisite for the regulated expression of HOXA genes. HIT18844 ncRNA transcribed from the most upstream position of the HOXA cluster possesses an ultra-conserved short stretch which potentially forms an evolutionarily conserved secondary structure. Our data suggest a critical role for ncRNAs in the regulation of HOXA gene expression.


Subject(s)
Gene Expression Profiling , Homeodomain Proteins/genetics , RNA, Messenger/genetics , RNA, Untranslated/genetics , Animals , Base Sequence , Cell Line , Cell Line, Tumor , Female , HL-60 Cells , HeLa Cells , Humans , Jurkat Cells , K562 Cells , Mice , Molecular Sequence Data , Nucleic Acid Conformation , RNA, Messenger/chemistry , RNA, Untranslated/chemistry , Reverse Transcriptase Polymerase Chain Reaction , Sequence Homology, Nucleic Acid
7.
Bioinformatics ; 23(13): 1588-98, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17459961

ABSTRACT

MOTIVATION: Structural RNA genes exhibit unique evolutionary patterns that are designed to conserve their secondary structures; these patterns should be taken into account while constructing accurate multiple alignments of RNA genes. The Sankoff algorithm is a natural alignment algorithm that includes the effect of base-pair covariation in the alignment model. However, the extremely high computational cost of the Sankoff algorithm precludes its application to most RNA sequences. RESULTS: We propose an efficient algorithm for the multiple alignment of structural RNA sequences. Our algorithm is a variant of the Sankoff algorithm, and it uses an efficient scoring system that reduces the time and space requirements considerably without compromising on the alignment quality. First, our algorithm computes the match probability matrix that measures the alignability of each position pair between sequences as well as the base pairing probability matrix for each sequence. These probabilities are then combined to score the alignment using the Sankoff algorithm. By itself, our algorithm does not predict the consensus secondary structure of the alignment but uses external programs for the prediction. We demonstrate that both the alignment quality and the accuracy of the consensus secondary structure prediction from our alignment are the highest among the other programs examined. We also demonstrate that our algorithm can align relatively long RNA sequences such as the eukaryotic-type signal recognition particle RNA that is approximately 300 nt in length; multiple alignment of such sequences has not been possible by using other Sankoff-based algorithms. The algorithm is implemented in the software named 'Murlet'. AVAILABILITY: The C++ source code of the Murlet software and the test dataset used in this study are available at http://www.ncrna.org/papers/Murlet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Base Sequence , Molecular Sequence Data
8.
Biochem Biophys Res Commun ; 357(4): 991-6, 2007 Jun 15.
Article in English | MEDLINE | ID: mdl-17451645

ABSTRACT

We have examined the expression profile of selected non-coding RNAs (ncRNAs) in 11 human tissues. Among 5489 full-length cDNA clones annotated as non-protein-coding transcripts in the H-Invitational database, we chose 150 clones for further analysis based on their gene structure and EST information. Expression profiling using quantitative RT-PCR and Northern blot hybridization revealed that the majority of the selected ncRNAs exhibited tissue specificity: 67% are predominantly expressed in a restricted subset of tissues. The absolute quantification of representative ncRNAs revealed that the majority of ncRNAs are expressed as low abundance transcripts. A comparative genomic analysis revealed that only 27% of the selected ncRNAs have mouse counterparts. Since the expression patterns of the human ncRNAs having no mouse counterparts remain to be similar to those of the mouse ncRNAs, the expression patterns of the selected ncRNAs may be conserved between human and mouse.


Subject(s)
RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Sequence Analysis, RNA , Transcription Factors/genetics , Transcription Factors/metabolism , Animals , Base Sequence , Humans , Mice , Molecular Sequence Data , Organ Specificity , Sequence Homology, Nucleic Acid , Tissue Distribution
9.
Nucleic Acids Res ; 35(Database issue): D145-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17099231

ABSTRACT

There is an abundance of transcripts that code for no particular protein and that remain functionally uncharacterized. Some of these transcripts may have novel functions while others might be junk transcripts. Unfortunately, the experimental validation of such transcripts to find functional non-coding RNA candidates is very costly. Therefore, our primary interest is to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb: a novel database service that hosts a large collection of non-coding transcripts including annotated/non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses have been performed on the included sequences. These analyses include RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search, protein homology search, etc. fRNAdb provides an efficient interface to help users filter out particular transcripts under their own criteria to sort out functional RNA candidates. fRNAdb is available at http://www.ncrna.org/


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Base Sequence , Genomics , Internet , MicroRNAs/physiology , RNA, Messenger/chemistry , RNA, Untranslated/physiology , User-Computer Interface
10.
Bioinformatics ; 23(4): 434-41, 2007 Feb 15.
Article in English | MEDLINE | ID: mdl-17182698

ABSTRACT

MOTIVATION: Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. RESULTS: We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. AVAILABILITY: The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
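
The two steps named in this abstract (averaging per-sequence base pairing probabilities over the alignment, then choosing a consensus structure from the averaged matrix) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' C++ code: the per-sequence probabilities are assumed to be given, the treatment of gaps is a guess, and the score `p - theta` with threshold `theta` is a stand-in for the paper's expected-accuracy objective and its sensitivity/specificity parameter. All function names are hypothetical.

```python
# Simplified sketch; names, gap handling and the scoring rule are assumptions.

def column_map(aligned_row):
    """Map ungapped positions of one aligned row to alignment columns."""
    return [col for col, ch in enumerate(aligned_row) if ch != "-"]


def average_bpp(alignment, bpps):
    """Average pair probabilities in alignment-column coordinates.

    alignment: list of gapped strings of equal length.
    bpps: one dict {(i, j): prob} per row, in ungapped 0-based coordinates.
    """
    avg = {}
    for row, bpp in zip(alignment, bpps):
        cols = column_map(row)
        for (i, j), p in bpp.items():
            key = (cols[i], cols[j])
            avg[key] = avg.get(key, 0.0) + p / len(alignment)
    return avg


def mea_structure(avg, length, theta=0.5):
    """Nussinov-style DP picking nested pairs that maximize sum(p - theta)."""
    def gain(i, k):
        return avg.get((i, k), 0.0) - theta

    # best[i][j] is the best score on the half-open column range [i, j).
    best = [[0.0] * (length + 1) for _ in range(length + 1)]
    choice = [[None] * (length + 1) for _ in range(length + 1)]
    for i in range(length - 1, -1, -1):
        for j in range(i + 1, length + 1):
            best[i][j] = best[i + 1][j]  # column i left unpaired
            for k in range(i + 1, j):    # column i paired with column k
                cand = gain(i, k) + best[i + 1][k] + best[k + 1][j]
                if cand > best[i][j]:
                    best[i][j], choice[i][j] = cand, k

    # Trace back the chosen pairs without recursion.
    pairs, stack = [], [(0, length)]
    while stack:
        i, j = stack.pop()
        if j - i < 2:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            pairs.append((i, k))
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return sorted(pairs)


if __name__ == "__main__":
    aln = ["GCAAAGC", "G-AAAGC"]
    probs = [{(0, 6): 0.9, (1, 5): 0.8}, {(0, 5): 0.7}]
    avg = average_bpp(aln, probs)
    print(mea_structure(avg, len(aln[0]), theta=0.5))
    print(mea_structure(avg, len(aln[0]), theta=0.3))
```

In this toy version, lowering `theta` admits more pairs (higher sensitivity) and raising it admits fewer (higher specificity), which loosely mirrors the role of the tunable parameter mentioned in the abstract.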


Subject(s)
Base Pair Mismatch/genetics , Base Pairing/genetics , Consensus Sequence/genetics , Models, Genetic , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Algorithms , Computer Simulation , Models, Statistical
11.
Bioinformatics ; 22(20): 2480-7, 2006 Oct 15.
Article in English | MEDLINE | ID: mdl-16908501

ABSTRACT

MOTIVATION: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably in both accuracy and efficiency with state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.


Subject(s)
Algorithms , Information Storage and Retrieval/methods , RNA/chemistry , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Artificial Intelligence , Base Sequence , Databases, Genetic , Molecular Sequence Data , Pattern Recognition, Automated
12.
Bioinformatics ; 22(14): 1723-9, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16690634

ABSTRACT

MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but it is known that a secondary structure prediction of a single sequence is not reliable. Therefore, we have to collect similar RNA sequences with a common secondary structure for the analyses of a new non-coding RNA without knowing the exact secondary structure itself. Therefore, the sequence comparison in searching similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search similar RNAs in the whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on the structural alignments of the fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to the long sequences in the large-scale analyses. The accuracy of the alignments is better than or comparable to that of the much slower existing algorithms. AVAILABILITY: The web server of SCARNA with graphical structural alignment viewer is available at http://www.scarna.org/.
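
To make the notion of fixed-length stem fragments more concrete, the Python sketch below lists every position pair at which a run of `stem_len` consecutive canonical base pairs can form. It is a hypothetical illustration only: the abstract does not say how SCARNA selects its stem candidates, and the function name, the parameters and the purely sequence-based pairing test are assumptions made for this example.

```python
# Hypothetical illustration; not the SCARNA candidate-selection procedure.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}


def stem_candidates(seq, stem_len=3, min_loop=3):
    """Return (i, j) where seq[i + t] pairs with seq[j - t] for t < stem_len.

    i is the 5' start of the left strand and j the 3' end of the right
    strand (0-based). At least min_loop unpaired bases separate the strands.
    """
    found = []
    n = len(seq)
    for i in range(n):
        # Smallest j that leaves room for both strands and the loop.
        for j in range(i + 2 * stem_len + min_loop - 1, n):
            if all((seq[i + t], seq[j - t]) in PAIRS for t in range(stem_len)):
                found.append((i, j))
    return found


if __name__ == "__main__":
    print(stem_candidates("GGGAAAACCC"))
```

Fragments of this kind can then be compared between two sequences instead of comparing all individual base pairs, which is the general idea the abstract attributes to the method.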


Subject(s)
Algorithms , Pattern Recognition, Automated/methods , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Artificial Intelligence , Base Sequence , Molecular Sequence Data , Sequence Homology, Nucleic Acid
13.
Nature ; 438(7071): 1157-61, 2005 Dec 22.
Article in English | MEDLINE | ID: mdl-16372010

ABSTRACT

The genome of Aspergillus oryzae, a fungus important for the production of traditional fermented foods and beverages in Japan, has been sequenced. The ability to secrete large amounts of proteins and the development of a transformation system have facilitated the use of A. oryzae in modern biotechnology. Although both A. oryzae and Aspergillus flavus belong to the section Flavi of the subgenus Circumdati of Aspergillus, A. oryzae, unlike A. flavus, does not produce aflatoxin, and its long history of use in the food industry has proved its safety. Here we show that the 37-megabase (Mb) genome of A. oryzae contains 12,074 genes and is expanded by 7-9 Mb in comparison with the genomes of Aspergillus nidulans and Aspergillus fumigatus. Comparison of the three aspergilli species revealed the presence of syntenic blocks and A. oryzae-specific blocks (lacking synteny with A. nidulans and A. fumigatus) in a mosaic manner throughout the genome of A. oryzae. The blocks of A. oryzae-specific sequence are enriched for genes involved in metabolism, particularly those for the synthesis of secondary metabolites. Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.


Subject(s)
Aspergillus oryzae/genetics , Genome, Fungal , Genomics , Aspartic Acid Endopeptidases/genetics , Aspergillus oryzae/enzymology , Aspergillus oryzae/metabolism , Chromosomes, Fungal/genetics , Cytochrome P-450 Enzyme System/genetics , Genes, Fungal/genetics , Molecular Sequence Data , Phylogeny , Synteny
14.
Bioinformatics ; 18 Suppl 1: S268-75, 2002.
Article in English | MEDLINE | ID: mdl-12169556

ABSTRACT

MOTIVATION: Kernel methods such as support vector machines require a kernel function between objects to be defined a priori. Several works have been done to derive kernels from probability distributions, e.g., the Fisher kernel. However, a general methodology to design a kernel is not fully developed. RESULTS: We propose a reasonable way of designing a kernel when objects are generated from latent variable models (e.g., HMM). First of all, a joint kernel is designed for complete data which include both visible and hidden variables. Then a marginalized kernel for visible data is obtained by taking the expectation with respect to hidden variables. We will show that the Fisher kernel is a special case of marginalized kernels, which gives another viewpoint to the Fisher kernel theory. Although our approach can be applied to any object, we particularly derive several marginalized kernels useful for biological sequences (e.g., DNA and proteins). The effectiveness of marginalized kernels is illustrated in the task of classifying bacterial gyrase subunit B (gyrB) amino acid sequences.
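
The construction described in this abstract (a joint kernel on complete data, then an expectation over the hidden variables) can be written as K(x, x') = sum over h and h' of p(h|x) p(h'|x') K_z((x, h), (x', h')). The Python sketch below is a toy version of that formula, assuming the posteriors p(h|x) are supplied as explicit lists; for an HMM they would normally be obtained with a forward-backward pass rather than by enumeration. The un-normalized count-based joint kernel and all names are assumptions chosen for illustration.

```python
# Toy sketch of a marginalized kernel; names and the joint kernel are assumptions.
from collections import Counter


def joint_count_kernel(x, h, x2, h2):
    """Joint kernel on complete data: dot product of (symbol, state) counts."""
    counts1 = Counter(zip(x, h))
    counts2 = Counter(zip(x2, h2))
    return sum(counts1[key] * counts2[key] for key in counts1)


def marginalized_kernel(x, post_x, x2, post_x2, joint=joint_count_kernel):
    """Expectation of the joint kernel under the hidden-variable posteriors.

    post_x and post_x2 are lists of (hidden_sequence, probability) pairs
    whose probabilities sum to 1 for each object.
    """
    return sum(p * q * joint(x, h, x2, h2)
               for h, p in post_x
               for h2, q in post_x2)


if __name__ == "__main__":
    k = marginalized_kernel("AB", [("11", 0.5), ("12", 0.5)],
                            "AB", [("11", 1.0)])
    print(k)
```

Explicit enumeration is only workable for tiny examples; the point of the sketch is the double expectation, not an efficient algorithm.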


Subject(s)
Algorithms , DNA Gyrase/chemistry , Models, Chemical , Models, Statistical , Sequence Analysis/methods , Amino Acid Sequence , Markov Chains , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/methods , Sequence Homology
15.
Genome Inform ; 13: 112-22, 2002.
Article in English | MEDLINE | ID: mdl-14571380

ABSTRACT

We present novel kernels that measure similarity of two RNA sequences, taking account of their secondary structures. Two types of kernels are presented. One is for RNA sequences with known secondary structures, the other for those without known secondary structures. The latter employs stochastic context-free grammar (SCFG) for estimating the secondary structure. We call the latter the marginalized count kernel (MCK). We show computational experiments for MCK using 74 sets of human tRNA sequence data: (i) kernel principal component analysis (PCA) for visualizing tRNA similarities, (ii) supervised classification with support vector machines (SVMs). Both types of experiment show promising results for MCKs.


Subject(s)
Computational Biology/methods , Nucleic Acid Conformation , RNA, Transfer/genetics , Sequence Analysis, RNA/methods , Data Interpretation, Statistical , Humans , ROC Curve