Search | BVS Regional Portal

RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry.

Horesh, Yair; Wexler, Ydo; Lebenthal, Ilana; Ziv-Ukelson, Michal; Unger, Ron.

BMC Bioinformatics; 10: 76, 2009 Mar 04.

Article in English | MEDLINE | ID: mdl-19257906

ABSTRACT

BACKGROUND: Scanning large genomes with a sliding window in search of locally stable RNA structures is a well-motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of length N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) of folding for each of the L-sized substrings of S. The problem can be solved naively in O(NL^3) time by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently, an O(NL^2) solution for this problem has been described.

RESULTS: Here, we describe and implement an O(NLψ(L)) engine for the consecutive windows folding problem, where ψ(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup that is confirmed experimentally. Using this tool, we note an intriguing directional (5'-3' vs. 3'-5') folding bias: in various genomic regions of several organisms, including regions that do not encode proteins or ncRNA, the MFE of folding is higher in the native direction of the DNA than in the reverse direction. This bias largely emerges from the genomic dinucleotide bias, which affects the MFE; however, after normalization to the dinucleotide bias, we still see some variation in the folding bias among genomic regions. We also present the MFE landscape of mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside on this chromosome.

CONCLUSION: The efficient consecutive windows folding engine described in this paper allows genome-wide scans for ncRNA molecules as well as large-scale statistics. It is implemented as a software tool, called RNAslider, and applied here to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.
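To make the naive O(NL^3) baseline concrete, here is a minimal sketch that folds every length-L window independently. It is not RNAslider's algorithm and not an MFE computation: as a loudly labeled stand-in for a real energy model, each window is scored with Nussinov-style base-pair maximization (a classical cubic-time DP), and the pairing rule (Watson-Crick plus G-U wobble) and minimum hairpin loop of 3 are assumptions. The names `nussinov_max_pairs` and `consecutive_windows_naive` are hypothetical. The sketch folds all N-L+1 windows of the sequence.

```python
# Simplified pairing rule: Watson-Crick pairs plus G-U wobble (an assumption, not an energy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (Nussinov-style DP, O(n^3) time).

    Used here only as a stand-in for an MFE folding algorithm.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best score for seq[i..j] (inclusive); cells with i >= j stay 0.
    dp = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j - 1, -1, -1):
            best = dp[i][j - 1]  # case 1: position j is left unpaired
            # case 2: j pairs with some k, leaving at least min_loop unpaired bases in the hairpin
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k - 1 >= i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + inner + 1)
            dp[i][j] = best
    return dp[0][n - 1]


def consecutive_windows_naive(seq: str, window: int) -> list[int]:
    """Fold every length-`window` substring independently: O(N * L^3) overall."""
    if not 0 < window <= len(seq):
        raise ValueError("window must satisfy 0 < L <= N")
    return [nussinov_max_pairs(seq[s:s + window]) for s in range(len(seq) - window + 1)]


if __name__ == "__main__":
    print(consecutive_windows_naive("GGGAAAUCCAAAGGG", 9))
```

Each window is refolded from scratch, so work shared between overlapping windows is recomputed; avoiding that redundancy is exactly what faster engines such as the O(NL^2) method and RNAslider target.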

Subjects

Algorithms; Genome; RNA/chemistry; Software; Nucleic Acid Conformation; RNA, Untranslated/chemistry

Psiscan: a computational approach to identify H/ACA-like and AGA-like non-coding RNA in trypanosomatid genomes.

Myslyuk, Inna; Doniger, Tirza; Horesh, Yair; Hury, Avraham; Hoffer, Ran; Ziporen, Yaara; Michaeli, Shulamit; Unger, Ron.

BMC Bioinformatics; 9: 471, 2008 Nov 05.

Article in English | MEDLINE | ID: mdl-18986541

ABSTRACT

BACKGROUND: Detection of non-coding RNA (ncRNA) molecules is a major bioinformatics challenge. The challenge is particularly difficult when attempting to detect H/ACA molecules, which are involved in converting uridine to pseudouridine on rRNA, in trypanosomes, because these organisms have unique H/ACA molecules (termed H/ACA-like) that lack several of the features that characterize H/ACA molecules in most other organisms.

RESULTS: We present here a computational tool called Psiscan, which was designed to detect H/ACA-like molecules in trypanosomes. We started by analyzing known H/ACA-like molecules and characterizing their crucial elements both computationally and experimentally. Next, we set up constraints based on this analysis and on additional phylogenetic and functional data to rapidly scan three trypanosome genomes (T. brucei, T. cruzi and L. major) for sequences that satisfy these constraints and are conserved among the species. In the next step, we used minimal energy calculations to select the molecules that are predicted to fold into a lowest-energy structure consistent with the constraints. In the final computational step, we used a Support Vector Machine (SVM) trained on known H/ACA-like molecules as positive examples and, as negative examples, on molecules that were identified by the computational analyses but were shown experimentally not to be H/ACA-like. The leading candidate molecules predicted by the SVM model were then subjected to experimental validation.

CONCLUSION: The experimental validation showed 11 molecules to be expressed (4 out of 25 in the intermediate stage and 7 out of 19 in the final validation after the machine learning stage). Five of these 11 molecules were further shown to be bona fide H/ACA-like molecules. Because snoRNAs in trypanosomes are organized in clusters, the new H/ACA-like molecules could be used as starting points for a manual search for additional molecules in their neighbourhood. Altogether, this study increased our repertoire by fourteen H/ACA-like and six C/D snoRNA molecules from T. brucei and L. major. In addition, the experimental analysis revealed six expressed ncRNA molecules that are not downregulated in CBF5-silenced cells, suggesting that they have the structural features of H/ACA-like molecules but not their standard function. We termed this novel class of molecules AGA-like and are exploring their function. This study demonstrates the power of tight collaboration between computational and experimental approaches in a combined effort to reveal the repertoire of ncRNA molecules.

Subjects

Computational Biology/methods; Genomics/methods; RNA, Small Nuclear/genetics; Software; Trypanosoma/genetics; Animals; Artificial Intelligence; Models, Genetic; Mutagenesis; Protein Folding; Protein Structure, Secondary

RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules.

Horesh, Yair; Doniger, Tirza; Michaeli, Shulamit; Unger, Ron.

BMC Bioinformatics; 8: 366, 2007 Oct 01.

Article in English | MEDLINE | ID: mdl-17908318

ABSTRACT

BACKGROUND: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they have been shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is predicting the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure of a set of unaligned ncRNA molecules that are taken from the same family and thus presumably have a similar structure.

RESULTS: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in time linear in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, at least one suggested structure is likely to be similar to the true structure. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph, and the shortest path in this graph is the basis for the structural prediction of each molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string edit-distance algorithm with minimal loss of accuracy. We show that this approach allows us to explore the suboptimal structure space more deeply.

CONCLUSION: The algorithm was tested on three datasets that include several ncRNA families taken from the Rfam database. These datasets allowed comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.
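The layered-graph idea can be sketched as follows. This is a minimal illustration under stated assumptions, not RNAspa's exact graph: each molecule contributes one layer of candidate dot-bracket structures (in practice these would come from RNAsubopt; here they are hard-coded), edges connect only consecutive layers, and each edge is weighted by the plain Levenshtein distance between the two dot-bracket strings. The shortest path then selects one structure per molecule. The function names `edit_distance` and `layered_shortest_path` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def layered_shortest_path(layers: list[list[str]]) -> tuple[int, list[str]]:
    """Choose one dot-bracket structure per layer (molecule) so that the summed
    edit distance between consecutive choices is minimal.

    The number of distance evaluations grows linearly with the number of layers
    and quadratically with the number of candidate structures per layer.
    """
    if not layers or any(not layer for layer in layers):
        raise ValueError("every layer needs at least one candidate structure")
    cost = [0] * len(layers[0])  # best path cost ending at each vertex of the current layer
    back: list[list[int]] = []   # back[t][j] = predecessor index of vertex j in layer t + 1
    for prev_layer, layer in zip(layers, layers[1:]):
        new_cost, choice = [], []
        for s in layer:
            totals = [cost[k] + edit_distance(p, s) for k, p in enumerate(prev_layer)]
            best_k = min(range(len(totals)), key=totals.__getitem__)
            new_cost.append(totals[best_k])
            choice.append(best_k)
        cost = new_cost
        back.append(choice)
    # Pick the cheapest endpoint in the last layer and walk the back-pointers.
    end = min(range(len(cost)), key=cost.__getitem__)
    path = [end]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    path.reverse()
    return cost[end], [layers[t][v] for t, v in enumerate(path)]


if __name__ == "__main__":
    layers = [["((...))", "......."], ["((...))", ".(...)."], ["((...))"]]
    print(layered_shortest_path(layers))
```

Restricting edges to adjacent layers is what keeps the dynamic program linear in the number of molecules; comparing dot-bracket strings directly is a cheap proxy for a structural distance, in the spirit of the string edit-distance comparison described in the abstract.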

Subjects

Algorithms; Models, Chemical; RNA, Untranslated/chemistry; RNA, Untranslated/genetics; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Software; Base Sequence; Computer Simulation; Models, Molecular; Molecular Sequence Data; Nucleic Acid Conformation

Designing an A* algorithm for calculating edit distance between rooted-unordered trees.

Horesh, Yair; Mehr, Ramit; Unger, Ron.

J Comput Biol; 13(6): 1165-76, 2006.

Article in English | MEDLINE | ID: mdl-16901235

ABSTRACT

Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance", the minimal number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between siblings is significant, can be compared in polynomial time. Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important. For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be an NP-complete problem. Comparing two trees can be viewed as a search problem in graphs. A* is a search algorithm that explores the search space in an efficient order; given a good lower-bound estimate of the degree of difference between the two trees, A* can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suitable for calculating tree edit distance, and we show here that it can compute the edit distance in reasonable time for trees with dozens of nodes.

Subjects

Algorithms; Computational Biology/methods; Autoimmune Diseases/genetics; Humans; Immunoglobulins/genetics; Models, Theoretical; Mutation

A bottom-up clustering algorithm to detect ncRNA molecules with a common secondary structure.

Horesh, Yair; Unger, Ron.

Int J Bioinform Res Appl; 1(3): 292-304, 2005.

Article in English | MEDLINE | ID: mdl-18048137

ABSTRACT

Recently, there has been much interest in exploring the universe of non-protein-coding RNA molecules that operate in the cell. We have suggested an approach that uses a simple two-dimensional representation of RNA molecules to identify their common structural features. Here, we address a common situation in which there is a large and diverse population of candidate molecules and the task is to identify a small subset (or subsets) of RNA molecules that share a common structure. Under certain constraints, our algorithm enumerates all possible sets of RNA molecules that have a common structure: it first groups together all molecules that share a single structural feature and then iteratively searches for subsets that share additional structural motifs. In a computational experiment, we were able to detect members of three small classes of RNA molecules, each containing several dozen members, that were mixed into a population of 2778 non-coding sequences common to two trypanosome species.
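The bottom-up enumeration described above (group molecules by one shared feature, then iteratively extend the shared feature set) can be sketched as a level-wise, Apriori-style search. This is an illustrative substitute, not the authors' algorithm: each molecule is assumed, hypothetically, to be pre-reduced to a set of feature labels (for example, identifiers of stems found in its two-dimensional representation), and `min_size` is a hypothetical minimum group size. The name `shared_feature_groups` is also hypothetical.

```python
from itertools import combinations


def shared_feature_groups(molecules: dict[str, set[str]],
                          min_size: int) -> dict[frozenset[str], frozenset[str]]:
    """Return {feature set: molecules containing every feature in it} for all
    feature sets shared by at least `min_size` molecules, built bottom-up one
    feature at a time."""
    # Level 1: group molecules by each single structural feature.
    by_feature: dict[str, set[str]] = {}
    for mol, feats in molecules.items():
        for f in feats:
            by_feature.setdefault(f, set()).add(mol)
    level = {frozenset([f]): frozenset(ms) for f, ms in by_feature.items() if len(ms) >= min_size}
    result = dict(level)
    # Level k+1: merge two k-feature groups whose union has exactly k+1 features
    # and keep the molecules common to both (those containing every feature).
    while level:
        nxt: dict[frozenset[str], frozenset[str]] = {}
        for (fa, ma), (fb, mb) in combinations(level.items(), 2):
            union = fa | fb
            if len(union) != len(fa) + 1 or union in nxt:
                continue
            members = ma & mb
            if len(members) >= min_size:
                nxt[union] = members
        result.update(nxt)
        level = nxt
    return result


if __name__ == "__main__":
    mols = {"m1": {"A", "B", "C"}, "m2": {"A", "B"}, "m3": {"A", "B", "D"}, "m4": {"D"}}
    for feats, members in shared_feature_groups(mols, 2).items():
        print(sorted(feats), sorted(members))
```

Because a group sharing k+1 features must also share each of its k-feature subsets, extending only surviving groups prunes the search without missing any qualifying set.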

Subjects

Cluster Analysis; Protein Structure, Secondary; Algorithms; Base Sequence; RNA, Untranslated/chemistry

RNAMAT: an efficient method to detect classes of RNA molecules and their structural features.

Horesh, Yair; Amir, Amihood; Michaeli, Shulamit; Unger, Ron.

Conf Proc IEEE Eng Med Biol Soc; 2004: 2869-72, 2004.

Article in English | MEDLINE | ID: mdl-17270876

ABSTRACT

There is a growing appreciation of the diverse and important roles RNA molecules play in cellular function. RNAMAT is an approach that uses a matrix representation of all potential base pairs in a set of sequences to reveal common secondary-structure features. When the RNA sequences come from one class, proper summation of these matrices exposes common structural features, as demonstrated for tRNA and H/ACA RNA. For C/D RNA, a novel structural motif is suggested. Furthermore, in the case of tmRNA, it is demonstrated that the method can detect pseudoknots, structural motifs that are difficult to detect with other methods. When the sequences come from diverse sources, a specific clustering algorithm is suggested that is capable of detecting the common motifs. The algorithm is demonstrated on a simulated example and on a real case derived from a comparative RNomics study of trypanosomes.
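A minimal sketch of the matrix idea: build a 0/1 matrix of potential base pairs for each sequence, sum the matrices cell by cell, and read off anti-diagonal runs of high counts as candidate shared stems. The pairing rule (Watson-Crick plus G-U wobble), the requirement that all sequences have equal length, and the stem-extraction helper with its `threshold` and `min_len` parameters are all assumptions for illustration; the abstract does not specify how RNAMAT handles unequal lengths or how it weights the summation. Since every cell is scored independently, nothing forces the reported stems to be nested, so crossing (pseudoknotted) stems can both appear.

```python
# Simplified pairing rule: Watson-Crick pairs plus G-U wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq: str) -> list[list[int]]:
    """M[i][j] = 1 if positions i < j could form a base pair, else 0."""
    n = len(seq)
    return [[1 if i < j and (seq[i], seq[j]) in PAIRS else 0 for j in range(n)] for i in range(n)]


def summed_matrix(seqs: list[str]) -> list[list[int]]:
    """Cell-wise sum of the pairing matrices of equal-length sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("this sketch assumes equal-length sequences")
    total = [[0] * n for _ in range(n)]
    for s in seqs:
        m = pairing_matrix(s)
        for i in range(n):
            for j in range(n):
                total[i][j] += m[i][j]
    return total


def conserved_stems(total: list[list[int]], threshold: int, min_len: int) -> list[tuple[int, int, int]]:
    """Report (i, j, length) for maximal anti-diagonal runs (i, j), (i+1, j-1), ...
    in which every cell has a summed count >= threshold (threshold should be >= 1)."""
    n = len(total)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            if total[i][j] < threshold:
                continue
            # Only start a run at its outermost pair, so each stem is reported once.
            if i > 0 and j + 1 < n and total[i - 1][j + 1] >= threshold:
                continue
            length = 0
            while i + length < j - length and total[i + length][j - length] >= threshold:
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


if __name__ == "__main__":
    seqs = ["GGGAAACCC", "CCCAAAGGG"]
    print(conserved_stems(summed_matrix(seqs), threshold=2, min_len=3))
```

In the example, both sequences can close the same three-pair stem between positions 0 to 2 and 6 to 8 even though their nucleotides differ, which is the kind of covarying, sequence-independent signal that summation is meant to expose.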

A rapid method for detection of putative RNAi target genes in genomic data.

Horesh, Yair; Amir, Amihood; Michaeli, Shulamit; Unger, Ron.

Bioinformatics; 19 Suppl 2: ii73-80, 2003 Oct.

Article in English | MEDLINE | ID: mdl-14534175

ABSTRACT

RNAi, the inhibition of gene expression by double-stranded RNA molecules, has rapidly become a powerful laboratory technique for studying gene function. The effectiveness of the procedure raised the question of whether this laboratory technique may actually mimic a natural cellular control mechanism that works on similar principles. Indeed, evidence is accumulating that RNAi is a natural control mechanism that might even serve as a primitive immune response against RNA viruses and retroposons. Various RNAi mechanisms seem to use three different interference scenarios, one of which involves the degradation of mRNA molecules. Here we suggest a method to systematically scan entire genomes simultaneously for RNAi elements and for cellular genes that are degraded by these elements via exact short base-pair matching. The method scans the genomes using a suffix tree data structure that was specifically modified to identify sets of combinations of repeated and inverted repeated sequences of 20 bp or more. An initial scan suggests that a large number of genes, about 7% of C. elegans and 3% of C. briggsae genes, have the potential to be subject to natural RNAi control. Two methods are proposed to further analyze these genes and select the cases that are more likely to be actual cases of RNAi control. One method looks for ESTs that can provide direct evidence that the RNAi control elements are indeed expressed. The other looks for synteny between C. elegans and C. briggsae, assuming that genes that might be under RNAi control in both organisms are more likely to be biologically significant. Taken together, supportive evidence was found for about 70 genes being under RNAi control. Among these genes are transposase, hormone receptors, homeobox proteins, defensin, actins, and several types of collagens.
While our method is not capable of detecting all cases of natural RNAi control, it points to a large number of potential cases that can be further verified by experimental work.
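The core search (exact inverted repeats of 20 bp, then genes that share an exact match with one arm) can be sketched with a plain hash-based k-mer index. This substitutes a simpler data structure for the modified suffix tree described in the abstract and does not implement the EST or synteny filters. Assumptions: sequences are uppercase A/C/G/T strings, only exact length-k matches are found (a longer repeat shows up as overlapping k-mers), and the `genes` dictionary and all function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of seq to the positions where it starts."""
    idx: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        idx[seq[i:i + k]].append(i)
    return idx


def inverted_repeat_kmers(genome: str, k: int = 20) -> set[str]:
    """k-mers whose reverse complement also occurs at a different position, i.e.
    exact arms of inverted repeats that could fold into double-stranded RNA."""
    idx = kmer_index(genome, k)
    hits = set()
    for kmer, positions in idx.items():
        rc_positions = idx.get(revcomp(kmer), [])
        # Skip the trivial self-match of a reverse-complement palindrome at a single position.
        if any(p != q for p in positions for q in rc_positions):
            hits.add(kmer)
    return hits


def candidate_target_genes(genes: dict[str, str], hits: set[str], k: int = 20) -> list[str]:
    """Genes containing at least one exact k-mer from an inverted-repeat arm.

    `hits` is closed under reverse complement, so matches on either strand count."""
    return sorted(
        name for name, seq in genes.items()
        if any(seq[i:i + k] in hits for i in range(len(seq) - k + 1))
    )


if __name__ == "__main__":
    arm = "ACGTTGCAAGGCTTACCGAT"
    genome = "GGGG" + arm + "TTTTT" + revcomp(arm) + "CCCC"
    genes = {"geneA": "AAAA" + arm + "AAAA", "geneB": "C" * 30}
    print(candidate_target_genes(genes, inverted_repeat_kmers(genome)))
```

A suffix tree answers the same exact-match queries without fixing k in advance and in space linear in genome length, which matters at whole-genome scale; the hash index is used here only because it is short and easy to check.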

Subjects

Caenorhabditis/genetics; Chromosome Mapping/methods; Databases, Genetic; Gene Targeting/methods; RNA Interference; RNA, Small Interfering/genetics; Sequence Analysis, RNA/methods; Animals; Base Sequence; Molecular Sequence Data