Results 1 - 18 of 18
1.
PLoS Comput Biol ; 20(5): e1011787, 2024 May.
Article in English | MEDLINE | ID: mdl-38713726

ABSTRACT

Understanding and targeting functional RNA structures towards treatment of coronavirus infection can help us to prepare for novel variants of SARS-CoV-2 (the virus causing COVID-19), and any other coronaviruses that could emerge via human-to-human transmission or potential zoonotic (inter-species) events. Leveraging the fact that all coronaviruses use a mechanism known as -1 programmed ribosomal frameshifting (-1 PRF) to replicate, we apply algorithms to predict the most energetically favourable secondary structures (each nucleotide involved in at most one pairing) that may be involved in regulating the -1 PRF event in coronaviruses, especially SARS-CoV-2. We compute previously unknown most stable structure predictions for the frameshift site of coronaviruses via hierarchical folding, a biologically motivated framework where an initial non-crossing structure folds first, followed by subsequent, possibly crossing (pseudoknotted), structures. Using mutual information from 181 coronavirus sequences, in conjunction with the algorithm KnotAli, we compute secondary structure predictions for the frameshift site of different coronaviruses. We then utilize the Shapify algorithm to obtain most stable SARS-CoV-2 secondary structure predictions guided by frameshift sequence-specific and genome-wide experimental data. We build on our previous secondary structure investigation of the singular SARS-CoV-2 68 nt frameshift element sequence, by using Shapify to obtain predictions for 132 extended sequences and including covariation information. Previous investigations have not applied hierarchical folding to extended length SARS-CoV-2 frameshift sequences. By doing so, we simulate the effects of ribosome interaction with the frameshift site, providing insight into biological function.
We contribute in-depth discussion to contextualize secondary structure dual-graph motifs for SARS-CoV-2, highlighting the energetic stability of the previously identified 3_8 motif alongside the known dominant 3_3 and 3_6 (native-type) -1 PRF structures. Using a combination of thermodynamic methods and sequence covariation, our novel predictions suggest function of the attenuator hairpin via previously unknown pseudoknotted base pairing. While certain initial RNA folding is consistent, other pseudoknotted base pairs form which indicate potential conformational switching between the two structures.


Subject(s)
Algorithms , COVID-19 , Computational Biology , Frameshifting, Ribosomal , Nucleic Acid Conformation , RNA, Viral , SARS-CoV-2 , Frameshifting, Ribosomal/genetics , SARS-CoV-2/genetics , RNA, Viral/genetics , RNA, Viral/chemistry , Humans , COVID-19/virology , Computational Biology/methods , Coronavirus/genetics
2.
PLoS One ; 19(4): e0298164, 2024.
Article in English | MEDLINE | ID: mdl-38574063

ABSTRACT

SARS-CoV-2, the causative agent of COVID-19, is known to exhibit secondary structures in its 5' and 3' untranslated regions, along with the frameshifting stimulatory element situated between ORF1a and 1b. To identify additional regions containing conserved structures, we utilized a multiple sequence alignment with related coronaviruses as a starting point. We applied a computational pipeline developed for identifying non-coding RNA elements. Our pipeline employed three different RNA structural prediction approaches. We identified forty genomic regions likely to harbor structures, with ten of them showing three-way consensus substructure predictions among our predictive utilities. We conducted intracomparisons of the predictive utilities within the pipeline and intercomparisons with four previously published SARS-CoV-2 structural datasets. While there was limited agreement on the precise structure, different approaches seemed to converge on regions likely to contain structures in the viral genome. By comparing and combining various computational approaches, we can predict regions most likely to form structures, as well as a probable structure or ensemble of structures. These predictions can be used to guide surveillance, prophylactic measures, or therapeutic efforts. Data and scripts employed in this study may be found at https://doi.org/10.5281/zenodo.8298680.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Sequence Alignment , Genome, Viral/genetics , RNA, Viral/genetics , RNA, Viral/chemistry
3.
Algorithms Mol Biol ; 19(1): 9, 2024 Mar 03.
Article in English | MEDLINE | ID: mdl-38433200

ABSTRACT

MOTIVATION: Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity (O(n^3) time and O(n^2) space for pseudoknot-free structures) for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions, can be difficult and costly to implement, particularly when optimizing for time and space requirements. RESULTS: Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations for dangle contributions and provide pros and cons of each method. We also compare our algorithm to LinearFold, a linear time and space algorithm, and find that, in practice, SparseRNAFolD has lower memory consumption across all sequence lengths and a faster run time for lengths up to 1000 bases. CONCLUSION: Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees optimality of result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursion in a pseudoknot-free model that has the potential to be extended to pseudoknots.
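
Illustrative sketch: the O(n^3) time and O(n^2) space bounds quoted above are characteristic of interval dynamic programming over a sequence. The Python below is a minimal sketch of that recursion shape using Nussinov-style base-pair maximization. It is a stand-in chosen for brevity and is not SparseRNAFolD, SparseMFEFold, or any realistic free energy model; it ignores dangles and applies no sparsification. The names `can_pair`, `nussinov`, `fold` and the `min_loop` parameter are assumptions made for this example.

```python
# Toy stand-in for MFE folding: maximize the number of nested base pairs.
# Assumption: canonical and G-U wobble pairs only, hairpin loops >= min_loop.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in CANONICAL


def nussinov(seq, min_loop=3):
    """Fill dp[i][j] = max number of nested pairs in seq[i..j].

    The table has n^2 cells and each cell scans up to n split points,
    which gives the O(n^3) time and O(n^2) space shape.
    """
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp


def fold(seq, min_loop=3):
    """Return one optimal pseudoknot-free structure in dot-bracket notation."""
    n = len(seq)
    if n == 0:
        return ""
    dp = nussinov(seq, min_loop)
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


print(fold("GGGAAAUCC"))  # (((...)))
```

Sparsification, as described in the abstract, reduces how many split points `k` must be examined per cell; the toy above scans all of them.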

4.
Article in English | MEDLINE | ID: mdl-38345958

ABSTRACT

Interaction of nucleic acid molecules is essential for their functional roles in the cell and their applications in biotechnology. While simple duplex interactions have been studied before, the problem of efficiently predicting the minimum free energy structure of more complex interactions with possibly pseudoknotted structures remains a challenge. In this work, we introduce DinoKnot, a novel and efficient algorithm for prediction of Duplex Interaction of Nucleic acids with pseudoKnots. DinoKnot follows the hierarchical folding hypothesis to predict the secondary structure of two interacting nucleic acid strands (both homo- and hetero-dimers). DinoKnot utilizes the structure of molecules before interaction as a guide to find their duplex structure, allowing for possible base pair competitions. To showcase DinoKnot's capabilities, we evaluated its predicted structures against (1) experimental results for the SARS-CoV-2 genome and nine primer-probe sets, (2) a clinically verified example of a mutation affecting detection, and (3) a known nucleic acid interaction involving a pseudoknot. In addition, we compared our results against our closest competition, RNAcofold, further highlighting DinoKnot's strengths. We believe DinoKnot can be utilized for various applications, including screening new variants for potential detection issues and supporting existing applications involving DNA/RNA interactions, adding structural considerations to the interaction to elicit functional information.


Subject(s)
Algorithms , Computational Biology , Nucleic Acid Conformation , SARS-CoV-2 , SARS-CoV-2/genetics , SARS-CoV-2/chemistry , Computational Biology/methods , COVID-19/virology , RNA, Viral/genetics , RNA, Viral/chemistry , RNA, Viral/metabolism , Genome, Viral/genetics , Betacoronavirus/genetics , Betacoronavirus/chemistry
5.
PLoS Comput Biol ; 19(2): e1010922, 2023 02.
Article in English | MEDLINE | ID: mdl-36854032

ABSTRACT

Multiple coronaviruses including MERS-CoV causing Middle East Respiratory Syndrome, SARS-CoV causing SARS, and SARS-CoV-2 causing COVID-19, use a mechanism known as -1 programmed ribosomal frameshifting (-1 PRF) to replicate. SARS-CoV-2 possesses a unique RNA pseudoknotted structure that stimulates -1 PRF. Targeting -1 PRF in SARS-CoV-2 to impair viral replication can improve patients' prognoses. Crucial to developing these therapies is understanding the structure of the SARS-CoV-2 -1 PRF pseudoknot. Our goal is to expand knowledge of -1 PRF structural conformations. Following a structural alignment approach, we identify similarities in -1 PRF pseudoknots of SARS-CoV-2, SARS-CoV, and MERS-CoV. We provide in-depth analysis of the SARS-CoV-2 and MERS-CoV -1 PRF pseudoknots, including reference and noteworthy mutated sequences. To better understand the impact of mutations, we provide insight on -1 PRF pseudoknot sequence mutations and their effect on resulting structures. We introduce Shapify, a novel algorithm that given an RNA sequence incorporates structural reactivity (SHAPE) data and partial structure information to output an RNA secondary structure prediction within a biologically sound hierarchical folding approach. Shapify enhances our understanding of SARS-CoV-2 -1 PRF pseudoknot conformations by providing energetically favourable predictions that are relevant to structure-function and may correlate with -1 PRF efficiency. Applied to the SARS-CoV-2 -1 PRF pseudoknot, Shapify unveils previously unknown paths from initial stems to pseudoknotted structures. By contextualizing our work with available experimental data, our structure predictions motivate future RNA structure-function research and can aid 3-D modeling of pseudoknots.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , RNA, Viral/genetics , Molecular Conformation , Nucleic Acid Conformation
6.
Curr Protoc ; 3(2): e661, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36779804

ABSTRACT

RNA molecules play active roles in the cell and are important for numerous applications in biotechnology and medicine. The function of an RNA molecule stems from its structure. RNA structure determination is time consuming, challenging, and expensive using experimental methods. Thus, much research has been directed at RNA structure prediction through computational means. Many of these methods focus primarily on the secondary structure of the molecule, ignoring the possibility of pseudoknotted structures. However, pseudoknots are known to play functional roles in many RNA molecules or in their method of interaction with other molecules. Improving the accuracy and efficiency of computational methods that predict pseudoknots is an ongoing challenge for single RNA molecules, RNA-RNA interactions, and RNA-protein interactions. To improve the accuracy of prediction, many methods focus on specific applications while restricting the length and the class of the pseudoknotted structures they can identify. In recent years, computational methods for structure prediction have begun to catch up with the impressive developments seen in biotechnology. Here, we provide a non-comprehensive overview of available pseudoknot prediction methods and their best-use cases. © 2023 Wiley Periodicals LLC.


Subject(s)
Algorithms , RNA , Biotechnology , Nucleic Acid Conformation , RNA/genetics , RNA/chemistry , Sequence Analysis, RNA/methods
7.
BMC Bioinformatics ; 23(1): 159, 2022 May 03.
Article in English | MEDLINE | ID: mdl-35505276

ABSTRACT

BACKGROUND: Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs) is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. RESULTS: We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS: We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and Cacofold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
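
Illustrative sketch: covariation between two alignment columns is commonly quantified with mutual information, where compensatory substitutions (for example G-C in one sequence and A-U in another) produce a high score. The Python below is a minimal sketch of that textbook measure. It is not KnotAli's actual scoring and omits the background noise correction the abstract mentions; the function name `mutual_information`, the gap character, and the four-sequence toy alignment are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment: list of equal-length aligned sequences (assumed input format).
    Rows with a gap in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_i = Counter(a for a, _ in pairs)
    freq_j = Counter(b for _, b in pairs)
    freq_ij = Counter(pairs)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 2 covary perfectly,
# column 1 is fully conserved and therefore carries no covariation signal.
toy = ["GAC", "CAG", "AAU", "UAA"]
print(mutual_information(toy, 0, 2))  # 2.0
print(mutual_information(toy, 0, 1))  # 0.0
```

A fully conserved column scores zero even if it is base paired, which is one reason alignment-based methods are usually combined with thermodynamic information.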


Subject(s)
Algorithms , Software , Humans , Nucleic Acid Conformation , RNA/chemistry , Sequence Analysis, RNA/methods
8.
BMC Bioinformatics ; 23(1): 118, 2022 Apr 02.
Article in English | MEDLINE | ID: mdl-35366794

ABSTRACT

MOTIVATION: Deep learning has become a prevalent method in identifying genomic regulatory sequences such as promoters. In a number of recent papers, the performance of deep learning models has continually been reported as an improvement over alternatives for sequence-based promoter recognition. However, the performance improvements in these models do not account for the different datasets that models are evaluated on. The lack of a consensus dataset and procedure for benchmarking purposes has made the comparison of each model's true performance difficult to assess. RESULTS: We present a framework called Supervised Promoter Recognition Framework ('SUPR REF') capable of streamlining the complete process of training, validating, testing, and comparing promoter recognition models in a systematic manner. SUPR REF includes the creation of biologically relevant benchmark datasets to be used in the evaluation process of deep learning promoter recognition models. We showcase this framework by comparing the models' performances on alternative datasets, and properly evaluate previously published models on new benchmark datasets. Our results show that the reliability of deep learning ab initio promoter recognition models on eukaryotic genomic sequences is still not at a sufficient level, as overall performance is still low. These results originate from a subset of promoters, the well-known RNA Polymerase II core promoters. Furthermore, given the observational nature of these data, cross-validation results from small promoter datasets need to be interpreted with caution.


Subject(s)
Benchmarking , Genomics , Eukaryotic Cells , Promoter Regions, Genetic , Reproducibility of Results
9.
Nat Comput Sci ; 2(6): 356-357, 2022 Jun.
Article in English | MEDLINE | ID: mdl-38177575
10.
J Clin Med ; 10(18)2021 Sep 15.
Article in English | MEDLINE | ID: mdl-34575270

ABSTRACT

Despite a major interest in understanding how the endothelial cell phenotype is established, the underlying molecular basis of this process is not yet fully understood. We have previously reported the generation of induced pluripotent stem cells (iPS) from human umbilical vein endothelial cells and differentiation of the resulting HiPS back to endothelial cells (Ec-Diff), as well as neural (Nn-Diff) cell lineage that contained both neurons and astrocytes. Furthermore, the identities of these cell lineages were established by gene array analysis. Here, we explored the same arrays to gain insight into the gene alteration processes that accompany the establishment of endothelial vs. non-endothelial neural cell phenotypes. We compared the expression of genes that code for transcription factors and epigenetic regulators when HiPS is differentiated into these endothelial and non-endothelial lineages. Our in silico analyses have identified cohorts of genes that are similarly up- or downregulated in both lineages, as well as those that exhibit lineage-specific alterations. Based on these results, we propose that genes that are similarly altered in both lineages participate in priming the stem cell for differentiation in a lineage-independent manner, whereas those that are differentially altered in endothelial compared to neural cells participate in a lineage-specific differentiation process. Specific GATA family members and their cofactors and epigenetic regulators (DNMT3B, PRDM14, HELLS) with a major role in regulating DNA methylation were among participants in priming HiPS for lineage-independent differentiation. In addition, we identified distinct cohorts of transcription factors and epigenetic regulators whose alterations correlated specifically with the establishment of endothelial vs. non-endothelial neural lineages.

11.
Stem Cells ; 37(4): 542-554, 2019 04.
Article in English | MEDLINE | ID: mdl-30682218

ABSTRACT

Endothelial cells play a central role in physiological function and pathophysiology of blood vessels in health and disease. However, the molecular mechanism that establishes the endothelial phenotype, and contributes to its signature cell type-specific gene expression, is not yet understood. We studied the regulation of a highly endothelial-specific gene, von Willebrand factor (VWF), in induced pluripotent stem cells generated from primary endothelial cells (human umbilical vein endothelial cells [HUVEC] into a pluripotent state [HiPS]) and subsequently differentiated back into endothelial cells. This allowed us to explore how VWF expression is regulated when the endothelial phenotype is revoked (endothelial cells to HiPS), and re-established (HiPS back to endothelial cells [EC-Diff]). HiPS were generated from HUVECs, their pluripotency established, and then differentiated back to endothelial cells. We established phenotypic characteristics and robust angiogenic function of EC-Diff. Gene array analyses, VWF chromatin modifications, and transacting factors binding assays were performed on the three cell types (HUVEC, HiPS, and EC-Diff). The results demonstrated that generally cohorts of transacting factors that function as transcriptional activators, and those that contribute to histone acetylation and DNA demethylation, were significantly decreased in HiPS compared with HUVECs and EC-Diff. In contrast, there were significant increases in the gene expression levels of epigenetic modifiers that function as methyl transferases in HiPS compared with endothelial cells. The results demonstrated that alterations in chromatin modifications of the VWF gene, in addition to expression and binding of transacting factors that specifically function as activators, are responsible for establishing endothelial specific regulation of the VWF gene. Stem Cells 2019;37:542-554.


Subject(s)
Endothelial Cells/metabolism , Gene Expression/genetics , Induced Pluripotent Stem Cells/metabolism , von Willebrand Factor/genetics , Cell Differentiation , Humans
12.
Bioinformatics ; 34(22): 3849-3856, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29868872

ABSTRACT

Motivation: The computational prediction of RNA secondary structure by free energy minimization has become an important tool in RNA research. However, in practice, energy minimization is mostly limited to pseudoknot-free structures or rather simple pseudoknots, not covering many biologically important structures such as kissing hairpins. Algorithms capable of predicting sufficiently complex pseudoknots (for sequences of length n) used to have extreme complexities, e.g. Pknots has O(n^6) time and O(n^4) space complexity. The algorithm CCJ dramatically improves the asymptotic run time for predicting complex pseudoknots (handling almost all relevant pseudoknots, while being slightly less general than Pknots), but this came at the cost of large constant factors in space and time, which strongly limited its practical application (∼200 bases already require 256 GB space). Results: We present a CCJ-type algorithm, Knotty, that handles the same comprehensive pseudoknot class of structures as CCJ with improved space complexity of Θ(n^3 + Z). Due to the applied technique of sparsification, the number of 'candidates', Z, appears to grow significantly slower than n^4 on our benchmark set (which includes pseudoknotted RNAs up to 400 nt). In terms of run time over this benchmark, Knotty clearly outperforms Pknots and the original CCJ implementation, CCJ 1.0; Knotty's space consumption fundamentally improves over CCJ 1.0, being on a par with the space-economic Pknots. By comparing to CCJ 2.0, our unsparsified Knotty variant, we demonstrate the isolated effect of sparsification. Moreover, Knotty employs the state-of-the-art energy model of 'HotKnots DP09', which results in superior prediction accuracy over Pknots. Availability and implementation: Our software is available at https://github.com/HosnaJabbari/Knotty. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA/chemistry , Software , Algorithms , Nucleic Acid Conformation , Sequence Analysis, RNA
13.
PLoS One ; 13(4): e0194583, 2018.
Article in English | MEDLINE | ID: mdl-29621250

ABSTRACT

MOTIVATION: RNA is a biopolymer with various applications inside the cell and in biotechnology. The structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is simpler than prediction of its tertiary structure and provides information about the tertiary structure; therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off between the computational cost of the prediction algorithm and the generality of the method. RESULTS: The aim of this work is to argue that, when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered, and a method should not be considered superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in the prediction accuracy of a given method, as the underlying energy model is also of great value. Therefore, we encourage researchers to pay particular attention when comparing methods with different energy models.
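
Illustrative sketch: the distinction drawn above between pseudoknot-free structures (no crossing base pairs) and pseudoknotted structures (crossing base pairs) can be checked mechanically. The Python below is a minimal sketch that parses extended dot-bracket notation and reports crossing pairs, where two pairs (i, j) and (k, l) cross when i < k < j < l. The function names and the choice of bracket alphabet are assumptions made for this example and are not taken from any tool named in the abstract.

```python
def parse_pairs(structure):
    """Parse extended dot-bracket notation into a sorted list of (i, j) pairs.

    Assumption: '()', '[]', '{}' and '<>' are the supported bracket types,
    and each bracket type is balanced on its own.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return all pairs of base pairs ((i, j), (k, l)) with i < k < j < l."""
    pairs = sorted(pairs)
    found = []
    for a in range(len(pairs)):
        i, j = pairs[a]
        for b in range(a + 1, len(pairs)):
            k, l = pairs[b]
            if k < j < l:  # i < k already holds because pairs are sorted
                found.append(((i, j), (k, l)))
    return found


def is_pseudoknotted(structure):
    return bool(crossing_pairs(parse_pairs(structure)))


print(is_pseudoknotted("((..))"))          # False
print(is_pseudoknotted("((..[[..))..]]"))  # True
```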


Subject(s)
Models, Molecular , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Databases, Genetic , Mutation , RNA/genetics , RNA, Bacterial , RNA, Ribosomal, 5S/chemistry , Reproducibility of Results , Software
14.
Algorithms Mol Biol ; 11: 7, 2016.
Article in English | MEDLINE | ID: mdl-27110275

ABSTRACT

BACKGROUND: RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. RESULTS: Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible in trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80 % space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases). 
CONCLUSIONS: The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even more strongly than "standard" MFE folding. SparseMFEFold is free software, available at http://www.bioinf.uni-leipzig.de/~will/Software/SparseMFEFold.

15.
ACS Comb Sci ; 17(10): 535-47, 2015 Oct 12.
Article in English | MEDLINE | ID: mdl-26348196

ABSTRACT

Recent advances in experimental DNA origami have dramatically expanded the horizon of DNA nanotechnology. Complex 3D suprastructures have been designed and developed using DNA origami with applications in biomaterial science, nanomedicine, nanorobotics, and molecular computation. Ribonucleic acid (RNA) origami has recently been realized as a new approach. Similar to DNA, RNA molecules can be designed to form complex 3D structures through complementary base pairings. RNA origami structures are, however, more compact and more thermodynamically stable due to RNA's non-canonical base pairing and tertiary interactions. Despite these advantages, the development of RNA origami lags behind DNA origami by a large gap. Furthermore, although computational methods have proven to be effective in designing DNA and RNA origami structures and in their evaluation, advances in computational nucleic acid origami are even more limited. In this paper, we review major milestones in experimental and computational DNA and RNA origami and present current challenges in these fields. We believe collaboration between experimental nanotechnologists and computer scientists is critical for advancing these new research paradigms.


Subject(s)
Computational Biology , Nanotechnology/methods , Nucleic Acids/chemistry , Base Pairing , DNA/chemistry , Nanostructures , Nucleic Acids/chemical synthesis , RNA/chemistry
16.
BMC Bioinformatics ; 15: 147, 2014 May 18.
Article in English | MEDLINE | ID: mdl-24884954

ABSTRACT

BACKGROUND: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. RESULTS: We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. CONCLUSIONS: Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster.
Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Nucleic Acid Conformation , RNA/chemistry , Sequence Analysis, RNA/methods , Base Sequence , Humans , RNA/genetics , Software , Time Factors
17.
J Comput Biol ; 16(6): 803-15, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19522664

ABSTRACT

Efficient methods for prediction of minimum free energy (MFE) nucleic acid secondary structures are widely used, both to better understand structure and function of biological RNAs and to design novel nano-structures. Here, we present a new algorithm for MFE secondary structure prediction, which significantly expands the class of structures that can be handled in O(n^5) time. Our algorithm can handle H-type pseudoknotted structures, kissing hairpins, and chains of four overlapping stems, as well as nested substructures of these types.


Subject(s)
Algorithms , Computational Biology/methods , Nucleic Acid Conformation , Nucleic Acids/chemistry , Aptamers, Nucleotide/chemistry , Thermodynamics
18.
J Comput Biol ; 15(2): 139-63, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18312147

ABSTRACT

Algorithms for prediction of RNA secondary structure (the set of base pairs that form when an RNA molecule folds) are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Θ(n^5) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our HFold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n^3) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required Θ(n^6) time.
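
Illustrative sketch: the hierarchical folding hypothesis described above can be mimicked with a two-phase toy. A given pseudoknot-free structure is held fixed, and a second folding pass runs only over the bases it leaves unpaired, so the added pairs are free to cross the first-phase pairs. The Python below is a heavily simplified sketch, not HFold: it maximizes the number of added base pairs rather than minimizing free energy, and it does not enforce HFold's structure class. The names `second_phase`, `hierarchical_fold`, the `min_loop` parameter, and the example sequence are all assumptions made for this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def second_phase(seq, pos, min_loop=3):
    """Max set of mutually nested pairs among the unpaired positions `pos`."""
    m = len(pos)
    if m == 0:
        return []

    def ok(a, b):
        return pos[b] - pos[a] > min_loop and (seq[pos[a]], seq[pos[b]]) in PAIRS

    def score(dp, a, k, b):
        left = dp[a][k - 1] if k > a else 0
        inner = dp[k + 1][b - 1] if k + 1 <= b - 1 else 0
        return left + 1 + inner

    dp = [[0] * m for _ in range(m)]
    for span in range(1, m):
        for a in range(m - span):
            b = a + span
            best = dp[a][b - 1]
            for k in range(a, b):
                if ok(k, b):
                    best = max(best, score(dp, a, k, b))
            dp[a][b] = best

    pairs, stack = [], [(0, m - 1)]
    while stack:  # iterative traceback
        a, b = stack.pop()
        if a >= b:
            continue
        if dp[a][b] == dp[a][b - 1]:
            stack.append((a, b - 1))
            continue
        for k in range(a, b):
            if ok(k, b) and dp[a][b] == score(dp, a, k, b):
                pairs.append((pos[k], pos[b]))
                stack.append((a, k - 1))
                stack.append((k + 1, b - 1))
                break
    return sorted(pairs)


def hierarchical_fold(seq, given):
    """Keep the pairs in `given` ('(', ')', '.'), then add pairs drawn as '[' ']'."""
    unpaired = [i for i, ch in enumerate(given) if ch == "."]
    result = list(given)
    for i, j in second_phase(seq, unpaired):
        result[i], result[j] = "[", "]"
    return "".join(result)


# Hypothetical 13-nt example: a G-C stem is given, and an A-U stem forms
# in the second phase between the hairpin loop and the 3' tail.
print(hierarchical_fold("GGGCAACCCCCUU", "(((....)))..."))  # (((.[[.))).]]
```

In the printed result the square-bracket pairs cross the round-bracket pairs, which is the H-type pseudoknot pattern; the first-phase pairs are never altered, matching the hypothesis that pseudoknots form relative to an already folded pseudoknot-free structure.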


Subject(s)
Algorithms , Nucleic Acid Conformation , RNA/chemistry , Sequence Analysis, RNA , Base Pairing , Base Sequence , Computational Biology , Mathematics , Molecular Sequence Data , Software