1.
RNA ; 30(1): 1-15, 2023 Dec 18.
Article in English | MEDLINE | ID: mdl-37903545

ABSTRACT

We present a novel framework enhancing the prediction of whether a novel lineage poses the threat of eventually dominating the viral population. The framework is based purely on genomic sequence data, without requiring prior established biological analysis. Its building blocks are sets of coevolving sites in the alignment (motifs), identified via coevolutionary signals. The collection of such motifs forms a relational structure over the polymorphic sites. Motifs are constructed using distances quantifying the coevolutionary coupling of pairs and manifest as coevolving clusters of sites. We present an approach to genomic surveillance based on this notion of relational structure. Our system will issue an alert regarding a lineage, based on its contribution to drastic changes in the relational structure. We then conduct a comprehensive retrospective analysis of the COVID-19 pandemic based on SARS-CoV-2 genomic sequence data in GISAID from October 2020 to September 2022, across 21 lineages and 27 countries with weekly resolution. We investigate the performance of this surveillance system in terms of its accuracy, timeliness, and robustness. Lastly, we study how well each lineage is classified by such a system.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Pandemics , Retrospective Studies , Genomics
2.
Chaos ; 33(2): 023118, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36859214

ABSTRACT

Mathematical models rooted in network representations are becoming increasingly common for capturing a broad range of phenomena. Boolean networks (BNs) represent a mathematical abstraction suited for establishing general theory applicable to such systems. A key thread in BN research is developing theory that connects the structure of the network and the local rules to phase space properties, or so-called structure-to-function theory. While most theory for BNs has been developed for the synchronous case, the focus of this work is on asynchronously updated BNs (ABNs), which are natural to consider from the point of view of applications to real systems where perfect synchrony is uncommon. A central question in this regard is the sensitivity of the dynamics of ABNs with respect to perturbations to the asynchronous update scheme. Macauley & Mortveit [Nonlinearity 22, 421-436 (2009)] showed that the periodic orbits are structurally invariant under toric equivalence of the update sequences. In this paper and under the same equivalence of the update scheme, the authors (i) extend that result to the entire phase space, (ii) establish a Lipschitz continuity result for sequences of maximal transient paths, and (iii) establish that within a toric equivalence class the maximal transient length may take on at most two distinct values. In addition, the proofs offer insight into the general asynchronous phase space of Boolean networks.
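
To make the idea of an asynchronous update scheme concrete, here is a minimal sketch; it is not the authors' code. The 4-cycle graph, the NOR rule, and all function names are hypothetical choices made for illustration. Vertices are updated one at a time in the order given by an update sequence, and the periodic points of the resulting map are found by brute force. A cyclic shift of the update sequence (which, as we understand the cited work, is one of the moves generating toric equivalence) leaves the number of periodic points unchanged.

```python
from itertools import product

# Hypothetical example network: a 4-cycle with a NOR rule at every vertex.
NEIGHBORS = {0: (3, 1), 1: (0, 2), 2: (1, 3), 3: (2, 0)}


def nor_rule(state, v):
    """New value of vertex v: 1 iff v and all of its neighbors are 0."""
    return int(state[v] == 0 and all(state[u] == 0 for u in NEIGHBORS[v]))


def async_map(state, order):
    """Update one vertex at a time, in the order given by `order`."""
    s = list(state)
    for v in order:
        s[v] = nor_rule(s, v)  # later vertices see the earlier updates
    return tuple(s)


def periodic_points(order):
    """All states lying on a cycle of the map induced by `order`."""
    states = list(product((0, 1), repeat=len(NEIGHBORS)))
    periodic = set()
    for x in states:
        y = x
        # After as many steps as there are states, y must lie on a cycle.
        for _ in range(len(states)):
            y = async_map(y, order)
        # Walk the cycle through y and record every state on it.
        z = y
        while z not in periodic:
            periodic.add(z)
            z = async_map(z, order)
    return periodic
```

For instance, `len(periodic_points((0, 1, 2, 3)))` and `len(periodic_points((1, 2, 3, 0)))` agree, because the two maps are of the form A∘B and B∘A and B carries periodic points of one bijectively onto those of the other.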

3.
Bull Math Biol ; 85(3): 21, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36780044

ABSTRACT

The study of native motifs of RNA secondary structures helps us better understand the formation and eventually the functions of these molecules. Commonly known structural motifs include helices, hairpin loops, bulges, interior loops, exterior loops and multiloops. However, enumerative results and generating algorithms taking into account the joint distribution of these motifs are sparse. In this paper, we present progress on deriving such distributions employing a tree-bijection of RNA secondary structures obtained by Schmitt and Waterman and a novel rake decomposition of plane trees. The key feature of the latter is that the derived components encode motifs of the RNA secondary structures without pseudoknots associated with the plane trees very well. As an application, we present an algorithm (RakeSamp) generating uniformly random secondary structures without pseudoknots that satisfy fine motif specifications on the length and degree of various types of loops as well as helices.


Subject(s)
Mathematical Concepts , RNA , RNA/chemistry , Nucleic Acid Conformation , Models, Biological , Algorithms
4.
Algorithms Mol Biol ; 16(1): 7, 2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34074304

ABSTRACT

BACKGROUND: Genotype-phenotype maps provide a meaningful filtration of sequence space and RNA secondary structures are particular such phenotypes. Compatible sequences, which satisfy the base-pairing constraints of a given RNA structure, play an important role in the context of neutral evolution. Sequences that are simultaneously compatible with two given structures (bicompatible sequences) are beacons in phenotypic transitions, induced by erroneously replicating populations of RNA sequences. RNA riboswitches, which are capable of expressing two distinct secondary structures without changing the underlying sequence, are one example of bicompatible sequences in living organisms. RESULTS: We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The sequence sampler employs a dynamic programming routine whose time complexity is polynomial when assuming the maximum number of exposed vertices, [Formula: see text], is a constant. The parameter [Formula: see text] depends on the two structures and can be very large. We introduce a novel topological framework encapsulating the relations between loops that sheds light on the understanding of [Formula: see text]. Based on this framework, we give an algorithm to sample sequences with minimum [Formula: see text] on a particular topologically classified case and give hints to the solution in the other cases. We then utilize our sequence sampler to study some established riboswitches. CONCLUSION: Our analysis of riboswitch sequences shows that a pair of structures needs to satisfy key properties in order to facilitate phenotypic transitions and that pairs of random structures are unlikely to do so. Our analysis observes a distinct signature of riboswitch sequences, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure. Our free software is available at: https://github.com/FenixHuang667/Bifold.
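
The notion of a bicompatible sequence can be illustrated with a short check. This is a sketch under stated assumptions and is unrelated to the Bifold code base: structures are assumed to be pseudoknot-free dot-bracket strings, the allowed pairs are assumed to be Watson-Crick plus G-U wobble, and the function names are hypothetical.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"),
           ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return the (i, j) pairs of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(seq, structure):
    """True if every base pair of the structure is an allowed pair in seq."""
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in ALLOWED for i, j in base_pairs(structure))


def is_bicompatible(seq, s1, s2):
    """True if seq satisfies the pairing constraints of both structures."""
    return is_compatible(seq, s1) and is_compatible(seq, s2)
```

With these assumptions, `is_bicompatible("GGGAAACCC", "(((...)))", "((.....))")` holds, while replacing the second structure by `"...(...)."` fails because it would pair A with C.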

5.
J Comput Biol ; 28(3): 248-256, 2021 03.
Article in English | MEDLINE | ID: mdl-33275493

ABSTRACT

COVID-19 is an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The viral genome is considered to be relatively stable and the mutations that have been observed and reported thus far are mainly focused on the coding region. This article provides evidence that macrolevel pandemic dynamics, such as social distancing, modulate the genomic evolution of SARS-CoV-2. This view complements the prevalent paradigm that microlevel observables control macrolevel parameters such as death rates and infection patterns. First, we observe differences in mutational signals for geospatially separated populations such as the prevalence of A23404G in CA versus NY and WA. We show that the feedback between macrolevel dynamics and the viral population can be captured employing a transfer entropy framework. Second, we observe complex interactions within mutational clades. Namely, when C14408T first appeared in the viral population, the frequency of A23404G spiked in the subsequent week. Third, we identify a noncoding mutation, G29540A, within the segment between the coding gene of the N protein and the ORF10 gene, which is largely confined to NY (>95%). These observations indicate that macrolevel sociobehavioral measures have an impact on the viral genomics and may be useful for the dashboard-like tracking of its evolution. Finally, despite the fact that SARS-CoV-2 is a genetically robust organism, our findings suggest that we are dealing with a high degree of adaptability. Owing to its ample spread, mutations of unusual form are observed and a high complexity of mutational interaction is exhibited.


Subject(s)
COVID-19/virology , Evolution, Molecular , Genome, Viral , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/transmission , Computational Biology , Gene Frequency , Health Behavior , Health Policy , Humans , Models, Genetic , Mutation , Pandemics , Phylogeny , Physical Distancing , SARS-CoV-2/pathogenicity , SARS-CoV-2/physiology , Spike Glycoprotein, Coronavirus/genetics
6.
Algorithms Mol Biol ; 15: 15, 2020.
Article in English | MEDLINE | ID: mdl-32782456

ABSTRACT

Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem, by considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree, a hierarchical bi-partition of the input ensemble, that is constructed by recursively querying whether or not a base pair of maximum information entropy is contained in the target. These queries are answered by relating local to global probing data, employing the modularity in RNA secondary structures. We show that the leaves of the tree are comprised of sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability of our framework correctly identifying the target in the leaf is greater than 90%.
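
One splitting step of such an ensemble tree might look as follows. This is only a sketch: it assumes the ensemble is given as a list of pseudoknot-free dot-bracket strings, estimates base pair probabilities by empirical frequencies in that sample, and uses the binary entropy of each frequency. How a query is actually answered from probing data is not modeled, and all names are hypothetical.

```python
from collections import Counter
from math import log2


def base_pairs(structure):
    """Set of (i, j) pairs of a balanced, pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def binary_entropy(p):
    """Entropy in bits of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def max_entropy_pair(sample):
    """Base pair whose presence is most uncertain in the sample."""
    counts = Counter(bp for s in sample for bp in base_pairs(s))
    n = len(sample)
    return max(counts, key=lambda bp: binary_entropy(counts[bp] / n))


def split(sample, bp):
    """Bipartition the sample by whether it contains the base pair bp."""
    with_bp = [s for s in sample if bp in base_pairs(s)]
    without_bp = [s for s in sample if bp not in base_pairs(s)]
    return with_bp, without_bp
```

Applying `split` recursively to each half, with the half chosen by the answer to the query, yields a hierarchical bi-partition of the sample. `max_entropy_pair` raises `ValueError` if no structure in the sample contains a base pair.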

7.
RNA ; 25(12): 1592-1603, 2019 12.
Article in English | MEDLINE | ID: mdl-31548338

ABSTRACT

Genetic robustness, the preservation of evolved phenotypes against genotypic mutations, is one of the central concepts in evolution. In recent years a large body of work has focused on the origins, mechanisms, and consequences of robustness in a wide range of biological systems. In particular, research on ncRNAs studied the ability of sequences to maintain folded structures against single-point mutations. In these studies, the structure is merely a reference. However, recent work revealed evidence that structure itself contributes to the genetic robustness of ncRNAs. We follow this line of thought and consider sequence-structure pairs as the unit of evolution and introduce the spectrum of extended mutational robustness (EMR spectrum) as a measurement of genetic robustness. Our analysis of the miRNA let-7 family captures key features of structure-modulated evolution and facilitates the study of robustness against multiple-point mutations.


Subject(s)
MicroRNAs/genetics , Mutation/genetics , Animals , Evolution, Molecular , Genotype , Humans , Models, Genetic , Nucleic Acid Conformation , Phenotype
8.
J Math Biol ; 79(3): 791-822, 2019 08.
Article in English | MEDLINE | ID: mdl-31172257

ABSTRACT

In this paper we analyze the length-spectrum of blocks in [Formula: see text]-structures. [Formula: see text]-structures are a class of RNA pseudoknot structures that play a key role in the context of polynomial time RNA folding. A [Formula: see text]-structure is constructed by nesting and concatenating specific building components having topological genus at most [Formula: see text]. A block is a substructure enclosed by crossing maximal arcs with respect to the partial order induced by nesting. We show that, in uniformly generated [Formula: see text]-structures, there is a significant gap in this length-spectrum, i.e., there asymptotically almost surely exists a unique longest block of length at least [Formula: see text] and that with high probability any other block has finite length. For fixed [Formula: see text], we prove that the length of the complement of the longest block converges to a discrete limit law, and that the distribution of short blocks of given length tends to a negative binomial distribution in the limit of long sequences. We refine this analysis to the length spectrum of blocks of specific pseudoknot types, such as H-type and kissing hairpins. Our results generalize the rainbow spectrum on secondary structures by the first and third authors and are being put into context with the structural prediction of long non-coding RNAs.


Subject(s)
Algorithms , RNA Folding , RNA/chemistry , Humans , Models, Molecular
9.
J Comput Biol ; 26(3): 173-192, 2019 03.
Article in English | MEDLINE | ID: mdl-30653353

ABSTRACT

Recently, a framework considering RNA sequences and their RNA secondary structures as pairs led to some information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. This pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered originally for designing more efficient inverse folding algorithms and subsequently enhanced by facilitating the sampling of sequences. We present here a partition function of sequence/structure pairs, endowed with Hamming distance and base pair distance filtration. This partition function is an augmentation of the previously mentioned (dual) partition function. We develop an efficient dynamic programming routine to recursively compute the partition function with this double filtration. Our framework is capable of dealing with RNA secondary structures as well as 1-structures, where a 1-structure is an RNA pseudoknot structure consisting of "building blocks" of genus 0 or 1. In particular, 0-structures, consisting of only "building blocks" of genus 0, are exactly RNA secondary structures. The time complexity for calculating the partition function of 1-pairs, that is, sequence/structure pairs where the structures are 1-structures, is O(h^3 b^3 n^6), where h, b, n denote the Hamming distance, base pair distance, and sequence length, respectively. The time complexity for the partition function of 0-pairs is O(h^2 b^2 n^3).


Subject(s)
Algorithms , RNA Folding , RNA/chemistry , Sequence Analysis, RNA/methods , Molecular Dynamics Simulation , Nucleotide Motifs
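
The two distances used for the double filtration in the abstract above can be sketched as follows. The abstract does not define them, so this assumes the common conventions: Hamming distance counts mismatched positions of two equal-length sequences, and base pair distance is the size of the symmetric difference of the two base pair sets. Structures are assumed to be pseudoknot-free dot-bracket strings, and the function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def base_pairs(structure):
    """Set of (i, j) pairs of a balanced, pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def base_pair_distance(s1, s2):
    """Number of base pairs contained in exactly one of the two structures."""
    return len(base_pairs(s1) ^ base_pairs(s2))
```

A sequence/structure pair at Hamming distance h and base pair distance b from a reference pair is then one for which `hamming_distance` returns h on the sequences and `base_pair_distance` returns b on the structures.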
10.
J Comput Biol ; 25(11): 1179-1192, 2018 11.
Article in English | MEDLINE | ID: mdl-30133328

ABSTRACT

Recently, a framework considering ribonucleic acid (RNA) sequences and their RNA secondary structures as pairs has led to new information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. In this context, the pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered for designing more efficient inverse folding algorithms. In this work, we present the dual partition function filtered by Hamming distance, together with a Boltzmann sampler using novel dynamic programming routines for the loop-based energy model. The time complexity of the algorithm is [Formula: see text], where [Formula: see text] are Hamming distance and sequence length, respectively, reducing the time complexity of samplers reported in the literature by [Formula: see text]. We then present two applications, the first in the context of the evolution of natural sequence-structure pairs of microRNAs and the second in constructing neutral paths. The former studies the inverse folding rate (IFR) of sequence-structure pairs, filtered by Hamming distance, observing that such pairs evolve toward higher levels of robustness, that is, increasing IFR. The latter is an algorithm that constructs neutral paths: given two sequences in a neutral network, we employ the sampler to construct short paths connecting them, consisting of sequences all contained in the neutral network.


Subject(s)
Algorithms , Computational Biology/methods , RNA/chemistry , Base Sequence , Humans , Models, Molecular , Nucleic Acid Conformation
11.
Bull Math Biol ; 80(6): 1514-1538, 2018 06.
Article in English | MEDLINE | ID: mdl-29541998

ABSTRACT

In this paper, we analyze the length spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length spectrum. We prove that there asymptotically almost surely exists a unique longest rainbow of length at least [Formula: see text] and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite k, the distribution of rainbows of length k becomes, for large n, a negative binomial distribution. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures and biological RNA structures, and relate our findings to the sparsification of folding algorithms.


Subject(s)
Models, Molecular , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Base Sequence , Binomial Distribution , Mathematical Concepts , RNA Folding , Thermodynamics
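
The definition of a rainbow in the abstract above translates directly into a single stack pass. This is a minimal sketch, assuming a balanced, pseudoknot-free dot-bracket string as input and taking the length of an arc (i, j) to be j - i; the abstract does not state the length convention, and the function names are hypothetical.

```python
def rainbows(structure):
    """Arcs (i, j) that are not nested inside any other arc."""
    stack, result = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            if not stack:  # no enclosing arc is still open, so (i, j) is maximal
                result.append((i, j))
    return result


def longest_rainbow_length(structure):
    """Length j - i of the longest rainbow, or 0 if there are no arcs."""
    return max((j - i for i, j in rainbows(structure)), default=0)
```

For the structure `"((..))..(...)"` the rainbows are the arcs (0, 5) and (8, 12), so the longest rainbow has length 5 under this convention.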
12.
Bioinformatics ; 33(3): 382-389, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28171628

ABSTRACT

Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded 'patterns' in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This indicates the possibility of the existence of relevant patterns embedded in the sequences that are not discoverable using alignments. Availability and Implementation: The source code is freely available at http://staff.vbi.vt.edu/fenixh/Sampler.zip. Contact: duckcr@vbi.vt.edu. Supplementary Information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Sequence Analysis, RNA/methods , Software , Algorithms , Probability , RNA/metabolism
13.
J Math Biol ; 74(7): 1793-1821, 2017 06.
Article in English | MEDLINE | ID: mdl-27853818

ABSTRACT

In this paper we study properties of topological RNA structures, i.e. RNA contact structures with cross-serial interactions that are filtered by their topological genus. RNA secondary structures within this framework are topological structures having genus zero. We derive a new bivariate generating function whose singular expansion allows us to analyze the distributions of arcs, stacks, hairpin-, interior- and multi-loops. We then extend this analysis to H-type pseudoknots, kissing hairpins as well as 3-knots and compute their respective expectation values. Finally we discuss our results and put them into context with data obtained by uniform sampling structures of fixed genus.


Subject(s)
Models, Molecular , RNA/chemistry , Algorithms , Nucleic Acid Conformation
14.
Math Biosci ; 282: 109-120, 2016 12.
Article in English | MEDLINE | ID: mdl-27773681

ABSTRACT

In this paper we introduce a novel, context-free grammar, RNAFeatures*, capable of generating any RNA structure including pseudoknot structures (pk-structure). We represent pk-structures as orientable fatgraphs, which naturally leads to a filtration by their topological genus. Within this framework, RNA secondary structures correspond to pk-structures of genus zero. RNAFeatures* acts on formal, arc-labeled RNA secondary structures, called λ-structures. λ-structures correspond one-to-one to pk-structures together with some additional information. This information consists of the specific rearrangement of the backbone, by which a pk-structure can be made cross-free. RNAFeatures* is an extension of the grammar for secondary structures and employs an enhancement by labelings of the symbols as well as the production rules. We discuss how to use RNAFeatures* to obtain a stochastic context-free grammar for pk-structures, using data of RNA sequences and structures. The induced grammar facilitates fast Boltzmann sampling and statistical analysis. As a first application, we present an O(n log(n)) runtime algorithm which samples pk-structures based on ninety tRNA sequences and structures from the Nucleic Acid Database (NDB). AVAILABILITY: the source code for simulation results is available at http://staff.vbi.vt.edu/fenixh/TPstructure.zip. The code is written in C and compiled by Xcode.


Subject(s)
Models, Theoretical , RNA/chemistry
15.
J Comput Biol ; 23(11): 857-873, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27322662

ABSTRACT

Given a random RNA secondary structure, S, we study RNA sequences having fixed ratios of nucleotides that are compatible with S. We perform this analysis for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of nucleotide ratios, there exists a convex region, in which, in the limit of long sequences, a random structure asymptotically almost surely (a.a.s.) has compatible sequence with these ratios and outside of which a.a.s. a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. In particular, for GC-sequences (GC denoting the nucleotides guanine and cytosine, respectively) having a ratio of G nucleotides smaller than 1/3, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s. no such compatible sequence. For sequences having a ratio of G nucleotides larger than 1/3, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.


Subject(s)
RNA/chemistry , RNA/genetics , Algorithms , Base Composition , Base Pairing , Base Sequence , Models, Molecular , Nucleic Acid Conformation
16.
Math Biosci ; 270(Pt A): 57-65, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26482318

ABSTRACT

A topological RNA structure is derived by fattening the edges of a contact structure into ribbons. The shape of a topological RNA structure is obtained by collapsing the stacks of the structure into single arcs and by removing any arcs of length one, as well as isolated vertices. A shape contains the key topological information of the molecular conformation and for fixed topological genus there exist only finitely many such shapes. In this paper we compute the generating polynomial of shapes of fixed topological genus g. We furthermore derive an algorithm having O(g log g) time complexity uniformly generating shapes of genus g and discuss some applications in the context of databases of RNA pseudoknot structures.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Algorithms , Databases, Nucleic Acid , Mathematical Concepts , Models, Molecular
17.
Math Biosci ; 262: 88-104, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25640867

ABSTRACT

Interacting RNA complexes are studied via bicellular maps using a filtration via their topological genus. Our main result is a new bijection for RNA-RNA interaction structures and a linear time uniform sampling algorithm for RNA complexes of fixed topological genus. The bijection allows one either to reduce the topological genus of a bicellular map directly or to lose connectivity by decomposing the complex into a pair of single stranded RNA structures. Our main result is proved bijectively. It provides an explicit algorithm of how to rewire the corresponding complexes and an unambiguous decomposition grammar. Using the concept of genus induction, we construct bicellular maps of fixed topological genus g uniformly in linear time. We present various statistics on these topological RNA complexes and compare our findings with biological complexes. Furthermore we show how to construct loop-energy based complexes using our decomposition grammar.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Algorithms , Mathematical Concepts , Models, Molecular
18.
J Comput Biol ; 21(9): 649-64, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25075750

ABSTRACT

Shapes of interacting RNA complexes are studied using a filtration via their topological genus. A shape of an RNA complex is obtained by (iteratively) collapsing stacks and eliminating hairpin loops. This shape projection preserves the topological core of the RNA complex, and for fixed topological genus there are only finitely many such shapes. Our main result is a new bijection that relates the shapes of RNA complexes with shapes of RNA structures. This allows for computing the shape polynomial of RNA complexes via the shape polynomial of RNA structures. We furthermore present a linear time uniform sampling algorithm for shapes of RNA complexes of fixed topological genus.


Subject(s)
Models, Molecular , RNA/chemistry , Algorithms , Base Pairing , Hydrogen Bonding
19.
J Comput Biol ; 21(8): 591-608, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24689708

ABSTRACT

In this article we study canonical γ-structures, a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A γ-structure is composed of specific building blocks that have topological genus less than or equal to γ, where composition means concatenation and nesting of such blocks. Our main result is the derivation of the generating function of γ-structures via symbolic enumeration using so called irreducible shadows. We furthermore recursively compute the generating polynomials of irreducible shadows of genus ≤ γ. The γ-structures are constructed via γ-matchings. For 1 ≤ γ ≤ 10, we compute Puiseux expansions at the unique, dominant singularities, allowing us to derive simple asymptotic formulas for the number of γ-structures.


Subject(s)
RNA/chemistry , Algorithms , Nucleic Acid Conformation
20.
Math Biosci ; 245(2): 216-25, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23900061

ABSTRACT

In this paper we present a sampling framework for RNA structures of fixed topological genus. We introduce a novel, linear time, uniform sampling algorithm for RNA structures of fixed topological genus g, for arbitrary g>0. Furthermore we develop a linear time sampling algorithm for RNA structures of fixed topological genus g that are weighted by a simplified, loop-based energy functional. For this process the partition function of the energy functional has to be computed once, which has O(n^2) time complexity.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Algorithms , Computational Biology , Mathematical Concepts , Models, Molecular