1.
RNA ; 30(1): 1-15, 2023 Dec 18.
Article in English | MEDLINE | ID: mdl-37903545

ABSTRACT

We present a novel framework for predicting whether a new lineage poses a threat of eventually dominating the viral population. The framework is based purely on genomic sequence data and requires no previously established biological analysis. Its building blocks are sets of coevolving sites in the alignment (motifs), identified via coevolutionary signals. The collection of such motifs forms a relational structure over the polymorphic sites. Motifs are constructed using distances that quantify the coevolutionary coupling of pairs of sites, and they manifest as coevolving clusters of sites. We present an approach to genomic surveillance based on this notion of relational structure. Our system issues an alert regarding a lineage based on its contribution to drastic changes in the relational structure. We then conduct a comprehensive retrospective analysis of the COVID-19 pandemic based on SARS-CoV-2 genomic sequence data in GISAID from October 2020 to September 2022, across 21 lineages and 27 countries with weekly resolution. We investigate the performance of this surveillance system in terms of its accuracy, timeliness, and robustness. Lastly, we study how well each lineage is classified by such a system.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Pandemics , Retrospective Studies , Genomics
2.
J Comput Biol ; 28(3): 248-256, 2021 03.
Article in English | MEDLINE | ID: mdl-33275493

ABSTRACT

COVID-19 is an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The viral genome is considered to be relatively stable and the mutations that have been observed and reported thus far are mainly focused on the coding region. This article provides evidence that macrolevel pandemic dynamics, such as social distancing, modulate the genomic evolution of SARS-CoV-2. This view complements the prevalent paradigm that microlevel observables control macrolevel parameters such as death rates and infection patterns. First, we observe differences in mutational signals for geospatially separated populations such as the prevalence of A23404G in CA versus NY and WA. We show that the feedback between macrolevel dynamics and the viral population can be captured employing a transfer entropy framework. Second, we observe complex interactions within mutational clades. Namely, when C14408T first appeared in the viral population, the frequency of A23404G spiked in the subsequent week. Third, we identify a noncoding mutation, G29540A, within the segment between the coding gene of the N protein and the ORF10 gene, which is largely confined to NY (>95%). These observations indicate that macrolevel sociobehavioral measures have an impact on the viral genomics and may be useful for the dashboard-like tracking of its evolution. Finally, despite the fact that SARS-CoV-2 is a genetically robust organism, our findings suggest that we are dealing with a high degree of adaptability. Owing to its ample spread, mutations of unusual form are observed and a high complexity of mutational interaction is exhibited.
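
The transfer entropy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which estimator, history length, or discretization the authors use, so everything below is an assumption: a plug-in (frequency-count) estimator with history length 1 over already-discretized weekly series. The function name `transfer_entropy` and its arguments are hypothetical.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy from source to target.

    Assumes both series are discrete symbols of equal length and uses a
    history length of 1:
        TE = sum p(y', y, x) * log2( p(y' | y, x) / p(y' | y) )
    where y' = target[t+1], y = target[t], x = source[t].
    """
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    n = len(target) - 1
    if n < 1:
        raise ValueError("need at least two time points")

    triples = Counter()   # counts of (y_next, y_now, x_now)
    pairs_yx = Counter()  # counts of (y_now, x_now)
    pairs_yy = Counter()  # counts of (y_next, y_now)
    singles = Counter()   # counts of y_now
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1

    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te
```

In this sketch, `source` could be a discretized macrolevel signal (for example, a binned mobility index) and `target` a binned weekly mutation frequency. A value near zero means the source adds no predictive information beyond the target's own previous value.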


Subject(s)
COVID-19/virology , Evolution, Molecular , Genome, Viral , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/transmission , Computational Biology , Gene Frequency , Health Behavior , Health Policy , Humans , Models, Genetic , Mutation , Pandemics , Phylogeny , Physical Distancing , SARS-CoV-2/pathogenicity , SARS-CoV-2/physiology , Spike Glycoprotein, Coronavirus/genetics
3.
Algorithms Mol Biol ; 15: 15, 2020.
Article in English | MEDLINE | ID: mdl-32782456

ABSTRACT

Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to the problem by considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree, a hierarchical bi-partition of the input ensemble that is constructed by recursively querying whether or not a base pair of maximum information entropy is contained in the target. These queries are answered by relating local to global probing data, exploiting the modularity of RNA secondary structures. We show that the leaves of the tree consist of sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability that our framework correctly identifies the target in the leaf is greater than 90%.
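
A single splitting step of such an ensemble tree might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the ensemble is a plain list of sampled structures, each given as a set of base pairs `(i, j)`, and the answer to the query is supplied by the caller rather than derived from probing data. The names `binary_entropy`, `max_entropy_pair`, and `split` are hypothetical.

```python
from collections import Counter
from math import log2


def binary_entropy(p):
    """Entropy (bits) of a yes/no event that occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def max_entropy_pair(ensemble):
    """Return the base pair whose presence in the sample is most uncertain.

    ensemble: list of sets (or frozensets) of base pairs (i, j).
    Returns None if no structure contains any base pair.
    Ties are broken by taking the smallest pair in sorted order.
    """
    counts = Counter(bp for structure in ensemble for bp in structure)
    if not counts:
        return None
    n = len(ensemble)
    return max(sorted(counts), key=lambda bp: binary_entropy(counts[bp] / n))


def split(ensemble, base_pair):
    """Bi-partition the ensemble by whether a structure contains base_pair."""
    with_pair = [s for s in ensemble if base_pair in s]
    without_pair = [s for s in ensemble if base_pair not in s]
    return with_pair, without_pair
```

Applying `max_entropy_pair` and `split` recursively to the half that matches the query answer yields a path from the root to a leaf. A base pair present in about half of the sample has entropy close to 1 bit, so querying it roughly halves the candidate set.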

4.
J Math Biol ; 79(3): 791-822, 2019 08.
Article in English | MEDLINE | ID: mdl-31172257

ABSTRACT

In this paper we analyze the length-spectrum of blocks in γ-structures. γ-structures are a class of RNA pseudoknot structures that play a key role in the context of polynomial time RNA folding. A γ-structure is constructed by nesting and concatenating specific building components having topological genus at most γ. A block is a substructure enclosed by crossing maximal arcs with respect to the partial order induced by nesting. We show that, in uniformly generated γ-structures, there is a significant gap in this length-spectrum, i.e., there asymptotically almost surely exists a unique longest block of length at least [Formula: see text] and that with high probability any other block has finite length. For fixed [Formula: see text], we prove that the length of the complement of the longest block converges to a discrete limit law, and that the distribution of short blocks of given length tends to a negative binomial distribution in the limit of long sequences. We refine this analysis to the length spectrum of blocks of specific pseudoknot types, such as H-type and kissing hairpins. Our results generalize the rainbow spectrum on secondary structures by the first and third authors and are put into context with the structural prediction of long non-coding RNAs.


Subject(s)
Algorithms , RNA Folding , RNA/chemistry , Humans , Models, Molecular
5.
Bull Math Biol ; 80(6): 1514-1538, 2018 06.
Article in English | MEDLINE | ID: mdl-29541998

ABSTRACT

In this paper, we analyze the length spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length spectrum. We prove that there asymptotically almost surely exists a unique longest rainbow of length at least [Formula: see text] and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite k, the distribution of rainbows of length k tends to a negative binomial distribution for large n. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures and biological RNA structures, and relate our findings to the sparsification of folding algorithms.
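
The definition of a rainbow as a maximal arc under nesting can be made concrete with a short sketch. It assumes the structure is given in dot-bracket notation (an input format not mentioned in the abstract) and takes the length of an arc `(i, j)` to be `j - i`, which is also an assumption. The function names are hypothetical.

```python
def rainbows(dot_bracket):
    """Return the maximal arcs of a secondary structure as 1-based (i, j) pairs.

    An arc is maximal when no other arc encloses it, which in a
    pseudoknot-free structure means the stack of open brackets is empty
    right after the arc is closed.
    """
    stack = []
    result = []
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            start = stack.pop()
            if not stack:
                result.append((start, pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return result


def longest_rainbow(dot_bracket):
    """Return the longest rainbow, or None if the structure has no arcs."""
    return max(rainbows(dot_bracket), key=lambda arc: arc[1] - arc[0], default=None)
```

For example, the structure `((..)).(...)..((.)(.))` has three rainbows, and the arcs nested inside them are not counted.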


Subject(s)
Models, Molecular , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Base Sequence , Binomial Distribution , Mathematical Concepts , RNA Folding , Thermodynamics
6.
J Math Biol ; 74(7): 1793-1821, 2017 06.
Article in English | MEDLINE | ID: mdl-27853818

ABSTRACT

In this paper we study properties of topological RNA structures, i.e., RNA contact structures with cross-serial interactions that are filtered by their topological genus. RNA secondary structures within this framework are topological structures having genus zero. We derive a new bivariate generating function whose singular expansion allows us to analyze the distributions of arcs, stacks, and hairpin-, interior- and multi-loops. We then extend this analysis to H-type pseudoknots, kissing hairpins and 3-knots, and compute their respective expectation values. Finally, we discuss our results and put them into context with data obtained by uniformly sampling structures of fixed genus.
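
The topological genus of a structure can be computed directly from its arcs. The sketch below is an illustration, not code from the paper. It uses the standard fatgraph construction: unpaired positions are dropped, the backbone is collapsed to a single vertex, and the boundary components r are counted as the cycles of the map "jump to the partner endpoint, then step one position along the backbone". With n arcs, Euler's relation 2 - 2g = 1 - n + r gives g = (1 + n - r) / 2. The function name `genus` and the arc-list input format are assumptions.

```python
def genus(arcs):
    """Topological genus of a diagram given as a list of arcs (i, j).

    Positions may be any distinct integers; unpaired positions are simply
    not listed. Each position may be an endpoint of at most one arc.
    """
    arcs = list(arcs)
    endpoints = sorted(p for arc in arcs for p in arc)
    if len(set(endpoints)) != len(endpoints):
        raise ValueError("each position may belong to at most one arc")
    n = len(arcs)
    if n == 0:
        return 0

    # Relabel endpoints 0 .. 2n-1 in backbone order.
    rank = {p: k for k, p in enumerate(endpoints)}
    m = 2 * n
    partner = [0] * m
    for i, j in arcs:
        partner[rank[i]] = rank[j]
        partner[rank[j]] = rank[i]

    # Count boundary components: cycles of k -> partner[k] + 1 (mod 2n).
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if seen[start]:
            continue
        boundaries += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (partner[k] + 1) % m

    return (1 + n - boundaries) // 2
```

Under this construction, any noncrossing (secondary) structure has genus 0, while an H-type pseudoknot `[(1, 3), (2, 4)]` and a kissing hairpin `[(1, 3), (2, 5), (4, 6)]` both have genus 1.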


Subject(s)
Models, Molecular , RNA/chemistry , Algorithms , Nucleic Acid Conformation
7.
J Comput Biol ; 23(11): 857-873, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27322662

ABSTRACT

Given a random RNA secondary structure, S, we study RNA sequences having fixed ratios of nucleotides that are compatible with S. We perform this analysis for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of nucleotide ratios, there exists a convex region in which, in the limit of long sequences, a random structure asymptotically almost surely (a.a.s.) has a compatible sequence with these ratios and outside of which a random structure a.a.s. has no such compatible sequence. We localize this region for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. In particular, for GC-sequences (G and C denoting the nucleotides guanine and cytosine, respectively) having a ratio of G nucleotides smaller than 1/3, a random RNA secondary structure without any minimum arc- and stack-length restrictions a.a.s. has no such compatible sequence. For sequences having a ratio of G nucleotides larger than 1/3, a random RNA secondary structure a.a.s. has such compatible sequences. We discuss our results in the context of various families of RNA structures.
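
Compatibility of a sequence with a structure, as used above, means that every base pair of the structure joins two nucleotides permitted by the chosen base-pairing rule. A minimal sketch of that check follows. The dot-bracket input, the rule sets `WATSON_CRICK` and `WOBBLE`, and the function names are assumptions made for illustration; the abstract does not list the specific rules studied.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = WATSON_CRICK | {("G", "U"), ("U", "G")}


def base_pairs(dot_bracket):
    """Return the 0-based base pairs (i, j) of a dot-bracket structure."""
    stack = []
    pairs = []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at index {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return pairs


def is_compatible(sequence, dot_bracket, allowed=WATSON_CRICK):
    """True if every base pair of the structure is allowed by the rule set."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    return all(
        (sequence[i], sequence[j]) in allowed for i, j in base_pairs(dot_bracket)
    )
```

For instance, `is_compatible("GGAAUC", "((..))")` is false under Watson-Crick pairing because of the G-U pair, but true when `allowed=WOBBLE` is passed.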


Subject(s)
RNA/chemistry , RNA/genetics , Algorithms , Base Composition , Base Pairing , Base Sequence , Models, Molecular , Nucleic Acid Conformation
8.
J Comput Biol ; 21(8): 591-608, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24689708

ABSTRACT

In this article we study canonical γ-structures, a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A γ-structure is composed of specific building blocks that have topological genus less than or equal to γ, where composition means concatenation and nesting of such blocks. Our main result is the derivation of the generating function of γ-structures via symbolic enumeration using so-called irreducible shadows. We furthermore recursively compute the generating polynomials of irreducible shadows of genus ≤ γ. The γ-structures are constructed via γ-matchings. For 1 ≤ γ ≤ 10, we compute Puiseux expansions at the unique, dominant singularities, allowing us to derive simple asymptotic formulas for the number of γ-structures.


Subject(s)
RNA/chemistry , Algorithms , Nucleic Acid Conformation
9.
Math Biosci ; 241(1): 24-33, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23022027

ABSTRACT

In this paper we study γ-structures filtered by topological genus. γ-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A γ-structure is composed of specific building blocks that have topological genus less than or equal to γ, where composition means concatenation and nesting of such blocks. Our main results are the derivation of a new bivariate generating function for γ-structures via symbolic methods, the singularity analysis of the solutions, and a central limit theorem for the distribution of topological genus in γ-structures of given length. In our derivation, specific bivariate polynomials play a central role. Their coefficients count particular motifs of fixed topological genus, and they are of relevance in the context of genus recursion and novel folding algorithms.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Algorithms , Mathematical Concepts , Models, Molecular , RNA Folding
10.
J Math Biol ; 64(3): 529-56, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21541694

ABSTRACT

RNA-RNA binding is an important phenomenon observed for many classes of non-coding RNAs and plays a crucial role in a number of regulatory processes. Recently several MFE folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Here joint structure means that in a diagram representation the intramolecular bonds of each partner are pseudoknot-free, that the intermolecular binding pairs are noncrossing, and that there is no so-called "zigzag" configuration. This paper presents the combinatorics of RNA interaction structures including their generating function, singularity analysis as well as explicit recurrence relations. In particular, our results imply simple asymptotic formulas for the number of joint structures.


Subject(s)
Combinatorial Chemistry Techniques , Models, Molecular , RNA Folding , RNA/chemistry , Algorithms , Base Sequence , Molecular Sequence Data
11.
Math Biosci ; 233(1): 47-58, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21689666

ABSTRACT

Recently, several minimum free energy (MFE) folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Their folding targets are interaction structures that can be represented as diagrams with two backbones drawn horizontally on top of each other such that (1) intramolecular and intermolecular bonds are noncrossing and (2) there is no "zigzag" configuration. This paper studies joint structures with arc-length at least four in which both interior and exterior stack-lengths are at least two (no isolated arcs). The key idea in this paper is to consider a new type of shape, based on which joint structures can be derived via symbolic enumeration. Our results imply simple asymptotic formulas for the number of joint structures with surprisingly small exponential growth rates. They are of interest in the context of designing prediction algorithms for RNA-RNA interactions.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Algorithms , Binding Sites , Mathematical Concepts , Models, Molecular , RNA/genetics , Thermodynamics