Results 1 - 20 of 25
1.
Nucleic Acids Res ; 51(14): 7409-7423, 2023 08 11.
Article in English | MEDLINE | ID: mdl-37293966

ABSTRACT

Chargaff's second parity rule (PR-2), under which the contents of complementary bases and k-mers match within a single strand of double-stranded DNA (dsDNA), is a phenomenon that has invited many explanations. The strict compliance of nearly all nuclear dsDNA with PR-2 implies that the explanation should be equally robust. In this work, we revisited the possibility that mutation rates drive PR-2 compliance. Starting from an assumption-free approach, we constructed kinetic equations for unconstrained simulations. The results were analysed for their PR-2 compliance using symbolic regression and machine learning techniques. We arrived at a generalised set of mutation rate interrelations, in place in most species, that allow for full PR-2 compliance. Importantly, our constraints explain PR-2 in genomes outside the scope of prior explanations based on equilibration under mutation rates with simpler no-strand-bias constraints. We thus reinstate the role of mutation rates in PR-2 through its molecular core, now shown, under our formulation, to be tolerant of previously noted strand biases and incomplete compositional equilibration. We further investigate the time needed for any genome to reach PR-2, showing that PR-2 is generally reached earlier than compositional equilibrium and well within the age of life on Earth.


Subject(s)
DNA , Genome , Mutation Rate , DNA/chemistry , DNA/genetics , Genomics , Humans , Animals , Eukaryota/genetics , Prokaryotic Cells/chemistry
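PR-2 compliance of a single strand can be checked by comparing the count of each k-mer with the count of its reverse complement on that same strand. The Python sketch below scores the deviation as a mean normalised difference over non-palindromic k-mer pairs; this metric is an illustrative assumption, not necessarily the measure used in the study.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping k-mers with non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def pr2_deviation(seq: str, k: int = 1) -> float:
    """Mean |n(w) - n(rc(w))| / (n(w) + n(rc(w))) over k-mer/reverse-complement pairs.

    0.0 means perfect PR-2 compliance at this k; 1.0 is the maximal skew.
    """
    counts = kmer_counts(seq.upper(), k)
    seen, diffs = set(), []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        rc = reverse_complement(kmer)
        pair = frozenset((kmer, rc))
        if kmer == rc or pair in seen:
            continue  # palindromic k-mers are trivially compliant
        seen.add(pair)
        total = counts[kmer] + counts[rc]
        if total:
            diffs.append(abs(counts[kmer] - counts[rc]) / total)
    return sum(diffs) / len(diffs) if diffs else 0.0

print(pr2_deviation("AACGTT"))  # 0.0: A/T and C/G counts match
print(pr2_deviation("AAAA"))    # 1.0: no T to balance the A run
```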
2.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37140540

ABSTRACT

MOTIVATION: Various computational biology calculations require a probabilistic optimization protocol to determine the parameters that capture the system at a desired state in the configurational space. Many existing methods excel in certain scenarios but fail in others, owing in part to inefficient exploration of the parameter space and easy trapping in local minima. Here, we developed a general-purpose optimization engine in R that can be plugged into any modelling initiative, simple or complex, through a few lucid interfacing functions, to perform seamless optimization with rigorous parameter sampling. RESULTS: ROptimus features simulated annealing and replica exchange implementations equipped with adaptive thermoregulation to drive the Monte Carlo optimization process in a flexible manner, through constrained acceptance frequency but unconstrained adaptive pseudo-temperature regimens. We exemplify the applicability of our R optimizer to a diverse set of problems spanning data analyses and computational biology tasks. AVAILABILITY AND IMPLEMENTATION: ROptimus is written and implemented in R, and is freely available from CRAN (http://cran.r-project.org/web/packages/ROptimus/index.html) and GitHub (http://github.com/SahakyanLab/ROptimus).


Subject(s)
Computational Biology , Software , Computational Biology/methods , Monte Carlo Method , Temperature
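The idea of adaptive thermoregulation can be illustrated with a generic Metropolis annealer that rescales its pseudo-temperature so that the acceptance frequency stays near a target. This Python sketch covers only the simulated annealing flavour and is not the ROptimus API (ROptimus itself is an R package); the function names, the multiplicative update rule and all default values are assumptions chosen for illustration.

```python
import math
import random

def adaptive_annealing(energy, propose, x0, n_steps=5000, target_acceptance=0.3,
                       window=100, t0=1.0, adapt_factor=1.1, seed=0):
    """Monte Carlo minimisation whose pseudo-temperature is adjusted every
    `window` steps to steer the acceptance frequency towards `target_acceptance`."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    temperature = t0
    accepted = 0
    for step in range(1, n_steps + 1):
        candidate = propose(x, rng)
        e_new = energy(candidate)
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            accepted += 1
            if e < best_e:
                best_x, best_e = x, e
        if step % window == 0:
            # Accepting too often -> cool down; too rarely -> heat up.
            if accepted / window > target_acceptance:
                temperature /= adapt_factor
            else:
                temperature *= adapt_factor
            accepted = 0
    return best_x, best_e, temperature

def quadratic(x):
    return (x - 3.0) ** 2

def uniform_step(x, rng):
    return x + rng.uniform(-0.5, 0.5)

best_x, best_e, final_t = adaptive_annealing(quadratic, uniform_step, x0=-5.0)
```

Constraining the acceptance frequency rather than fixing a cooling schedule lets the temperature drift freely, which mirrors the "constrained acceptance, unconstrained temperature" trade-off described above.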
3.
Angew Chem Int Ed Engl ; 60(18): 10286-10294, 2021 04 26.
Article in English | MEDLINE | ID: mdl-33605024

ABSTRACT

Recent studies indicate that i-DNA, a four-stranded cytosine-rich DNA structure also known as the i-motif, actually forms in vivo; however, a systematic study of sequence effects on its stability has been missing. Herein, an unprecedented number of different sequences (271) bearing four runs of 3-6 cytosines with different spacer lengths has been tested. While i-DNA stability is nearly independent of total spacer length, the central spacer plays a special role in stability. Stability also depends on the length of the C-tracts at both acidic and neutral pH. This study provides a global picture of i-DNA stability thanks to the large size of the introduced data set; it reveals unexpected features and allows us to conclude that the determinants of i-DNA stability do not mirror those of G-quadruplexes. Our results illustrate the structural roles of loops and C-tracts in i-DNA stability, confirm its formation in cells, and allow rules to be established for predicting its stability.
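Sequences of the kind tested here can be located with a scan for four consecutive runs of 3-6 cytosines. In this Python sketch the spacer bounds (1-12 nt) are a hypothetical choice, since the abstract does not state them, and the scan only finds candidates; it makes no stability prediction.

```python
import re

def find_imotif_candidates(seq, min_run=3, max_run=6, min_spacer=1, max_spacer=12):
    """Return (start, end, run_lengths, spacer_lengths) for each window of four
    consecutive C-tracts whose lengths and spacers fall in the given ranges."""
    seq = seq.upper()
    # Only runs of >= min_run Cs count as tracts; shorter C stretches may sit in spacers.
    runs = [m.span() for m in re.finditer(rf"C{{{min_run},}}", seq)]
    hits = []
    for i in range(len(runs) - 3):
        window = runs[i:i + 4]
        run_lengths = [end - start for start, end in window]
        spacers = [window[j + 1][0] - window[j][1] for j in range(3)]
        if all(min_run <= n <= max_run for n in run_lengths) and all(
            min_spacer <= s <= max_spacer for s in spacers
        ):
            hits.append((window[0][0], window[-1][1], run_lengths, spacers))
    return hits

print(find_imotif_candidates("CCCTAACCCTAACCCTAACCC"))
# [(0, 21, [3, 3, 3, 3], [3, 3, 3])]
```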

4.
Chembiochem ; 21(3): 320-323, 2020 02 03.
Article in English | MEDLINE | ID: mdl-31386787

ABSTRACT

The alphabet of modified DNA bases goes beyond the conventional four letters, with biological roles being found for many such modifications. Herein, we describe the observation of a modified thymine base that arises from spontaneous N1-C2 ring opening of the oxidation product 5-formyluracil after N3 deprotonation. We first observed this phenomenon in silico through ab initio calculations, followed by in vitro experiments to verify its formation at the mononucleoside level and in a synthetic DNA oligonucleotide context. We show that the new base modification (Trex, thymine ring expunged) can form under physiological conditions and is resistant to the action of common repair machinery. Furthermore, we found cases of the natural existence of Trex while screening a number of human cell types and mESCs (E14), suggesting potential biological relevance of this modification.


Subject(s)
DNA/metabolism , Thymine/metabolism , Cell Line, Tumor , DNA/genetics , HeLa Cells , Humans , Molecular Structure , Oxidation-Reduction , Thymine/chemistry
5.
Nucleic Acids Res ; 47(8): 3862-3874, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30892612

ABSTRACT

Genomic maps of DNA G-quadruplexes (G4s) can help elucidate the roles that these secondary structures play in various organisms. Herein, we employ an improved version of a G-quadruplex sequencing method (G4-seq) to generate whole genome G4 maps for 12 species that include widely studied model organisms and also pathogens of clinical relevance. We identify G4 structures that form under physiological K+ conditions and also G4s that are stabilized by the G4-targeting small molecule pyridostatin (PDS). We discuss the various structural features of the experimentally observed G-quadruplexes (OQs), highlighting differences in their prevalence and enrichment across species. Our study describes diversity in sequence composition and genomic location for the OQs in the different species and reveals that the enrichment of OQs in gene promoters is particular to mammals such as mouse and human, among the species studied. The multi-species maps have been made publicly available as a resource to the research community. The maps can serve as blueprints for biological experiments in those model organisms, where G4 structures may play a role.


Subject(s)
Chromosome Mapping/methods , G-Quadruplexes , Genome , Aminoquinolines/chemistry , Animals , Arabidopsis/classification , Arabidopsis/genetics , Base Sequence , Caenorhabditis elegans , Drosophila melanogaster/classification , Drosophila melanogaster/genetics , Escherichia coli/classification , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Leishmania major/classification , Leishmania major/genetics , Mice , Phylogeny , Picolinic Acids/chemistry , Plasmodium falciparum/classification , Plasmodium falciparum/genetics , Rhodobacter sphaeroides/classification , Rhodobacter sphaeroides/genetics , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/genetics , Trypanosoma brucei brucei/classification , Trypanosoma brucei brucei/genetics , Zebrafish/classification , Zebrafish/genetics
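Promoter enrichment of observed G-quadruplexes can be expressed as an observed/expected ratio: the fraction of OQs overlapping promoters divided by the fraction of the genome that promoters cover. The sketch below assumes half-open intervals on a single sequence and toy coordinates; it is an illustrative statistic, not necessarily the one used in the study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def promoter_enrichment(oq_sites, promoters, genome_length):
    """Observed/expected ratio of OQs overlapping promoter intervals."""
    merged = merge_intervals(promoters)
    expected = sum(end - start for start, end in merged) / genome_length
    overlapping = sum(
        1 for s, e in oq_sites
        if any(s < p_end and e > p_start for p_start, p_end in merged)
    )
    observed = overlapping / len(oq_sites)
    return observed / expected

# Toy example: promoters cover 10% of the sequence, half of the OQs fall in them.
ratio = promoter_enrichment(
    oq_sites=[(10, 30), (50, 60), (500, 520), (700, 720)],
    promoters=[(0, 60), (40, 100)],
    genome_length=1000,
)
```

A ratio above 1 indicates enrichment; the linear overlap scan keeps the sketch readable, whereas genome-scale data would call for sorted intervals and binary search.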
6.
Chem Commun (Camb) ; 54(77): 10878-10881, 2018 Sep 25.
Article in English | MEDLINE | ID: mdl-30204160

ABSTRACT

Here we identify hundreds of RNA G-quadruplex (rG4) candidates in microRNAs (miRNAs), characterize the structures and miRNA-mRNA interactions of several mammalian-conserved miRNAs, and reveal the formation of rG4s in miRNAs. Notably, we study the effect of these rG4s in cells and uncover the role of rG4s in miRNA-mediated post-transcriptional regulation.


Subject(s)
G-Quadruplexes , MicroRNAs/chemistry , HEK293 Cells , Humans , MicroRNAs/metabolism
7.
Sci Rep ; 7(1): 14535, 2017 11 06.
Article in English | MEDLINE | ID: mdl-29109402

ABSTRACT

We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities.


Subject(s)
G-Quadruplexes , Machine Learning , Base Sequence , Computer Simulation , Genome, Human/genetics , Humans
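For reference, the "widely accepted" putative quadruplex sequences are conventionally found with a regular expression requiring four runs of at least three guanines separated by loops of 1-7 nt. The sketch below implements that conventional pattern on one strand and extracts flanking sequence as raw material for features; it is the baseline that a learned model refines, not the model described above, and the 50-nt flank width is an arbitrary assumption.

```python
import re

# Conventional PQS pattern: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (non-overlapping matches).
PQS_PATTERN = re.compile(r"G{3,}(?:[ACGTN]{1,7}?G{3,}){3}")

def find_pqs(seq, flank=50):
    """Return PQS matches on the given strand with their left and right flanks."""
    seq = seq.upper()
    hits = []
    for m in PQS_PATTERN.finditer(seq):
        start, end = m.span()
        hits.append({
            "start": start,
            "end": end,
            "motif": m.group(),
            "left_flank": seq[max(0, start - flank):start],
            "right_flank": seq[end:end + flank],
        })
    return hits

hits = find_pqs("AAGGGTTAGGGTTAGGGTTAGGGAA")
```

G4s formed by the opposite strand appear as C-runs on this strand, so a genome-wide scan would also run the pattern on the reverse complement.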
8.
Nat Struct Mol Biol ; 24(3): 243-247, 2017 03.
Article in English | MEDLINE | ID: mdl-28134931

ABSTRACT

Long interspersed nuclear elements (LINEs) are ubiquitous transposable elements in higher eukaryotes that have a significant role in shaping genomes, owing to their abundance. Here we report that guanine-rich sequences in the 3' untranslated regions (UTRs) of hominoid-specific LINE-1 elements are coupled with retrotransposon speciation and contribute to retrotransposition through the formation of G-quadruplex (G4) structures. We demonstrate that stabilization of the G4 motif of a human-specific LINE-1 element by small-molecule ligands stimulates retrotransposition.


Subject(s)
3' Untranslated Regions/genetics , G-Quadruplexes , Long Interspersed Nucleotide Elements/genetics , Retroelements/genetics , Base Sequence , HeLa Cells , Humans , Ligands , Mutation/genetics , Nucleotide Motifs/genetics
9.
BMC Genomics ; 18(1): 81, 2017 01 13.
Article in English | MEDLINE | ID: mdl-28086752

ABSTRACT

BACKGROUND: Accurate knowledge of the core components of substitution rates is of vital importance for understanding genome evolution and dynamics. By performing a direct, single-genome analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. RESULTS: The rates are characterised through rate constants in a time domain, and are made available through a dedicated program (Trek) and a stand-alone database. Owing to the method design and the imposed stringency criteria, we expect our rate constants to be good estimates of the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution-prone and substitution-resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and produce a Trek-based estimate of the overall mutation rate in humans. CONCLUSIONS: The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and in the neutral evolution of proteomes.


Subject(s)
Genome, Human , Germ-Line Mutation , Mutation Rate , Base Composition , Chromosome Mapping , Computational Biology/methods , Genomics/methods , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
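If substitutions at a given sequence context are treated as a first-order process, the unsubstituted fraction decays as f(t) = exp(-k t), so each dated remnant yields -ln f = k t. The sketch below pools such observations with a least-squares fit through the origin. It is a deliberately simplified illustration under that first-order assumption; it is not the Trek procedure, and the (n_sites, n_substituted, age) input layout and the time units are hypothetical.

```python
import math

def first_order_rate_constant(observations):
    """Least-squares estimate of k from (n_sites, n_substituted, age) tuples,
    fitting -ln(fraction intact) = k * age through the origin."""
    numerator = 0.0
    denominator = 0.0
    for n_sites, n_substituted, age in observations:
        fraction_intact = 1.0 - n_substituted / n_sites
        if fraction_intact <= 0.0:
            raise ValueError("fully substituted context: log undefined")
        y = -math.log(fraction_intact)  # equals k * age under first-order kinetics
        numerator += age * y
        denominator += age * age
    return numerator / denominator

# Synthetic check: data generated with k = 0.002 per (hypothetical) time unit.
true_k = 0.002
obs = [(1_000_000, 1_000_000 * (1 - math.exp(-true_k * t)), t) for t in (10, 50, 100)]
k_hat = first_order_rate_constant(obs)
```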
10.
Nat Methods ; 13(10): 841-4, 2016 10.
Article in English | MEDLINE | ID: mdl-27571552

ABSTRACT

We introduce RNA G-quadruplex sequencing (rG4-seq), a transcriptome-wide RNA G-quadruplex (rG4) profiling method that couples rG4-mediated reverse transcriptase stalling with next-generation sequencing. Using rG4-seq on poly(A)-enriched HeLa RNA, we generated a global in vitro map of thousands of canonical and noncanonical rG4 structures. We characterize rG4 formation relative to cytosine content and alternative RNA structure stability, uncover rG4-dependent differences in RNA folding and show evolutionarily conserved enrichment in transcripts mediating RNA processing and stability.


Subject(s)
G-Quadruplexes , High-Throughput Nucleotide Sequencing/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Cytosine/metabolism , Guanine/metabolism , HeLa Cells , Humans , RNA Stability , RNA, Messenger/metabolism
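Reverse transcriptase stalling can be summarised per position as the fraction of covering reads whose cDNA terminates there. The sketch below compares that stop fraction between an rG4-stabilising condition and a non-stabilising control; the two-condition design, the coverage cut-off and the threshold are assumptions for illustration, not parameters taken from the rG4-seq method.

```python
def stall_scores(stops_stab, cov_stab, stops_ctrl, cov_ctrl, min_coverage=20):
    """Per-position difference in RT-stop fraction (stabilising minus control).

    stops_*[i]: reads whose cDNA ends at position i; cov_*[i]: reads covering i.
    Positions below the coverage cut-off in either condition get None.
    """
    scores = []
    for s1, c1, s0, c0 in zip(stops_stab, cov_stab, stops_ctrl, cov_ctrl):
        if c1 < min_coverage or c0 < min_coverage:
            scores.append(None)
        else:
            scores.append(s1 / c1 - s0 / c0)
    return scores

def call_stall_sites(scores, threshold=0.2):
    """Positions whose stop-fraction increase reaches the threshold."""
    return [i for i, s in enumerate(scores) if s is not None and s >= threshold]

scores = stall_scores([1, 30, 2], [100, 100, 100], [1, 2, 2], [100, 100, 100])
sites = call_stall_sites(scores)
```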
11.
Angew Chem Int Ed Engl ; 55(31): 8958-61, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27355429

ABSTRACT

RNA G-quadruplex (rG4) structures are of fundamental importance to biology. A novel approach is introduced to detect and structurally map rG4s at single-nucleotide resolution in RNAs. The approach, denoted SHALiPE, couples selective 2'-hydroxyl acylation with lithium ion-based primer extension, and identifies characteristic structural fingerprints for rG4 mapping. We apply SHALiPE to interrogate the human precursor microRNA 149, and reveal the formation of an rG4 structure in this non-coding RNA. Additional analyses support the SHALiPE results and show that this rG4 has a parallel topology, is thermally stable, and is conserved in mammals. An in vitro Dicer assay shows that this rG4 inhibits Dicer processing, supporting the potential role of rG4 structures in microRNA maturation and post-transcriptional regulation of mRNAs.


Subject(s)
G-Quadruplexes , Hydroxides/chemistry , MicroRNAs/analysis , Acylation , Humans , Molecular Structure
12.
BMC Genomics ; 17: 225, 2016 Mar 12.
Article in English | MEDLINE | ID: mdl-26968808

ABSTRACT

BACKGROUND: The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. With the view that complex genes should be particularly vulnerable to such events, here we explore the link between simple properties of human genes, such as transcript length, number of splice variants and exon/intron composition, and their involvement in the pathways linked to cancer and other multigenic diseases. RESULTS: We show that cancer pathways are substantially enriched in long genes and in genes that have multiple splice variants. Although these two factors are interdependent, we show that overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey of the pathways enriched in the longest genes and in genes with multiple splice variants reveals, along with cancer pathways, the pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We also describe a correlation between gene length and the number of somatic mutations. CONCLUSIONS: Our work is a step forward in assessing the role of simple gene characteristics in cancer and in a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. We note that, unlike the cancer pathways, the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to be a further potential contributing factor in the emergence of pathologies, should topoisomerases become impaired.


Subject(s)
Alternative Splicing , Cardiomyopathies/genetics , Diabetes Mellitus, Type 2/genetics , Neoplasms/genetics , Exons , Humans , Introns , Mutation
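Enrichment of a pathway with long genes is commonly assessed with a one-sided hypergeometric test: given N annotated genes of which K are classed as long, how surprising is it to find k or more long genes among the n members of a pathway? The abstract does not state which statistic was used, so this standard-library Python sketch shows the conventional test as an assumption, with the "long" cut-off (for example a top quantile of transcript length) left to the caller.

```python
from math import comb

def hypergeometric_upper_tail(n_total, n_long, n_pathway, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(population=n_total,
    successes=n_long, draws=n_pathway)."""
    upper = min(n_long, n_pathway)
    hits = sum(
        comb(n_long, x) * comb(n_total - n_long, n_pathway - x)
        for x in range(n_overlap, upper + 1)
    )
    return hits / comb(n_total, n_pathway)

# Hypothetical numbers: 20,000 genes, 2,000 "long", a 100-gene pathway with 25 long genes.
p_value = hypergeometric_upper_tail(20000, 2000, 100, 25)
```

Exact integer binomials avoid the underflow that naive floating-point products would hit for genome-sized populations; with many pathways tested, the resulting p-values would still need multiple-testing correction.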
13.
J Am Chem Soc ; 137(29): 9270-2, 2015 Jul 29.
Article in English | MEDLINE | ID: mdl-25946119

ABSTRACT

We present a chemical method to selectively tag and enrich thymine modifications, 5-formyluracil (5-fU) and 5-hydroxymethyluracil (5-hmU), found naturally in DNA. Inherent reactivity differences have enabled us to tag 5-fU chemoselectively over its cytosine counterpart, 5-formylcytosine (5-fC). We rationalized the enhanced reactivity of 5-fU compared to 5-fC via ab initio quantum mechanical calculations. We exploited this chemical tagging reaction to provide proof of concept for the enrichment of 5-fU-containing DNA from a pool that contains 5-fC or no modification. We further demonstrate that 5-hmU can be chemically oxidized to 5-fU, providing a strategy for the enrichment of 5-hmU. These methods will enable the mapping of 5-fU and 5-hmU in genomic DNA, to provide insights into their functional role and dynamics in biology.


Subject(s)
DNA/chemistry , Thymine/chemistry , Base Sequence , DNA/genetics , Models, Molecular , Nucleic Acid Conformation , Oligodeoxyribonucleotides/chemistry , Oligodeoxyribonucleotides/genetics , Pentoxyl/analogs & derivatives , Pentoxyl/chemistry , Uracil/analogs & derivatives , Uracil/chemistry
14.
Proc Natl Acad Sci U S A ; 111(28): 10203-8, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24982184

ABSTRACT

Proline isomerization is a ubiquitous process that plays a key role in the folding of proteins and in the regulation of their functions. Different families of enzymes, known as "peptidyl-prolyl isomerases" (PPIases), catalyze this reaction, which involves the interconversion between the cis and trans isomers of the N-terminal amide bond of the amino acid proline. However, complete descriptions of the mechanisms by which these enzymes function have remained elusive. We show here that cyclophilin A, one of the most common PPIases, provides a catalytic environment that acts on the substrate through an electrostatic handle mechanism. In this mechanism, the electrostatic field in the catalytic site turns the electric dipole associated with the carbonyl group of the amino acid preceding the proline in the substrate, thus causing the rotation of the peptide bond between the two residues. We identified this mechanism using a combination of NMR measurements, molecular dynamics simulations, and density functional theory calculations to simultaneously determine the cis-bound and trans-bound conformations of cyclophilin A and its substrate as the enzymatic reaction takes place. We anticipate that this approach will be helpful in elucidating whether the electrostatic handle mechanism that we describe here is common to other PPIases and, more generally, in characterizing other enzymatic processes.


Subject(s)
Cyclophilin A/chemistry , Molecular Dynamics Simulation , Proline/chemistry , Catalysis , Humans , Nuclear Magnetic Resonance, Biomolecular , Static Electricity
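Whether the peptide bond preceding proline is cis or trans is conventionally read from the backbone torsion angle omega, defined by the atoms CA(i), C(i), N(i+1) and CA(i+1): |omega| near 0 degrees is cis and near 180 degrees is trans. The Python sketch below computes a signed torsion angle from four coordinates; it is standard geometry offered for orientation, not the NMR, MD and DFT analysis used in the study, and the 90-degree cut-off is a common convention rather than a value from the paper.

```python
import math

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points (IUPAC sign convention)."""
    b0 = [p0[i] - p1[i] for i in range(3)]
    b1 = [p2[i] - p1[i] for i in range(3)]
    b2 = [p3[i] - p2[i] for i in range(3)]
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = [b0[i] - d0 * b1[i] for i in range(3)]
    w = [b2[i] - d2 * b1[i] for i in range(3)]
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def peptide_isomer(ca_prev, c_prev, n_pro, ca_pro):
    """Classify the X-Pro peptide bond from omega = dihedral(CA_i, C_i, N_i+1, CA_i+1)."""
    omega = dihedral(ca_prev, c_prev, n_pro, ca_pro)
    return ("cis" if abs(omega) < 90.0 else "trans"), omega
```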
15.
J Comput Chem ; 35(14): 1101-5, 2014 May 30.
Article in English | MEDLINE | ID: mdl-24676684

ABSTRACT

Almost (all atom molecular simulation toolkit) is an open source computational package for structure determination and analysis of complex molecular systems, including proteins and nucleic acids. Almost has been designed with two primary goals: to provide tools for molecular structure determination using various types of experimental measurements as conformational restraints, and to provide methods for the analysis and assessment of structural and dynamical properties of complex molecular systems. The methods incorporated in Almost include the determination of structural and dynamical features of proteins using distance restraints derived from nuclear Overhauser effect measurements, orientational restraints obtained from residual dipolar couplings and structural restraints from chemical shifts. Here, we present the first public release of Almost, highlight the key aspects of its computational design and discuss the main features currently implemented. Almost is available for the most common Unix-based operating systems, including Linux and Mac OS X. Almost is distributed free of charge under the GNU General Public License, and is available both as source code and as a binary executable from the project web site at http://www.open-almost.org. Interested users can follow and contribute to the further development of Almost on http://sourceforge.net/projects/almost.


Subject(s)
Molecular Dynamics Simulation , Proteins/chemistry , Software , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation
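Distance restraints derived from NOE measurements are typically imposed as a flat-bottom potential: no penalty while an interatomic distance lies within its bounds and a harmonic penalty outside them. This Python sketch shows that generic form; it is not Almost's implementation or interface, and the atom names, bounds and force constant are hypothetical.

```python
import math

def flat_bottom(distance, lower, upper, force_constant):
    """Zero inside [lower, upper], harmonic penalty outside."""
    if distance < lower:
        return force_constant * (lower - distance) ** 2
    if distance > upper:
        return force_constant * (distance - upper) ** 2
    return 0.0

def noe_restraint_energy(coords, restraints, force_constant=10.0):
    """coords: atom name -> (x, y, z) in Angstrom;
    restraints: (atom_a, atom_b, lower, upper) tuples."""
    energy = 0.0
    for atom_a, atom_b, lower, upper in restraints:
        d = math.dist(coords[atom_a], coords[atom_b])
        energy += flat_bottom(d, lower, upper, force_constant)
    return energy

coords = {"H1": (0.0, 0.0, 0.0), "H2": (0.0, 0.0, 3.0), "H3": (0.0, 0.0, 7.0)}
restraints = [("H1", "H2", 1.8, 5.0), ("H1", "H3", 1.8, 5.0)]
energy = noe_restraint_energy(coords, restraints)  # only the H1-H3 pair is violated
```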
16.
J Am Chem Soc ; 136(6): 2204-7, 2014 Feb 12.
Article in English | MEDLINE | ID: mdl-24517490

ABSTRACT

Recent improvements in the accuracy of structure-based methods for the prediction of nuclear magnetic resonance chemical shifts have inspired numerous approaches for determining the secondary and tertiary structures of proteins. Such advances also suggest the possibility of using chemical shifts to characterize the conformational fluctuations of these molecules. Here we describe a method of using methyl chemical shifts as restraints in replica-averaged molecular dynamics (MD) simulations, which enables us to determine the conformational ensemble of the HU dimer and characterize the range of motions accessible to its flexible β-arms. Our analysis suggests that the bending action of HU on DNA is mediated by a mechanical clamping mechanism, in which metastable structural intermediates sampled during the hinge motions of the β-arms in the free state are presculpted to bind DNA. These results illustrate that using side-chain chemical shift data in conjunction with MD simulations can provide quantitative information about the free energy landscapes of proteins and yield detailed insights into their functional mechanisms.


Subject(s)
Bacterial Proteins/chemistry , DNA-Binding Proteins/chemistry , DNA/chemistry , Magnetic Resonance Spectroscopy , Molecular Dynamics Simulation , Binding Sites , Dimerization , Methane/chemistry , Molecular Conformation
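In replica-averaged restraining, the experimental shifts are compared with the average prediction over all replicas rather than with each structure separately, so the ensemble as a whole is fitted while individual replicas remain free to differ. The Python sketch below shows a harmonic penalty of that kind; the harmonic form, force constant and shift values are illustrative assumptions, and the shift predictor itself is not shown.

```python
def replica_averaged_penalty(predicted, experimental, force_constant=1.0):
    """predicted: one list of methyl shifts (ppm) per replica, in the same order
    as `experimental`. The penalty uses the replica-averaged prediction."""
    n_replicas = len(predicted)
    penalty = 0.0
    for j, exp_shift in enumerate(experimental):
        mean_shift = sum(replica[j] for replica in predicted) / n_replicas
        penalty += (mean_shift - exp_shift) ** 2
    return force_constant * penalty

# Neither replica matches the data alone, but their average does.
replicas = [[20.0, 15.0], [22.0, 17.0]]
zero_penalty = replica_averaged_penalty(replicas, [21.0, 16.0])
```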
17.
J Chem Phys ; 139(3): 034101, 2013 Jul 21.
Article in English | MEDLINE | ID: mdl-23883004

ABSTRACT

It has been recently shown that NMR chemical shifts can be used to determine the structures of proteins. In order to begin to extend this type of approach to nucleic acids, we present an equation that relates the structural parameters and the (13)C chemical shifts of the ribose group. The parameters in the equation were determined by maximizing the agreement between the DFT-derived chemical shifts and those predicted through the equation for a database of ribose structures. Our results indicate that this type of approach represents a promising way of establishing quantitative and computationally efficient analytical relationships between chemical shifts and structural parameters in nucleic acids.


Subject(s)
Quantum Theory , RNA/chemistry , Ribose/chemistry , Nucleosides/chemistry , Nucleotides/chemistry
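Determining the parameters of such an equation by maximising agreement with DFT-derived shifts amounts, in its simplest form, to a linear least-squares fit when the equation is linear in its coefficients. The abstract does not give the functional form, so the single-angle Fourier-type basis below is hypothetical, and the "reference" shifts are synthetic stand-ins for DFT values; the sketch only illustrates the fitting step, using the normal equations in pure Python.

```python
import math

def least_squares(features, targets):
    """Solve (X^T X) a = X^T y by Gaussian elimination with partial pivoting."""
    n = len(features[0])
    aug = [
        [sum(f[i] * f[j] for f in features) for j in range(n)]
        + [sum(f[i] * y for f, y in zip(features, targets))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - tail) / aug[r][r]
    return coeffs

def ribose_basis(angle_deg):
    """Hypothetical basis in one ring torsion-like angle: 1, cos(t), cos(2t)."""
    t = math.radians(angle_deg)
    return [1.0, math.cos(t), math.cos(2 * t)]

# Synthetic 13C reference shifts (ppm) generated from known coefficients.
angles = range(0, 360, 30)
reference = [75.0 + 3.0 * math.cos(math.radians(a)) - 1.5 * math.cos(math.radians(2 * a))
             for a in angles]
coeffs = least_squares([ribose_basis(a) for a in angles], reference)
```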
18.
J Phys Chem B ; 117(7): 1989-98, 2013 Feb 21.
Article in English | MEDLINE | ID: mdl-23398371

ABSTRACT

Ring current and electric field effects can considerably influence NMR chemical shifts in biomolecules. Understanding such effects is particularly important for the development of accurate mappings between chemical shifts and the structures of nucleic acids. In this work, we first analyzed the Pople and the Haigh-Mallion models in terms of their ability to describe the conjugated-ring effects of nucleobases. We then created a database (DiBaseRNA) of three-dimensional arrangements of RNA base pairs from X-ray structures, calculated the corresponding chemical shifts via a hybrid density functional theory approach and used the results to parametrize the ring current and electric field effects in RNA bases. Next, we studied the coupling of the electric field and ring current effects for different inter-ring arrangements found in RNA bases using linear model fitting under three approximations: joint electric field and ring current, electric field only, and ring current only. Taken together, our results provide a characterization of the interdependence of ring current and electric field geometric factors, which is shown to be especially important for the chemical shifts of non-hydrogen atoms in RNA bases.


Subject(s)
Electricity , RNA/chemistry , Crystallography, X-Ray , Databases, Factual , Hydrogen Bonding , Magnetic Resonance Spectroscopy , Models, Molecular , Nitrogen/chemistry , Nucleic Acid Conformation , RNA/metabolism , Software
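In the Pople point-dipole model, the ring current is represented by a magnetic dipole at the ring centre oriented along the ring normal, and the shift contribution is proportional to the geometric factor G = (1 - 3 cos^2 theta) / r^3, where r is the distance from the ring centre to the nucleus and theta the angle between the ring normal and that vector. The Python sketch below computes G for a planar ring; the intensity factor is a fitted, ring-specific constant that the caller must supply, and sign conventions for it vary between parametrisations.

```python
import math

def pople_geometric_factor(ring_atoms, nucleus):
    """(1 - 3 cos^2 theta) / r^3 for a point dipole at the centre of a planar ring."""
    n = len(ring_atoms)
    centre = [sum(atom[i] for atom in ring_atoms) / n for i in range(3)]
    # Ring normal from two centre->atom vectors (assumes a planar ring).
    u = [ring_atoms[0][i] - centre[i] for i in range(3)]
    v = [ring_atoms[1][i] - centre[i] for i in range(3)]
    normal = [u[1] * v[2] - u[2] * v[1],
              u[2] * v[0] - u[0] * v[2],
              u[0] * v[1] - u[1] * v[0]]
    r_vec = [nucleus[i] - centre[i] for i in range(3)]
    r = math.sqrt(sum(c * c for c in r_vec))
    normal_len = math.sqrt(sum(c * c for c in normal))
    cos_theta = sum(a * b for a, b in zip(normal, r_vec)) / (normal_len * r)
    return (1.0 - 3.0 * cos_theta ** 2) / r ** 3

def pople_shift(ring_atoms, nucleus, intensity):
    """Ring-current shift as a fitted intensity factor times the geometric factor."""
    return intensity * pople_geometric_factor(ring_atoms, nucleus)

# Idealised six-membered ring (radius 1.39 Angstrom) in the xy plane.
hexagon = [(1.39 * math.cos(math.radians(60 * k)),
            1.39 * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
above = pople_geometric_factor(hexagon, (0.0, 0.0, 3.0))     # negative: shielding cone
in_plane = pople_geometric_factor(hexagon, (5.0, 0.0, 0.0))  # positive: deshielding region
```

With a positive intensity factor, nuclei stacked above a base are shifted upfield and nuclei in the ring plane downfield, which is the qualitative behaviour the model is meant to capture.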
19.
J Phys Chem B ; 116(16): 4754-9, 2012 Apr 26.
Article in English | MEDLINE | ID: mdl-22455760

ABSTRACT

We present a method of assessing the quality of protein structures based on the use of side-chain NMR chemical shifts. Because these parameters are very accurate reporters of side-chain positions and are highly sensitive to tertiary structure and packing, they are particularly useful for structure validation. To analyze a given structure, we define a quality score, QCS, that compares the chemical shifts calculated from that structure with the corresponding experimental values in a way that takes account of the errors in the calculations. The results that we report illustrate the advantages of examining the quality of protein structures from the perspective of side chains.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular , Proteins/chemistry , Models, Molecular , Protein Conformation
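One simple way to compare calculated and experimental shifts while accounting for calculation errors is to divide each deviation by the expected prediction error for that atom type before averaging, so that atoms with less reliable predictions weigh less. The Python sketch below uses an error-weighted root-mean-square deviation; this is an assumed illustration of the idea, not the published definition of QCS, and the atom labels and values are hypothetical.

```python
import math

def error_weighted_shift_score(calculated, experimental, errors):
    """RMS of (calc - exp) / sigma over atoms present in all three mappings."""
    atoms = [a for a in experimental if a in calculated and a in errors]
    if not atoms:
        raise ValueError("no atoms shared by calculated, experimental and errors")
    total = sum(((calculated[a] - experimental[a]) / errors[a]) ** 2 for a in atoms)
    return math.sqrt(total / len(atoms))

score = error_weighted_shift_score(
    calculated={"L12 CD1": 1.0, "V20 CG1": 2.0},
    experimental={"L12 CD1": 1.5, "V20 CG1": 2.0},
    errors={"L12 CD1": 0.5, "V20 CG1": 0.5},
)
```

Lower scores indicate closer agreement; a score near 1 means deviations are, on average, about as large as the expected prediction errors.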