1.
PLoS One ; 8(1): e54835, 2013.
Article in English | MEDLINE | ID: mdl-23382983

ABSTRACT

Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.


Subject(s)
Algorithms , Computational Biology/methods , Sequence Analysis, DNA , Genotype , HIV-1/genetics , Hepatitis Viruses/classification , Hepatitis Viruses/genetics , Humans , INDEL Mutation , Mycobacterium tuberculosis/classification , Mycobacterium tuberculosis/genetics , Phylogeny , RNA, Ribosomal, 16S
2.
RNA ; 18(1): 1-15, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22128342

ABSTRACT

Pre-mRNA structure impacts many cellular processes, including splicing in genes associated with disease. The contemporary paradigm of RNA structure prediction is biased toward secondary structures that occur within short ranges of pre-mRNA, although long-range base-pairings are known to be at least as important. Recently, we developed an efficient method for detecting conserved RNA structures on the genome-wide scale, one that does not require multiple sequence alignments and works equally well for the detection of local and long-range base-pairings. Using an enhanced method that detects base-pairings at all possible combinations of splice sites within each gene, we now report RNA structures that could be involved in the regulation of splicing in mammals. Statistically, we demonstrate strong association between the occurrence of conserved RNA structures and alternative splicing, where local RNA structures are generally more frequent at alternative donor splice sites, while long-range structures are more associated with weak alternative acceptor splice sites. As an example, we validated the RNA structure in the human SF1 gene using minigenes in the HEK293 cell line. Point mutations that disrupted the base-pairing of two complementary boxes between exons 9 and 10 of this gene altered the splicing pattern, while the compensatory mutations that reestablished the base-pairing reverted splicing to that of the wild-type. There is statistical evidence for a Dscam-like class of mammalian genes, in which mutually exclusive RNA structures control mutually exclusive alternative splicing. In sum, we propose that long-range base-pairings carry an important, yet unconsidered part of the splicing code, and that, even by modest estimates, there must be thousands of such potentially regulatory structures conserved throughout the evolutionary history of mammals.


Subject(s)
Alternative Splicing , RNA Precursors/chemistry , RNA Precursors/genetics , RNA Splicing , Animals , Base Sequence , Conserved Sequence , Doublecortin-Like Kinases , HEK293 Cells , Humans , Intracellular Signaling Peptides and Proteins/genetics , Molecular Sequence Data , Nucleic Acid Conformation , Protein Serine-Threonine Kinases/genetics , RNA Splice Sites , Sequence Analysis, RNA
3.
Nucleic Acids Res ; 37(14): 4533-44, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19465384

ABSTRACT

Accurate and efficient recognition of splice sites during pre-mRNA splicing is essential for proper transcriptome expression. Splice site usage can be modulated by secondary structures, but it is unclear whether this type of modulation is commonly used or occurs to a significant degree with secondary structures forming over long distances. Using phylogenetic comparisons of intronic sequences among 12 Drosophila genomes, we elucidated a group of 202 highly conserved pairs of sequences, each at least nine nucleotides long, capable of forming stable stem structures. This set was highly enriched in alternatively spliced introns, introns with weak acceptor sites, and long introns, and most pairs occurred over long distances (>150 nucleotides). Experimentally, we analyzed the splicing of several of these introns using mini-genes in Drosophila S2 cells. Wild-type splicing patterns were changed by mutations that opened the stem structure, and restored by compensatory mutations that re-established the base-pairing potential, demonstrating that these secondary structures were indeed implicated in the splice site choice. Mechanistically, the RNA structures masked splice sites, brought together distant splice sites and/or looped out introns. Thus, base-pairing interactions within introns, even those occurring over long distances, are more frequent modulators of alternative splicing than is currently assumed.
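To make the notion of complementary sequence pairs concrete, the following is a minimal sketch and not the authors' pipeline. It assumes strict Watson-Crick pairing only (no G-U wobble), ignores cross-species conservation entirely, and interprets "distance" as the number of nucleotides between the two boxes. The names `revcomp` and `find_boxes` and their parameters are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))


def find_boxes(rna, k=9, min_dist=150):
    """Return (i, j) start positions of two k-mers that can pair perfectly.

    rna[j:j+k] is the reverse complement of rna[i:i+k], and more than
    min_dist nucleotides separate the end of the first box from the start
    of the second. Assumption: this is one reading of "long distance".
    """
    # Index every k-mer by its start positions.
    index = {}
    for i in range(len(rna) - k + 1):
        index.setdefault(rna[i:i + k], []).append(i)

    pairs = []
    for i in range(len(rna) - k + 1):
        partner = revcomp(rna[i:i + k])
        for j in index.get(partner, []):
            if j - (i + k) > min_dist:
                pairs.append((i, j))
    return pairs


# Hypothetical toy intron: two complementary 9-mers around a 200-nt spacer.
box = "GGGAAACCC"
intron = box + "A" * 200 + revcomp(box)
print(find_boxes(intron))
```

A realistic search would also need to require conservation of the pairing potential across genomes and to score stem stability, neither of which this sketch attempts.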


Subject(s)
Alternative Splicing , Drosophila melanogaster/genetics , Introns , RNA Precursors/chemistry , RNA, Messenger/chemistry , Animals , Base Pairing , Base Sequence , Conserved Sequence , Molecular Sequence Data , RNA Splice Sites
4.
Am J Hum Genet ; 83(1): 94-8, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18571144

ABSTRACT

Alternative splicing is a well-recognized mechanism of accelerated genome evolution. We have studied single-nucleotide polymorphisms and human-chimpanzee divergence in the exons of 6672 alternatively spliced human genes, with the aim of understanding the forces driving the evolution of alternatively spliced sequences. Here, we show that alternatively spliced exons and exon fragments (alternative exons) from minor isoforms experience lower selective pressure at the amino acid level, accompanied by selection against synonymous sequence variation. The results of the McDonald-Kreitman test suggest that alternatively spliced exons, unlike exons constitutively included in the mRNA, are also subject to positive selection, with up to 27% of amino acids fixed by positive selection.
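The abstract does not say how the fraction of positively selected substitutions was computed. One commonly used summary of a McDonald-Kreitman table is the estimator alpha = 1 - (Ds * Pn) / (Dn * Ps), where Dn and Ds are nonsynonymous and synonymous fixed differences and Pn and Ps are nonsynonymous and synonymous polymorphisms. The sketch below assumes that estimator; the function name and the counts are hypothetical and are not taken from the paper.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate the fraction of nonsynonymous substitutions fixed by
    positive selection from a 2x2 McDonald-Kreitman table.

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    """
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be nonzero for this estimator")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
print(mk_alpha(dn=40, ds=100, pn=30, ps=100))
```

A positive alpha indicates an excess of nonsynonymous divergence relative to the neutral expectation set by polymorphism; a negative value can arise when slightly deleterious polymorphisms are segregating.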


Subject(s)
Alternative Splicing/genetics , Exons , Genes/genetics , Selection, Genetic , Amino Acid Sequence , Amino Acid Substitution , Codon , Databases, Factual , Expressed Sequence Tags , Humans , Molecular Sequence Data , Polymorphism, Single Nucleotide , Sequence Homology, Amino Acid
5.
RNA ; 14(4): 717-35, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18359782

ABSTRACT

T-box antitermination is one of the main mechanisms of regulation of genes involved in amino acid metabolism in Gram-positive bacteria. T-box regulatory sites consist of conserved sequence and RNA secondary structure elements. Using a set of known T-box sites, we constructed the common pattern and used it to scan available bacterial genomes. New T-boxes were found in various Gram-positive bacteria, some Gram-negative bacteria (delta-proteobacteria), and some other bacterial groups (Deinococcales/Thermales, Chloroflexi, Dictyoglomi). The majority of T-box-regulated genes encode aminoacyl-tRNA synthetases. Two other groups of T-box-regulated genes are amino acid biosynthetic genes and transporters, as well as genes with unknown function. Analysis of candidate T-box sites resulted in new functional annotations. We assigned the amino acid specificity to a large number of candidate amino acid transporters and a possible function to amino acid biosynthesis genes. We then studied the evolution of the T-boxes. Analysis of the constructed phylogenetic trees demonstrated that in addition to the normal evolution consistent with the evolution of regulated genes, T-boxes may be duplicated, transferred to other genes, and change specificity. We observed several cases of recent T-box regulon expansion following the loss of a previously existing regulatory system, in particular, arginine regulon in Clostridium difficile and methionine regulon in Lactobacillaceae. Finally, we described a new structural class of T-boxes containing duplicated terminator-antiterminator elements and unusual reduced T-boxes regulating initiation of translation in the Actinobacteria.


Subject(s)
Bacteria/genetics , Bacteria/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , T-Box Domain Proteins/genetics , T-Box Domain Proteins/metabolism , 5' Untranslated Regions , Amino Acid Transport Systems/genetics , Amino Acid Transport Systems/metabolism , Amino Acids/metabolism , Base Sequence , DNA, Bacterial/genetics , Evolution, Molecular , Gene Expression Regulation, Bacterial , Genome, Bacterial , Genomics , Models, Biological , Models, Molecular , Molecular Sequence Data , Nucleic Acid Conformation , Phylogeny , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Messenger/chemistry , RNA, Messenger/genetics , Regulon , Sequence Homology, Nucleic Acid
6.
J Bioinform Comput Biol ; 4(2): 589-96, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16819804

ABSTRACT

The RNAKinetics server (http://www.ig-msk.ru/RNA/kinetics) is a web interface for the newly developed RNAKinetics software. The software models the dynamics of RNA secondary structure by means of kinetic analysis of folding transitions of a growing RNA molecule. The result of the modeling is a kinetic ensemble, i.e. a collection of RNA structures that are endowed with probabilities, which depend on time. This approach gives a comprehensive probabilistic description of RNA folding pathways, revealing important kinetic details that are not captured by the traditional structure prediction methods. Access to the RNAKinetics server is free.
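As a toy illustration of time-dependent structure probabilities, the sketch below solves the smallest possible kinetic scheme: two structures A and B interconverting with constant rates. This is an assumption made for illustration, not the RNAKinetics algorithm; it does not model chain growth, and the function name and rate values are hypothetical.

```python
import math


def two_state_probabilities(k_ab, k_ba, t, p_a0=1.0):
    """Probabilities of structures A and B at time t.

    Solves dP_A/dt = -k_ab * P_A + k_ba * P_B with P_A + P_B = 1,
    starting from P_A(0) = p_a0.
    """
    total = k_ab + k_ba
    p_a_eq = k_ba / total  # equilibrium probability of A
    p_a = p_a_eq + (p_a0 - p_a_eq) * math.exp(-total * t)
    return p_a, 1.0 - p_a


# Hypothetical rates: A converts to B twice as fast as B converts to A.
for t in (0.0, 0.5, 5.0):
    print(t, two_state_probabilities(k_ab=2.0, k_ba=1.0, t=t))
```

Even in this minimal case the population at early times differs from the equilibrium distribution, which is the kind of kinetic detail that a single predicted structure cannot convey.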


Subject(s)
Models, Chemical , Models, Molecular , RNA/chemistry , Sequence Analysis, RNA/methods , Software , User-Computer Interface , Base Sequence , Computer Graphics , Computer Simulation , Kinetics , Molecular Sequence Data , Motion , Nucleic Acid Conformation
7.
Nat Biotechnol ; 23(1): 137-44, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15637633

ABSTRACT

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.


Subject(s)
Computational Biology/methods , Gene Expression , Transcription, Genetic , Amino Acid Motifs , Animals , Binding Sites , Databases, Protein , Drosophila , Fungal Proteins/chemistry , Humans , Internet , Mice , Reproducibility of Results , Software
8.
J Bacteriol ; 186(19): 6575-85, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15375139

ABSTRACT

We describe a simple theoretical framework for identifying orthologous sets of genes that deviate from a clock-like model of evolution. The approach used is based on comparing the evolutionary distances within a set of orthologs to a standard intergenomic distance, which was defined as the median of the distribution of the distances between all one-to-one orthologs. Under the clock-like model, the points on a plot of intergenic distances versus intergenomic distances are expected to fit a straight line. A statistical technique to identify significant deviations from the clock-like behavior is described. For several hundred analyzed orthologous sets representing three well-defined bacterial lineages, the alpha-Proteobacteria, the gamma-Proteobacteria, and the Bacillus-Clostridium group, the clock-like null hypothesis could not be rejected for approximately 70% of the sets, whereas the rest showed substantial anomalies. Subsequent detailed phylogenetic analysis of the genes with the strongest deviations indicated that over one-half of these genes probably underwent a distinct form of horizontal gene transfer, xenologous gene displacement, in which a gene is displaced by an ortholog from a different lineage. The remaining deviations from the clock-like model could be explained by lineage-specific acceleration of evolution. The results indicate that although xenologous gene displacement is a major force in bacterial evolution, a significant majority of orthologous gene sets in three major bacterial lineages evolved in accordance with the clock-like model. The approach described here allows rapid detection of deviations from this mode of evolution on the genome scale.
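The comparison described above can be sketched as follows. The intergenomic distance for each genome pair is taken as the median over all orthologous sets, as the abstract states. The deviation measure is an assumption: the abstract does not specify its statistical technique, so this sketch fits a least-squares line through the origin and reports residuals. The function names and the distance values are hypothetical.

```python
from statistics import median


def intergenomic_distances(dist_by_set):
    """Median distance per genome pair across all orthologous sets.

    dist_by_set: {set_name: {genome_pair: distance}}
    """
    collected = {}
    for dists in dist_by_set.values():
        for pair, d in dists.items():
            collected.setdefault(pair, []).append(d)
    return {pair: median(ds) for pair, ds in collected.items()}


def clock_fit(set_dists, genomic):
    """Fit d_set = slope * d_genomic through the origin.

    Returns the slope and the residual per genome pair. Large residuals
    suggest deviation from clock-like behavior (placeholder criterion).
    """
    pairs = list(set_dists)
    xs = [genomic[p] for p in pairs]
    ys = [set_dists[p] for p in pairs]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    residuals = {p: set_dists[p] - slope * genomic[p] for p in pairs}
    return slope, residuals


# Hypothetical distances for three orthologous sets over three genomes.
data = {
    "slow": {"A-B": 0.1, "A-C": 0.2, "B-C": 0.3},
    "mid": {"A-B": 0.2, "A-C": 0.4, "B-C": 0.6},
    "fast": {"A-B": 0.3, "A-C": 0.6, "B-C": 0.9},
}
genomic = intergenomic_distances(data)
print(clock_fit(data["slow"], genomic))
```

In this toy data every set is perfectly clock-like, so the residuals are zero and only the slope (the relative rate of the set) differs. A set affected by xenologous gene displacement would instead show a large residual for the genome pairs involving the displaced gene.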


Subject(s)
Evolution, Molecular , Gene Transfer, Horizontal , Genome, Bacterial , Models, Genetic , Phylogeny
9.
Hum Mol Genet ; 12(11): 1313-20, 2003 Jun 01.
Article in English | MEDLINE | ID: mdl-12761046

ABSTRACT

Alternative splicing has recently emerged as a major mechanism of generating protein diversity in higher eukaryotes. We compared alternative splicing isoforms of 166 pairs of orthologous human and mouse genes. As the mRNA and EST libraries of human and mouse are not complete and thus cannot be compared directly, we instead analyzed whether known cassette exons or alternative splicing sites from one genome are conserved in the other genome. We demonstrate that about half of the analyzed genes have species-specific isoforms, and about a quarter of elementary alternatives are not conserved between the human and mouse genomes. The detailed results of this study are available at www.ig-msk.ru:8005/HMG_paper.


Subject(s)
Alternative Splicing , Conserved Sequence , Genome, Human , Animals , Base Sequence , DNA-Binding Proteins/genetics , Exons , Expressed Sequence Tags , Humans , Membrane Proteins/genetics , Mice , Nerve Tissue Proteins/genetics , Proto-Oncogene Proteins/genetics , RNA Splicing Factors , RNA, Messenger/genetics , RNA-Binding Proteins/genetics , Sodium-Potassium-Exchanging ATPase/genetics , Transcription Factors/genetics , AIRE Protein
10.
Genome Res ; 12(10): 1507-16, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12368242

ABSTRACT

Biotin is a necessary cofactor of numerous biotin-dependent carboxylases in a variety of microorganisms. The strict control of biotin biosynthesis in Escherichia coli is mediated by the bifunctional BirA protein, which acts both as a biotin-protein ligase and as a transcriptional repressor of the biotin operon. Little is known about regulation of biotin biosynthesis in other bacteria. Using comparative genomics and phylogenetic analysis, we describe the biotin biosynthetic pathway and the BirA regulon in most available bacterial genomes. Existence of an N-terminal DNA-binding domain in BirA strictly correlates with the presence of putative BirA-binding sites upstream of biotin operons. The predicted BirA-binding sites are well conserved among various eubacterial and archaeal genomes. The possible role of the hypothetical genes bioY and yhfS-yhfT, newly identified members of the BirA regulon, in biotin metabolism is discussed. Based on analysis of co-occurrence of the biotin biosynthetic genes and bioY in complete genomes, we predict involvement of the transmembrane protein BioY in biotin transport. Various nonorthologous substitutes of the bioC-coupled gene bioH from E. coli, observed in several genomes, possibly represent the existence of different pathways for pimeloyl-CoA biosynthesis. Another interesting result of analysis of operon structures and BirA sites is that some biotin-dependent carboxylases from Rhodobacter capsulatus, actinomycetes, and archaea are possibly coregulated with BirA. BirA is the first example of a transcriptional regulator with a conserved binding signal in eubacteria and archaea.


Subject(s)
Archaeal Proteins/genetics , Biotin/genetics , Carbon-Nitrogen Ligases/genetics , Conserved Sequence/physiology , Escherichia coli Proteins/genetics , Regulon/genetics , Repressor Proteins/genetics , Signal Transduction/genetics , Transcription Factors/genetics , Chromosome Mapping/methods , Chromosome Mapping/statistics & numerical data , Computational Biology/methods , Computational Biology/statistics & numerical data , Conserved Sequence/genetics , Gene Order/genetics , Genes, Archaeal/genetics , Genes, Bacterial/genetics , Likelihood Functions