Results 1 - 17 of 17
1.
Life (Basel) ; 12(3)2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35330117

ABSTRACT

The human gut microbiome is associated with various diseases, including autism spectrum disorders (ASD). Variations in the taxonomic composition of the gut microbiome of children with ASD have been observed repeatedly. However, features and parameters of the microbiome CRISPR-Cas systems in ASD have not been investigated yet. Here, we present such an analysis in order to describe the overall changes in the microbiome CRISPR-Cas systems during ASD as well as to reveal their potential for use in diagnostics and therapy. To identify the systems, we used a combination of publicly available tools suited for completed genomes, followed by filtering. In the considered data, the microbiomes of children with ASD contained fewer arrays per Gb of assembly than the control group, but the arrays included more spacers on average. CRISPR arrays from the microbiomes of children with ASD differed from the control group neither in the fractions of spacers with protospacers from known genomes, nor in the sets of known bacteriophages providing protospacers. Almost all bacterial protospacers of the gut microbiome systems, for both children with ASD and healthy children, were located in prophage islands, leaving no room for the systems to participate in interspecies competition.

2.
PeerJ ; 6: e4545, 2018.
Article in English | MEDLINE | ID: mdl-29607260

ABSTRACT

Genome rearrangements have played an important role in the evolution of Yersinia pestis from its progenitor Yersinia pseudotuberculosis. Traditional phylogenetic trees for Y. pestis based on sequence comparison have short internal branches and low bootstrap supports as only a small number of nucleotide substitutions have occurred. On the other hand, even a small number of genome rearrangements may resolve topological ambiguities in a phylogenetic tree. We reconstructed phylogenetic trees based on genome rearrangements using several popular approaches such as Maximum likelihood for Gene Order and the Bayesian model of genome rearrangements by inversions. We also reconciled phylogenetic trees for each of the three CRISPR loci to obtain an integrated scenario of the CRISPR cassette evolution. Analysis of contradictions between the obtained evolutionary trees yielded numerous parallel inversions and gain/loss events. Our data indicate that an integrated analysis of sequence-based and inversion-based trees enhances the resolution of phylogenetic reconstruction. In contrast, reconstructions of strain relationships based on solely CRISPR loci may not be reliable, as the history is obscured by large deletions, obliterating the order of spacer gains. Similarly, numerous parallel gene losses preclude reconstruction of phylogeny based on gene content.

3.
Mol Ecol ; 26(7): 2019-2026, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27997045

ABSTRACT

CRISPR-Cas are nucleic acid-based prokaryotic immune systems. CRISPR arrays accumulate spacers from foreign DNA and provide resistance to mobile genetic elements containing identical or similar sequences. Thus, the set of spacers present in a given bacterium can be regarded as a record of encounters of its ancestors with genetic invaders. Such records should be specific for different lineages and change with time, as earlier acquired spacers become obsolete and are lost. Here, we studied type I-E CRISPR spacers of Escherichia coli from an extinct pachyderm. We find that many spacers recovered from intestines of a 42 000-year-old mammoth match spacers of present-day E. coli. Present-day CRISPR arrays can be reconstructed from palaeo sequences, indicating that the order of spacers has also been preserved. The results suggest that E. coli CRISPR arrays were not subject to intensive change through adaptive acquisition during this time.


Subject(s)
Biological Evolution , Clustered Regularly Interspaced Short Palindromic Repeats , Escherichia coli/genetics , Animals , DNA, Ancient , DNA, Bacterial/genetics , Intestines/microbiology , Mammoths/microbiology , Sequence Analysis, DNA
4.
Environ Microbiol ; 17(7): 2203-8, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25919787

ABSTRACT

Assessment of phylogenetic positions of predicted gene and protein sequences is a routine step in any genome project, useful for validating the species' taxonomic position and for evaluating hypotheses about genome evolution and function. Several recent eukaryotic genome projects have reported multiple gene sequences that were much more similar to homologues in bacteria than to any eukaryotic sequence. In the spirit of the times, horizontal gene transfer from bacteria to eukaryotes has been invoked in some of these cases. Here, we show, using comparative sequence analysis, that some of those bacteria-like genes indeed appear likely to have been horizontally transferred from bacteria to eukaryotes. In other cases, however, the evidence strongly indicates that the eukaryotic DNA sequenced in the genome project contains a sample of non-integrated DNA from the actual bacteria, possibly providing a window into the host microbiome. Recent literature suggests also that common reagents, kits and laboratory equipment may be systematically contaminated with bacterial DNA, which appears to be sampled by metagenome projects non-specifically. We review several bioinformatic criteria that help to distinguish putative horizontal gene transfers from the admixture of genes from autonomously replicating bacteria in their hosts' genome databases or from the reagent contamination.


Subject(s)
Bacteria/genetics , DNA, Bacterial/genetics , Eukaryota/genetics , Gene Transfer, Horizontal , Genes, Bacterial , Base Sequence , Biological Evolution , Computational Biology , Genome, Bacterial/genetics , Phylogeny
5.
BMC Genomics ; 15: 202, 2014 Mar 17.
Article in English | MEDLINE | ID: mdl-24628983

ABSTRACT

BACKGROUND: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a prokaryotic adaptive defence system that provides resistance against alien replicons such as viruses and plasmids. Spacers in a CRISPR cassette confer immunity against viruses and plasmids containing regions complementary to the spacers and hence they retain a footprint of interactions between prokaryotes and their viruses in individual strains and ecosystems. The human gut is a rich habitat populated by numerous microorganisms, but a large fraction of these are unculturable and little is known about them in general and their CRISPR systems in particular. RESULTS: We used human gut metagenomic data from three open projects in order to characterize the composition and dynamics of CRISPR cassettes in the human-associated microbiota. Applying available CRISPR-identification algorithms and a previously designed filtering procedure to the assembled human gut metagenomic contigs, we found 388 CRISPR cassettes, 373 of which had repeats not observed previously in complete genomes or other datasets. Only 171 of 3,545 identified spacers were coupled with protospacers from the human gut metagenomic contigs. The number of matches to GenBank sequences was negligible, providing protospacers for 26 spacers. Reconstruction of CRISPR cassettes allowed us to track the dynamics of spacer content. In agreement with other published observations we show that spacers shared by different cassettes (and hence likely older ones) tend to lie near the trailer ends, whereas spacers with matches in the metagenomes are distributed unevenly across cassettes, demonstrating a preference to form clusters closer to the active end of a CRISPR cassette, adjacent to the leader, and hence suggesting dynamical interactions between prokaryotes and viruses in the human gut. Remarkably, spacers match protospacers in the metagenome of the same individual with frequency comparable to a random control, but may match protospacers from metagenomes of other individuals. CONCLUSIONS: The analysis of assembled contigs is complementary to the approach based on the analysis of original reads and hence provides additional data about composition and evolution of CRISPR cassettes, revealing the dynamics of CRISPR-phage interactions in metagenomes.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , Gastrointestinal Tract/microbiology , Metagenome , Metagenomics , Microbiota , Amino Acid Sequence , Bacteriophages/genetics , Computational Biology/methods , Humans , Molecular Sequence Data , Sequence Alignment , Viral Proteins/chemistry , Viral Proteins/genetics
6.
Appl Environ Microbiol ; 79(22): 6868-73, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23995941

ABSTRACT

Analysis of the genome sequence of the starlet sea anemone, Nematostella vectensis, reveals many genes whose products are phylogenetically closer to proteins encoded by bacteria or bacteriophages than to any metazoan homologs. One explanation for such sequence affinities could be that these genes have been horizontally transferred from bacteria to the Nematostella lineage. We show, however, that bacterium-like and phage-like genes sequenced by the N. vectensis genome project tend to cluster on separate scaffolds, which typically do not include eukaryotic genes and differ from the latter in their GC contents. Moreover, most of the bacterium-like genes in N. vectensis either lack introns or the introns annotated in such genes are false predictions that, when translated, often restore the missing portions of their predicted protein products. In a freshwater cnidarian, Hydra, for which a proteobacterial endosymbiont is known, these gene features have been used to delineate the DNA of that endosymbiont sampled by the genome sequencing project. We predict that a large fraction of bacterium-like genes identified in the N. vectensis genome similarly are drawn from the contemporary bacterial consorts of the starlet sea anemone. These uncharacterized bacteria associated with N. vectensis are a proteobacterium and a representative of the phylum Bacteroidetes, each represented in the database by an apparently random sample of informational and operational genes. A substantial portion of a putative bacteriophage genome was also detected, which would be especially unlikely to have been transferred to a eukaryote.


Subject(s)
Bacteria/growth & development , Genes, Bacterial , Sea Anemones/genetics , Sea Anemones/microbiology , Animals , Bacteria/genetics , Computational Biology , Genome , Introns , Phylogeny , Sequence Analysis, DNA , Symbiosis/genetics
7.
Nat Commun ; 4: 1387, 2013.
Article in English | MEDLINE | ID: mdl-23340427

ABSTRACT

The emergence of ribosomes and translation factors is central for understanding the origin of life. Recruitment of translation factors to bacterial ribosomes is mediated by the L12 stalk composed of protein L10 and several copies of protein L12, the only multi-copy protein of the ribosome. Here we predict stoichiometries of L12 stalk for >1,200 bacteria, mitochondria and chloroplasts by a computational analysis, and validate the predictions by quantitative mass spectrometry. The majority of bacteria have L12 stalks allowing for binding of four or six copies of L12, largely independent of the taxonomic group or living conditions of the bacteria, whereas some cyanobacteria have eight copies. Mitochondrial and chloroplast ribosomes can accommodate six copies of L12. The last universal common ancestor probably had six L12 molecules bound to L10. Changes of the stalk composition provide a unique possibility to trace the evolution of protein components of the ribosome.


Subject(s)
Bacteria/metabolism , Bacterial Proteins/genetics , Evolution, Molecular , Ribosomal Proteins/genetics , Ribosomes/metabolism , Amino Acid Sequence , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Chloroplasts/metabolism , Gene Dosage , Humans , Mass Spectrometry , Mitochondria/metabolism , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/metabolism , Molecular Sequence Data , Phylogeny , Protein Binding , Protein Multimerization , Protein Structure, Secondary , Protein Structure, Tertiary , RNA, Ribosomal, 16S/genetics , Ribosomal Protein L10 , Ribosomal Proteins/chemistry , Ribosomal Proteins/metabolism , Synechococcus/metabolism , Thermotoga maritima/genetics , Thermotoga maritima/metabolism
8.
Proteins ; 80(5): 1363-76, 2012 May.
Article in English | MEDLINE | ID: mdl-22275035

ABSTRACT

Eukaryotic-like serine/threonine protein kinases (ESTPKs) are widely spread throughout the bacterial genomes. These enzymes can be potential targets of new antibacterial drugs useful for the treatment of socially important diseases such as tuberculosis. In this study, ESTPKs of pathogenic, probiotic, and antibiotic-producing Gram-positive bacteria were classified according to the physicochemical properties of amino acid residues in the ATP-binding site of the enzyme. Nine residues were identified that line the surface of the adenine-binding pocket, and ESTPKs were classified based on these signatures. Twenty groups were discovered, five of them containing >10 representatives. The two most abundant groups contained >150 protein kinases that belong to the various branches of the phylogenetic tree, whereas certain groups are genus- or even species-specific. Homology modeling of the typical representatives of each group revealed that the classification is reliable, and the differences between the protein kinase ATP-binding pockets predicted based on their signatures are apparent in their structure. The classification is expected to be useful for the selection of targets for new anti-infective drugs.


Subject(s)
Adenosine Triphosphate/metabolism , Bacterial Proteins/chemistry , Gram-Positive Bacteria/classification , Gram-Positive Bacteria/enzymology , Protein Serine-Threonine Kinases/chemistry , Adenosine Triphosphate/chemistry , Amino Acid Sequence , Bacterial Proteins/metabolism , Binding Sites , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Serine-Threonine Kinases/metabolism , Sequence Alignment , Sequence Homology, Amino Acid
9.
Biol Direct ; 5: 54, 2010 Sep 08.
Article in English | MEDLINE | ID: mdl-20825637

ABSTRACT

BACKGROUND: Gene duplications are a source of new genes and protein functions. The innovative role of duplication events makes families of paralogous genes an interesting target for studies in evolutionary biology. Here we study global trends in the evolution of human genes that resulted from recent duplications. RESULTS: The pressure of negative selection is weaker during a short time immediately after a duplication event. Roughly one fifth of genes in paralogous gene families are evolving asymmetrically: one of the proteins encoded by two closest paralogs accumulates amino acid substitutions significantly faster than its partner. This asymmetry cannot be explained by differences in gene expression levels. In asymmetric gene pairs the number of deleterious mutations is increased in one copy, while decreased in the other copy as compared to genes constituting non-asymmetrically evolving pairs. The asymmetry in the rate of synonymous substitutions is much weaker and not significant. CONCLUSIONS: The increase of negative selection pressure over time after a duplication event seems to be a major trend in the evolution of human paralogous gene families. The observed asymmetry in the evolution of paralogous genes shows that in many cases one of two gene copies remains practically unchanged, while the other accumulates functional mutations. This supports the hypothesis that slowly evolving gene copies preserve their original functions, while fast evolving copies obtain new specificities or functions.


Subject(s)
Evolution, Molecular , Genes, Duplicate/genetics , Humans , Selection, Genetic/genetics
10.
Appl Environ Microbiol ; 76(7): 2136-44, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20118362

ABSTRACT

Clustered regularly interspaced short palindromic repeats (CRISPRs) form a recently characterized type of prokaryotic antiphage defense system. The phage-host interactions involving CRISPRs have been studied in experiments with selected bacterial or archaeal species and, computationally, in completely sequenced genomes. However, these studies do not allow one to take prokaryotic population diversity and phage-host interaction dynamics into account. This gap can be filled by using metagenomic data: in particular, the largest existing data set, generated from the Sorcerer II Global Ocean Sampling expedition. The application of three publicly available CRISPR recognition programs to the Global Ocean metagenome produced a large proportion of false-positive results. To address this problem, a filtering procedure was designed. It resulted in about 200 reliable CRISPR cassettes, which were then studied in detail. The repeat consensuses were clustered into several stable classes that differed from the existing classification. Short fragments of DNA similar to the cassette spacers were more frequently present in the same geographical location than in other locations (P < 0.0001). We developed a catalogue of elementary CRISPR-forming events and reconstructed the likely evolutionary history of cassettes that had common spacers. Metagenomic collections allow for relatively unbiased analysis of phage-host interactions and CRISPR evolution. The results of this study demonstrate that CRISPR cassettes retain the memory of the local virus population at a particular ocean location. CRISPR evolution may be described using a limited vocabulary of elementary events that have a natural biological interpretation.


Subject(s)
Bacteria/genetics , Bacteria/virology , DNA, Bacterial/genetics , Evolution, Molecular , Inverted Repeat Sequences , Metagenome , Seawater/microbiology , Cluster Analysis , Oceans and Seas
11.
FEMS Microbiol Lett ; 296(1): 110-6, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19459963

ABSTRACT

Clustered regularly interspaced short palindromic repeat (CRISPR) is a bacterial immunity system that requires a perfect sequence match between the CRISPR cassette spacer and a protospacer in invading DNA for exclusion of foreign genetic elements. CRISPR cassettes are hypervariable, possibly reflecting different exposure of strains of the same species to foreign genetic elements. Here, we determined CRISPR cassette sequences of two Xanthomonas oryzae strains and found that one of the strains remains sensitive to phage Xop411 despite carrying a cassette that has a spacer exactly matching a fragment of the Xop411 genome. To explain this apparent paradox, we identified X. oryzae CRISPR spacers of likely phage origin and defined a consensus sequence of a motif adjacent to X. oryzae phage protospacers. Our analysis revealed that the Xop411 protospacer that matches the CRISPR spacer has this motif mutated, which likely explains the phage's ability to infect its host. While similar observations were made previously with Streptococcus thermophilus and its phages, the conserved motif in X. oryzae phages is located on a protospacer side opposite to the S. thermophilus phages' motif. The results thus point to a considerable degree of variety of CRISPR-mediated phage resistance mechanisms in different bacteria.


Subject(s)
DNA, Bacterial/genetics , Repetitive Sequences, Nucleic Acid , Xanthomonas/genetics , Bacteriophages/genetics , Bacteriophages/growth & development , DNA, Bacterial/chemistry , Gene Order , Genetic Variation , Models, Biological , Molecular Sequence Data , Mutation , Sequence Analysis, DNA , Xanthomonas/virology
12.
BMC Bioinformatics ; 8: 261, 2007 Jul 21.
Article in English | MEDLINE | ID: mdl-17659089

ABSTRACT

BACKGROUND: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. RESULTS: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. CONCLUSION: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.


Subject(s)
Algorithms , Databases, Genetic/statistics & numerical data , Databases, Protein/statistics & numerical data , Genome , Information Storage and Retrieval/methods , Proteins/genetics , Amino Acid Sequence , Computational Biology/methods , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Analysis, Protein , Software
14.
Nucleic Acids Res ; 35(Database issue): D354-7, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17148486

ABSTRACT

The PEDANT genome database provides exhaustive annotation of 468 genomes by a broad set of bioinformatics algorithms. We describe recent developments of the PEDANT Web server. The all-new Graphical User Interface (GUI) implemented in Java™ allows for more efficient navigation of the genome data, extended search capabilities, user customization and export facilities. The DNA and Protein viewers have been made highly dynamic and customizable. We also provide Web Services to access the entire body of PEDANT data programmatically. Finally, we report on the application of association rule mining for automatic detection of potential annotation errors. PEDANT is freely accessible to academic users at http://pedant.gsf.de.


Subject(s)
Databases, Genetic , Genomics , Sequence Analysis, Protein , Computer Graphics , Databases, Genetic/standards , Internet , Proteins/genetics , User-Computer Interface
15.
Bioinformatics ; 21 Suppl 3: iii49-57, 2005 Nov 01.
Article in English | MEDLINE | ID: mdl-16306393

ABSTRACT

MOTIVATION: Millions of protein sequences currently being deposited to sequence databanks will never be annotated manually. Similarity-based annotation generated by automatic software pipelines unavoidably contains spurious assignments due to the imperfection of bioinformatics methods. Examples of such annotation errors include over- and underpredictions caused by the use of fixed recognition thresholds and incorrect annotations caused by transitivity-based information transfer to unrelated proteins or transfer of errors already accumulated in databases. One of the most difficult and timely challenges in bioinformatics is the development of intelligent systems aimed at improving the quality of automatically generated annotation. A possible approach to this problem is to detect anomalies in annotation items based on association rule mining. RESULTS: We present the first large-scale analysis of association rules derived from two large protein annotation databases, Swiss-Prot and PEDANT, and reveal novel, previously unknown tendencies of rule strength distributions. Most of the rules are either very strong or very weak, with rules in the medium strength range being relatively infrequent. Based on dynamics of error correction in subsequent Swiss-Prot releases and on our own manual analysis we demonstrate that exceptions from strong rules are, indeed, significantly enriched in annotation errors and can be used to automatically flag them. We identify different strength dependencies of rules derived from different fields in Swiss-Prot. A compositional breakdown of association rules generated from PEDANT in terms of their constituent items indicates that most of the errors that can be corrected are related to gene functional roles. Swiss-Prot errors are usually caused by under-annotation owing to its conservative approach, whereas automatically generated PEDANT annotation suffers from over-annotation. AVAILABILITY: All data generated in this study are available for download and browsing at http://pedant.gsf.de/ARIA/index.htm.


Subject(s)
Databases, Protein , Information Storage and Retrieval/methods , Proteins/chemistry , Proteins/classification , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Conserved Sequence , Sequence Homology, Nucleic Acid , Statistics as Topic
16.
J Mol Evol ; 59(5): 620-31, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15693618

ABSTRACT

Cancer/testis antigens (CT-antigens) are proteins that are predominantly expressed in cancer and testis and thus are possible targets for immunotherapy. Most of them form large multigene families. The evolution of the MAGE-A family of CT-antigens is characterized by four processes: (1) gene duplications; (2) duplications of the initial exon; (3) point mutations and short insertions/deletions inactivating splicing sites or creating new sites; and (4) deletions removing sites and creating chimeric exons. All this concerns the genomic regions upstream of the coding region, creating a wide diversity of isoforms with different 5'-untranslated regions. Many of these isoforms are gene-specific and have emerged due to point mutations in alternative and constitutive splicing sites. There are also examples of chimeric mRNAs, likely produced by splicing of read-through transcripts. Since there is consistent use of homologous sites for different genes and no random, indiscriminate use of preexisting cryptic sites, it is likely that most observed isoforms are functional, and do not result from relaxed control in transformed cells.


Subject(s)
Alternative Splicing/genetics , Antigens/genetics , Evolution, Molecular , Exons/genetics , Introns/genetics , Neoplasm Proteins/genetics , Testis/metabolism , Animals , Antigens/chemistry , Antigens, Neoplasm , Base Sequence , Biomarkers, Tumor/genetics , Genome, Human , Humans , Internet , Male , Melanoma-Specific Antigens , Mice , Molecular Sequence Data , Multigene Family/genetics , Organ Specificity , Phylogeny , Protein Isoforms/genetics , Recombinant Fusion Proteins/genetics , Sequence Alignment
17.
Hum Mol Genet ; 12(11): 1313-20, 2003 Jun 01.
Article in English | MEDLINE | ID: mdl-12761046

ABSTRACT

Alternative splicing has recently emerged as a major mechanism of generating protein diversity in higher eukaryotes. We compared alternative splicing isoforms of 166 pairs of orthologous human and mouse genes. As the mRNA and EST libraries of human and mouse are not complete and thus cannot be compared directly, we instead analyzed whether known cassette exons or alternative splicing sites from one genome are conserved in the other genome. We demonstrate that about half of the analyzed genes have species-specific isoforms, and about a quarter of elementary alternatives are not conserved between the human and mouse genomes. The detailed results of this study are available at www.ig-msk.ru:8005/HMG_paper.


Subject(s)
Alternative Splicing , Conserved Sequence , Genome, Human , Animals , Base Sequence , DNA-Binding Proteins/genetics , Exons , Expressed Sequence Tags , Humans , Membrane Proteins/genetics , Mice , Nerve Tissue Proteins/genetics , Proto-Oncogene Proteins/genetics , RNA Splicing Factors , RNA, Messenger/genetics , RNA-Binding Proteins/genetics , Sodium-Potassium-Exchanging ATPase/genetics , Transcription Factors/genetics , AIRE Protein