1.
Nucleic Acids Res ; 48(D1): D689-D695, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31598706

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome, Bacterial , Genome, Fungal , Genome, Plant , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer Interface
2.
Front Genet ; 10: 1078, 2019.
Article in English | MEDLINE | ID: mdl-31737053

ABSTRACT

Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants. Here, we explore the computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in the massively parallel reporter assays, based on the data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. Particularly, training on the sequence segments located next to the validation data results in the "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios preventing such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even the performance estimation will become reliable only in the future with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
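The leakage effect described in this abstract can be illustrated with a minimal sketch. It assumes each variant record carries a hypothetical `region` label; holding out whole regions keeps neighbouring positions from the same regulatory element out of the training set. The function name `split_by_region` and the toy data are illustrative only and are not taken from the authors' code.

```python
import random
from collections import defaultdict


def split_by_region(variants, test_fraction=0.25, seed=0):
    """Split variants so that no region appears in both train and test.

    `variants` is a list of (region, position, effect) tuples; this layout
    is an assumption made for the sketch.
    """
    by_region = defaultdict(list)
    for variant in variants:
        by_region[variant[0]].append(variant)

    regions = sorted(by_region)
    random.Random(seed).shuffle(regions)

    # Hold out at least one whole region for testing.
    n_test = max(1, round(len(regions) * test_fraction))
    test_regions = set(regions[:n_test])

    train = [v for r in regions if r not in test_regions for v in by_region[r]]
    test = [v for r in regions if r in test_regions for v in by_region[r]]
    return train, test


# Hypothetical toy data: four regions with five variants each.
variants = [(f"region_{i}", pos, 0.0) for i in range(4) for pos in range(5)]
train, test = split_by_region(variants)
```

A random split over individual variants would instead place adjacent positions from one region on both sides of the split, which is the situation the abstract identifies as the source of inflated prediction quality.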

3.
Mol Microbiol ; 111(6): 1558-1570, 2019 06.
Article in English | MEDLINE | ID: mdl-30875129

ABSTRACT

CRISPR interference occurs when a protospacer recognized by the CRISPR RNA is destroyed by Cas effectors. In Type I CRISPR-Cas systems, protospacer recognition can lead to "primed adaptation", the acquisition of new spacers from sequences located in cis. Type I CRISPR-Cas systems require the presence of a trinucleotide protospacer adjacent motif (PAM) for efficient interference. Here, we investigated the ability of each of the 64 possible trinucleotides located at the PAM position to induce CRISPR interference and primed adaptation by the Escherichia coli Type I-E CRISPR-Cas system. We observed clear separation of PAM variants into three groups: those unable to cause interference, those that support rapid interference and those that lead to reduced interference that occurs over extended periods of time. PAM variants unable to support interference also did not support primed adaptation; those that supported rapid interference led to no or low levels of adaptation, while those that caused attenuated levels of interference consistently led to the highest levels of adaptation. The results suggest that primed adaptation is fueled by the products of CRISPR interference. Interference extended over time with targets containing "attenuated" PAM variants provides a continuous source of new spacers, leading to a high overall level of spacer acquisition.
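The figure of 64 PAM variants follows from four bases at each of three positions (4^3 = 64). A minimal Python sketch of that enumeration, with variable names chosen for illustration only:

```python
from itertools import product

BASES = "ACGT"

# All 4**3 ordered combinations of three bases.
trinucleotides = ["".join(p) for p in product(BASES, repeat=3)]
```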


Subject(s)
CRISPR-Cas Systems , DNA, Intergenic , Escherichia coli/genetics
4.
Philos Trans R Soc Lond B Biol Sci ; 374(1772): 20180092, 2019 05 13.
Article in English | MEDLINE | ID: mdl-30905291

ABSTRACT

We investigated the diversity of CRISPR spacers of Thermus communities from two locations in Italy, two in Chile and one location in Russia. Among the five sampling sites, a total of more than 7200 unique spacers belonging to different CRISPR-Cas systems types and subtypes were identified. Most of these spacers are not found in CRISPR arrays of sequenced Thermus strains. Comparison of spacer sets revealed that samples within the same area (separated by few to hundreds of metres) have similar spacer sets, which appear to be largely stable at least over the course of several years. While at further distances (hundreds of kilometres and more) the similarity of spacer sets is decreased, there are still multiple common spacers in Thermus communities from different continents. The common spacers can be reconstructed in identical or similar CRISPR arrays, excluding their independent appearance and suggesting an extensive migration of thermophilic bacteria over long distances. Several new Thermus phages were isolated in the sampling sites. Mapping of spacers to bacteriophage sequences revealed examples of local acquisition of spacers from some phages and distinct patterns of targeting of phage genomes by different CRISPR-Cas systems. This article is part of a discussion meeting issue 'The ecology and evolution of prokaryotic CRISPR-Cas adaptive immune systems'.


Subject(s)
Bacteriophages/genetics , CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats , Thermus/genetics , Chile , Italy , Russia , Thermus/virology
5.
RNA Biol ; 16(4): 413-422, 2019 04.
Article in English | MEDLINE | ID: mdl-30022698

ABSTRACT

Target binding by CRISPR-Cas ribonucleoprotein effectors is initiated by the recognition of double-stranded PAM motifs by the Cas protein moiety followed by destabilization, localized melting, and interrogation of the target by the guide part of CRISPR RNA moiety. The latter process depends on seed sequences, parts of the target that must be strictly complementary to CRISPR RNA guide. Mismatches between the target and CRISPR RNA guide outside the seed have minor effects on target binding, thus contributing to off-target activity of CRISPR-Cas effectors. Here, we define the seed sequence of the Type V Cas12b effector from Bacillus thermoamylovorans. While the Cas12b seed is just five bases long, in contrast to all other effectors characterized to date, the nucleotide base at the site of target cleavage makes a very strong contribution to target binding. The generality of this additional requirement was confirmed during analysis of target recognition by Cas12b effector from Alicyclobacillus acidoterrestris. Thus, while the short seed may contribute to Cas12b promiscuity, the additional specificity determinant at the site of cleavage may have a compensatory effect making Cas12b suitable for specialized genome editing applications.


Subject(s)
CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems/genetics , Bacillus/genetics , Base Sequence , DNA, Bacterial/genetics , Escherichia coli , Gene Library , Nucleic Acid Conformation
6.
BMC Evol Biol ; 17(Suppl 2): 258, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297306

ABSTRACT

BACKGROUND: The gray whale, Eschrichtius robustus (E. robustus), is the only member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. The gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments. RESULTS: In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. CONCLUSIONS: This preliminary study of the gray whale genome and transcriptome provides new data to better understand whale evolution and the mechanisms of whale adaptation to hypoxic conditions.


Subject(s)
Genome , Transcriptome/genetics , Whales/genetics , Animals , Gene Expression Regulation , Gene Library , Molecular Sequence Annotation , Phylogeny