Results 1 - 5 of 5
1.
Genome Biol ; 25(1): 187, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987807

ABSTRACT

Characterizing the binding preferences of transcription factors (TFs) in different cell types and conditions is key to understanding how they orchestrate gene expression. Here, we develop TFscope, a machine learning approach that identifies sequence features explaining the binding differences observed between two ChIP-seq experiments targeting either the same TF in two conditions or two TFs with similar motifs (paralogous TFs). TFscope systematically investigates differences in the core motif, nucleotide environment and co-factor motifs, and provides the contribution of each key feature in the two experiments. TFscope was applied to more than 305 ChIP-seq pairs, and several examples are discussed.
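
As a rough illustration of the idea of finding sequence features that separate two sets of ChIP-seq peaks, the sketch below ranks k-mers by their log2 frequency ratio between two sequence sets. It is not TFscope's implementation: the function names, the pseudocount and the toy sequences are all assumptions made for this example, and simple k-mer enrichment stands in for the richer features (core motif, nucleotide environment, co-factor motifs) named in the abstract.

```python
from collections import Counter
from math import log2


def kmer_frequencies(sequences, k):
    """Relative frequency of each k-mer over a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def discriminative_kmers(seqs_a, seqs_b, k=3, pseudo=1e-3):
    """Rank k-mers by log2(freq in A / freq in B).

    Positive scores mean enrichment in set A, negative in set B.
    The pseudocount (an arbitrary choice here) avoids division by zero.
    """
    fa = kmer_frequencies(seqs_a, k)
    fb = kmer_frequencies(seqs_b, k)
    scores = {
        kmer: log2((fa.get(kmer, 0.0) + pseudo) / (fb.get(kmer, 0.0) + pseudo))
        for kmer in set(fa) | set(fb)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy sequences standing in for peaks from two experiments.
peaks_a = ["GGGCGGGACGT", "TTGGGCGGAAC"]
peaks_b = ["ATATTTACGTA", "TTATAAACGTT"]
ranking = discriminative_kmers(peaks_a, peaks_b, k=3)
```

The head of `ranking` holds k-mers enriched in the first set and the tail those enriched in the second; a real analysis would fit a classifier and assess significance rather than rely on raw ratios.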


Subject(s)
Chromatin Immunoprecipitation Sequencing , Machine Learning , Transcription Factors , Transcription Factors/metabolism , Binding Sites , Humans , Nucleotide Motifs , Protein Binding
3.
Nat Commun ; 12(1): 3297, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34078885

ABSTRACT

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models reveal the importance of the sequences surrounding STRs, not only to distinguish STR classes but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with a high level of transcription initiation, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add complexity to STR polymorphism.
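
To make the notion of a short tandem repeat concrete, here is a minimal sketch that locates STRs in a DNA string with a regular expression and checks whether a given position (for instance, a hypothetical CAGE peak summit) falls inside one. The function names, the thresholds (`max_unit=6`, `min_repeats=3`) and the toy sequence are assumptions for illustration only; this is not the annotation procedure or the deep learning model used in the study.

```python
import re


def find_strs(sequence, max_unit=6, min_repeats=3):
    """Return (start, end, unit, copies) for each tandem repeat.

    A repeat is a unit of 1..max_unit bases occurring at least
    min_repeats times in a row (min_repeats should be >= 2).
    The lazy quantifier makes the regex prefer the shortest unit.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    results = []
    for m in pattern.finditer(sequence.upper()):
        unit = m.group(1)
        copies = (m.end() - m.start()) // len(unit)
        results.append((m.start(), m.end(), unit, copies))
    return results


def overlaps_str(position, strs):
    """True if a 0-based position lies inside any detected repeat."""
    return any(start <= position < end for start, end, _, _ in strs)


# Made-up toy sequence containing (AC)x4 starting at index 3.
toy = "TTGACACACACGG"
strs = find_strs(toy)
```

Coordinates are 0-based and half-open, so a repeat reported as `(3, 11, "AC", 4)` covers indices 3 through 10.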


Subject(s)
Microsatellite Repeats , Neural Networks, Computer , Neurodegenerative Diseases/genetics , Transcription Initiation Site , Transcription Initiation, Genetic , A549 Cells , Animals , Base Sequence , Computational Biology/methods , Deep Learning , Enhancer Elements, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mice , Neurodegenerative Diseases/diagnosis , Neurodegenerative Diseases/metabolism , Polymorphism, Genetic , Promoter Regions, Genetic
4.
PLoS Comput Biol ; 17(4): e1008909, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33861755

ABSTRACT

Long regulatory elements (LREs), such as CpG islands, polydA:dT tracts or AU-rich elements, are thought to play key roles in gene regulation but, as opposed to conventional binding sites of transcription factors, few methods have been proposed to formally and automatically characterize them. We present here a computational approach named DExTER (Domain Exploration To Explain gene Regulation) dedicated to the identification of candidate LREs (cLREs) and apply it to the analysis of the genomes of P. falciparum and other eukaryotes. Our analyses show that all tested genomes contain several cLREs that are somewhat conserved across evolution, and that gene expression can be predicted with surprising accuracy on the basis of these long regions only. Regulation by cLREs exhibits very different behaviors depending on species and conditions. In P. falciparum and other Apicomplexan organisms as well as in Dictyostelium discoideum, the process appears highly dynamic, with different cLREs involved at different phases of the life cycle. For multicellular organisms, the same cLREs are involved in all tissues, but a dynamic behavior is observed across embryonic developmental stages. In P. falciparum, whose genome is known to be strongly depleted of transcription factors, cLREs are predictive of expression with an accuracy above 70%, and our analyses show that they are associated with both transcriptional and post-transcriptional regulation signals. Moreover, we assessed the biological relevance of one LRE discovered by DExTER in P. falciparum using an in vivo reporter assay. The source code (Python) of DExTER is available at https://gite.lirmm.fr/menichelli/DExTER.
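
The general idea of linking the composition of a long region to expression can be sketched as follows: measure how often a short k-mer occurs in a fixed window of each gene's sequence, then correlate that frequency with the genes' expression values. This is a simplified stand-in, not DExTER's actual algorithm; the function names, the window coordinates, the use of Pearson correlation and the toy data are assumptions for this example.

```python
from math import sqrt


def kmer_fraction(seq, kmer, start, end):
    """Fraction of start positions in seq[start:end] where kmer occurs."""
    region = seq[start:end].upper()
    positions = len(region) - len(kmer) + 1
    if positions <= 0:
        return 0.0
    hits = sum(region[i:i + len(kmer)] == kmer for i in range(positions))
    return hits / positions


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def score_region(sequences, expression, kmer, start, end):
    """Correlate k-mer frequency in one window with expression."""
    fractions = [kmer_fraction(s, kmer, start, end) for s in sequences]
    return pearson(fractions, expression)


# Made-up toy data: three gene sequences and matching expression values.
toy_seqs = ["ATATATATGC", "ATATGCGCGC", "GCGCGCGCGC"]
toy_expr = [9.0, 5.0, 1.0]
r = score_region(toy_seqs, toy_expr, "AT", 0, 8)
```

Scanning many (k-mer, window) pairs and keeping the best-scoring ones would yield candidate regions; how candidates are actually explored and combined into a predictive model is described in the paper and its source code, not here.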


Subject(s)
Genome, Protozoan , Plasmodium falciparum/genetics , Regulatory Sequences, Nucleic Acid , Eukaryota/genetics , Gene Expression Regulation , Gene Ontology , Genes, Reporter , Histones/metabolism , RNA Processing, Post-Transcriptional , RNA, Antisense/genetics , RNA, Messenger/genetics , Transcription, Genetic
5.
PLoS Comput Biol ; 14(1): e1005889, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29293498

ABSTRACT

Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that it is close to that of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence.
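
One way to picture how co-occurrence can raise sensitivity is the sketch below: a hit that misses the strict E-value threshold is still accepted if it passes a relaxed threshold and the same protein carries a confidently detected domain known to co-occur with it. This is a simplified illustration, not the authors' procedure; the function name, both thresholds, the tuple layout and the toy hits are assumptions made for this example.

```python
def rescue_hits(hits, cooccurring_pairs, strict=1e-5, relaxed=1e-1):
    """Accept confident hits, plus weaker hits supported by co-occurrence.

    hits: iterable of (protein, domain, evalue) tuples.
    cooccurring_pairs: set of frozenset({domain_a, domain_b}) pairs
        assumed to be known to co-occur on the same protein.
    Returns a list of (protein, domain, evalue, status) tuples.
    """
    hits = list(hits)

    # First pass: domains detected under the strict threshold.
    confident = {}
    for protein, domain, evalue in hits:
        if evalue <= strict:
            confident.setdefault(protein, set()).add(domain)

    # Second pass: keep confident hits and rescue supported weak ones.
    accepted = []
    for protein, domain, evalue in hits:
        if evalue <= strict:
            accepted.append((protein, domain, evalue, "confident"))
        elif evalue <= relaxed:
            partners = confident.get(protein, set())
            if any(frozenset((domain, p)) in cooccurring_pairs
                   for p in partners):
                accepted.append((protein, domain, evalue, "rescued"))
    return accepted


# Made-up toy hits and a single hypothetical co-occurring domain pair.
toy_hits = [
    ("P1", "DomA", 1e-20),
    ("P1", "DomB", 1e-2),
    ("P2", "DomB", 1e-2),
    ("P3", "DomC", 5.0),
]
toy_pairs = {frozenset(("DomA", "DomB"))}
accepted = rescue_hits(toy_hits, toy_pairs)
```

In the toy data only the weak `DomB` hit on `P1` is rescued, because `P1` also has a confident `DomA` hit; the identical weak hit on `P2` has no supporting partner and is dropped.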


Subject(s)
Proteins/chemistry , Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Computational Biology , Databases, Protein , Plasmodium falciparum/chemistry , Plasmodium falciparum/genetics , Protein Domains , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Sequence Alignment/statistics & numerical data , Sequence Analysis, Protein/statistics & numerical data