Results 1 - 12 of 12
1.
Commun Biol ; 7(1): 835, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982288

ABSTRACT

Significant progress has been made in the field of plant genomics, as demonstrated by the increased use of high-throughput methodologies that enable the characterization of multiple genome-wide molecular phenotypes. These findings have provided valuable insights into plant traits and their underlying genetic mechanisms, particularly in model plant species. Nonetheless, effectively leveraging them to make accurate predictions represents a critical step in crop genomic improvement. We present AgroNT, a foundational large language model trained on genomes from 48 plant species with a predominant focus on crop species. We show that AgroNT can obtain state-of-the-art predictions for regulatory annotations, promoter/terminator strength, tissue-specific gene expression, and prioritize functional variants. We conduct a large-scale in silico saturation mutagenesis analysis on cassava to evaluate the regulatory impact of over 10 million mutations and provide their predicted effects as a resource for variant characterization. Finally, we propose the use of the diverse datasets compiled here as the Plants Genomic Benchmark (PGB), providing a comprehensive benchmark for deep learning-based methods in plant genomic research. The pre-trained AgroNT model is publicly available on HuggingFace at https://huggingface.co/InstaDeepAI/agro-nucleotide-transformer-1b for future research purposes.


Subject(s)
Genome, Plant , Plants, Edible/genetics , Genomics/methods , Deep Learning , Manihot/genetics
2.
Genome Biol ; 25(1): 146, 2024 06 06.
Article in English | MEDLINE | ID: mdl-38844976

ABSTRACT

BACKGROUND: DNA methylation is an important epigenetic modification which has numerous roles in modulating genome function. Its levels are spatially correlated across the genome, typically high in repressed regions but low in transcription factor (TF) binding sites and active regulatory regions. However, the mechanisms establishing genome-wide and TF binding site methylation patterns are still unclear. RESULTS: Here we use a comparative approach to investigate the association of DNA methylation to TF binding evolution in mammals. Specifically, we experimentally profile DNA methylation and combine this with published occupancy profiles of five distinct TFs (CTCF, CEBPA, HNF4A, ONECUT1, FOXA1) in the liver of five mammalian species (human, macaque, mouse, rat, dog). TF binding sites are lowly methylated, but they often also have intermediate methylation levels. Furthermore, binding sites are influenced by the methylation status of CpGs in their wider binding regions even when CpGs are absent from the core binding motif. Employing a classification and clustering approach, we extract distinct and species-conserved patterns of DNA methylation levels at TF binding regions. CEBPA, HNF4A, ONECUT1, and FOXA1 share the same methylation patterns, while CTCF's differ. These patterns characterize alternative functions and chromatin landscapes of TF-bound regions. Leveraging our phylogenetic framework, we find DNA methylation gain upon evolutionary loss of TF occupancy, indicating coordinated evolution. Furthermore, each methylation pattern has its own evolutionary trajectory reflecting its genomic contexts. CONCLUSIONS: Our epigenomic analyses indicate a role for DNA methylation in TF binding changes across species including that specific DNA methylation profiles characterize TF binding and are associated with their regulatory activity, chromatin contexts, and evolutionary trajectories.


Subject(s)
DNA Methylation , Evolution, Molecular , Transcription Factors , Animals , Binding Sites , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Mice , Rats , CpG Islands , Dogs , Hepatocyte Nuclear Factor 3-alpha/metabolism , Hepatocyte Nuclear Factor 3-alpha/genetics , Protein Binding , Liver/metabolism , Hepatocyte Nuclear Factor 4/metabolism , Hepatocyte Nuclear Factor 4/genetics , CCAAT-Enhancer-Binding Proteins/metabolism , CCAAT-Enhancer-Binding Proteins/genetics
3.
Genome Biol ; 22(1): 62, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33602314

ABSTRACT

BACKGROUND: To investigate the mechanisms driving regulatory evolution across tissues, we experimentally mapped promoters, enhancers, and gene expression in the liver, brain, muscle, and testis from ten diverse mammals. RESULTS: The regulatory landscape around genes included both tissue-shared and tissue-specific regulatory regions, where tissue-specific promoters and enhancers evolved most rapidly. Genomic regions switching between promoters and enhancers were more common across species, and less common across tissues within a single species. Long Interspersed Nuclear Elements (LINEs) played recurrent evolutionary roles: LINE L1s were associated with tissue-specific regulatory regions, whereas more ancient LINE L2s were associated with tissue-shared regulatory regions and with those switching between promoter and enhancer signatures across species. CONCLUSIONS: Our analyses of the tissue-specificity and evolutionary stability among promoters and enhancers reveal how specific LINE families have helped shape the dynamic mammalian regulome.


Subject(s)
Evolution, Molecular , Gene Expression Regulation , Long Interspersed Nucleotide Elements , Mammals/genetics , Regulatory Sequences, Nucleic Acid , Retroelements , Animals , Chromosome Mapping , Conserved Sequence , Enhancer Elements, Genetic , Humans , Organ Specificity/genetics , Promoter Regions, Genetic
4.
Nat Commun ; 11(1): 3676, 2020 07 27.
Article in English | MEDLINE | ID: mdl-32719321

ABSTRACT

The genomes of non-bilaterian metazoans are key to understanding the molecular basis of early animal evolution. However, a full comprehension of how animal-specific traits, such as nervous systems, arose is hindered by the scarcity and fragmented nature of genomes from key taxa, such as Porifera. Ephydatia muelleri is a freshwater sponge found across the northern hemisphere. Here, we present its 326 Mb genome, assembled to high contiguity (N50: 9.88 Mb) with 23 chromosomes on 24 scaffolds. Our analyses reveal a metazoan-typical genome architecture, with highly shared synteny across Metazoa, and suggest that adaptation to the extreme temperatures and conditions found in freshwater often involves gene duplication. The pancontinental distribution and ready laboratory culture of E. muelleri make this a highly practical model system which, with RNAseq, DNA methylation and bacterial amplicon data spanning its development and range, allows exploration of genomic changes both within sponges and in early animal evolution.


Subject(s)
Chromosome Mapping , Chromosomes/genetics , Evolution, Molecular , Porifera/genetics , Adaptation, Physiological/genetics , Animals , Epigenesis, Genetic , Fresh Water , Gene Expression Regulation, Developmental , Molecular Sequence Annotation , Phylogeny , Porifera/growth & development , RNA-Seq , Sequence Analysis, DNA , Synteny
5.
Genome Biol ; 21(1): 5, 2020 01 07.
Article in English | MEDLINE | ID: mdl-31910870

ABSTRACT

BACKGROUND: CTCF binding contributes to the establishment of a higher-order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). However, despite the importance and conservation of TADs, the role of CTCF binding in their evolution and stability remains elusive. RESULTS: We carry out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We perform CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associate them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by a balance of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints compared to other CTCF sites. TAD boundaries frequently harbor dynamically evolving clusters containing both evolutionarily old and young CTCF sites as a result of the repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation. CONCLUSIONS: Dynamic conservation of CTCF site clusters is an apparently important feature of CTCF binding evolution that is critical to the functional stability of a higher-order chromatin structure.


Subject(s)
CCCTC-Binding Factor/genetics , CCCTC-Binding Factor/metabolism , Chromatin/metabolism , Evolution, Molecular , Mice/genetics , Animals , Chromatin Immunoprecipitation Sequencing , Genome
6.
Nat Med ; 24(6): 868-880, 2018 06.
Article in English | MEDLINE | ID: mdl-29785028

ABSTRACT

Chronic lymphocytic leukemia (CLL) is a frequent hematological neoplasm in which underlying epigenetic alterations are only partially understood. Here, we analyze the reference epigenome of seven primary CLLs and the regulatory chromatin landscape of 107 primary cases in the context of normal B cell differentiation. We identify that the CLL chromatin landscape is largely influenced by distinct dynamics during normal B cell maturation. Beyond this, we define extensive catalogues of regulatory elements de novo reprogrammed in CLL as a whole and in its major clinico-biological subtypes classified by IGHV somatic hypermutation levels. We uncover that IGHV-unmutated CLLs harbor more active and open chromatin than IGHV-mutated cases. Furthermore, we show that de novo active regions in CLL are enriched for NFAT, FOX and TCF/LEF transcription factor family binding sites. Although most genetic alterations are not associated with consistent epigenetic profiles, CLLs with MYD88 mutations and trisomy 12 show distinct chromatin configurations. Furthermore, we observe that non-coding mutations in IGHV-mutated CLLs are enriched in H3K27ac-associated regulatory elements outside accessible chromatin. Overall, this study provides an integrative portrait of the CLL epigenome, identifies extensive networks of altered regulatory elements and sheds light on the relationship between the genetic and epigenetic architecture of the disease.


Subject(s)
Chromatin/metabolism , Epigenomics , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , B-Lymphocytes/metabolism , Base Sequence , Cohort Studies , Humans
7.
Genome Res ; 28(4): 448-459, 2018 04.
Article in English | MEDLINE | ID: mdl-29563166

ABSTRACT

Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.


Subject(s)
Evolution, Molecular , Genome/genetics , Muridae/genetics , Phylogeny , Animals , Binding Sites , CCCTC-Binding Factor/genetics , Chromosomes/genetics , Karyotyping/methods , Long Interspersed Nucleotide Elements/genetics , Mice , Retroelements/genetics , Species Specificity
8.
Lab Invest ; 98(5): 554-570, 2018 05.
Article in English | MEDLINE | ID: mdl-29453400

ABSTRACT

Metastasis suppressors are genes/proteins involved in regulation of one or more steps of the metastatic cascade while having little or no effect on tumor growth. The list of putative metastasis suppressors is constantly increasing although thorough understanding of their biochemical mechanism(s) and evolutionary history is still lacking. Little is known about tumor-related genes in invertebrates, especially non-bilaterians and unicellular relatives of animals. However, in the last few years we have been witnessing a growing interest in this subject since it has been shown that many disease-related genes are already present in simple non-bilateral animals and even in their unicellular relatives. Studying human diseases using simpler organisms that may better represent the ancestral conditions in which the specific disease-related genes appeared could provide better understanding of how those genes function. This review represents a compilation of published literature and our bioinformatics analysis to gain a general insight into the evolutionary history of metastasis-suppressor genes in animals (Metazoa). Our survey suggests that metastasis-suppressor genes emerged in three different periods in the evolution of Metazoa: before the origin of metazoans, with the emergence of first animals and at the origin of vertebrates.


Subject(s)
Genes, Tumor Suppressor/physiology , Neoplasm Metastasis/prevention & control , Animals , Computational Biology , Evolution, Molecular , Surveys and Questionnaires , Tumor Suppressor Proteins/physiology
9.
Nucleic Acids Res ; 41(19): 8842-52, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23921637

ABSTRACT

Microbial communities represent the largest portion of the Earth's biomass. Metagenomics projects use high-throughput sequencing to survey these communities and shed light on genetic capabilities that enable microbes to inhabit every corner of the biosphere. Metagenome studies are generally based on (i) classifying and ranking functions of identified genes; and (ii) estimating the phyletic distribution of constituent microbial species. To understand microbial communities at the systems level, it is necessary to extend these studies beyond the species' boundaries and capture higher levels of metabolic complexity. We evaluated 11 metagenome samples and demonstrated that microbes inhabiting the same ecological niche share common preferences for synonymous codons, regardless of their phylogeny. By exploring concepts of translational optimization through codon usage adaptation, we demonstrated that community-wide bias in codon usage can be used as a prediction tool for lifestyle-specific genes across the entire microbial community, effectively considering microbial communities as meta-genomes. These findings set up a 'functional metagenomics' platform for the identification of genes relevant for adaptations of entire microbial communities to environments. Our results provide valuable arguments in defining the concept of microbial species through the context of their interactions within the community.


Subject(s)
Adaptation, Biological/genetics , Codon , Metagenome , Animals , Ecosystem , Genome, Bacterial , Humans , Metagenomics , Mice , Phylogeny , Proteomics
10.
PLoS One ; 7(8): e42523, 2012.
Article in English | MEDLINE | ID: mdl-22880015

ABSTRACT

Ribosomal protein genes (RPGs) are a powerful tool for studying intron evolution. They exist in all three domains of life and are highly conserved. Accumulating genomic data suggest that RPG introns in many organisms abound with non-protein-coding RNAs (ncRNAs). These ancient ncRNAs are small nucleolar RNAs (snoRNAs) essential for ribosome assembly. They are also mobile genetic elements and therefore probably important in diversification and enrichment of transcriptomes through various mechanisms such as intron/exon gain/loss. snoRNAs in basal metazoans are poorly characterized. We examined 449 RPG introns, in total, from four demosponges: Amphimedon queenslandica, Suberites domuncula, Suberites ficus and Suberites pagurorum and showed that RPG introns from A. queenslandica share position conservation and some structural similarity with "higher" metazoans. Moreover, our study indicates that mobile element insertions play an important role in the evolution of their size. In four sponges 51 snoRNAs were identified. The analysis showed discrepancies between the snoRNA pools of orthologous RPG introns between S. domuncula and A. queenslandica. Furthermore, these two sponges show as much conservation of RPG intron positions between each other as between themselves and human. Sponges from the Suberites genus show consistency in RPG intron position conservation. However, significant differences in some of the orthologous RPG introns of closely related sponges were observed. This indicates that RPG introns are dynamic even on these shorter evolutionary time scales.


Subject(s)
Introns/genetics , Porifera/genetics , Ribosomal Proteins/genetics , Animals , Base Sequence , Conserved Sequence , Humans , Molecular Sequence Data , Nucleic Acid Conformation , Nucleotide Motifs/genetics , RNA, Ribosomal, 28S/genetics , RNA, Small Nucleolar/chemistry , RNA, Small Nucleolar/genetics , Sequence Alignment , Species Specificity
11.
Genomics ; 98(1): 56-63, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21457775

ABSTRACT

Equimolecular presence of ribosomal proteins (RPs) in the cell is needed for ribosome assembly and is achieved by synchronized expression of ribosomal protein genes (RPGs) with promoters of similar strengths. Over-represented motifs of RPG promoter regions are identified as targets for specific transcription factors. Unlike RPs, those motifs are not conserved between mammals, Drosophila, and yeast. We analyzed RPG proximal promoter regions of three basal metazoans with sequenced genomes: sponge, cnidarian, and placozoan and found common features, such as 5'-terminal oligopyrimidine tracts and TATA-boxes. Furthermore, we identified over-represented motifs, some of which displayed the highest similarity to motifs abundant in human RPG promoters and not present in Drosophila or yeast. Our results indicate that human over-represented motifs, as well as corresponding domains of transcription factors, were established very early in metazoan evolution. The fast-evolving nature of the RPG regulatory network leads to formation of other, lineage-specific, over-represented motifs.


Subject(s)
Ribosomal Proteins/genetics , Amino Acid Sequence , Animals , Humans , Molecular Sequence Data , Promoter Regions, Genetic , Ribosomal Proteins/chemistry , Sequence Alignment
12.
Mol Biol Evol ; 27(12): 2747-56, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20621960

ABSTRACT

Sponges (Porifera) are among the simplest living and the earliest branching metazoans. They hold a pivotal role for studying genome evolution of the entire metazoan branch, both as an outgroup to Eumetazoa and as the closest branching phylum to the common ancestor of all multicellular animals (Urmetazoa). In order to assess the transcription inventory of sponges, we sequenced expressed sequence tag libraries of two demosponge species, Suberites domuncula and Lubomirskia baicalensis, and systematically analyzed the assembled sponge transcripts against their homologs from complete proteomes of six well-characterized metazoans--Nematostella vectensis, Caenorhabditis elegans, Drosophila melanogaster, Strongylocentrotus purpuratus, Ciona intestinalis, and Homo sapiens. We show that even the earliest metazoan species already have strikingly complex genomes in terms of gene content and functional repertoire and that the rich gene repertoire existed even before the emergence of true tissues, therefore further emphasizing the importance of gene loss and spatio-temporal changes in regulation of gene expression in shaping the metazoan genomes. Our findings further indicate that sponge and human genes generally show similarity levels higher than expected from their respective positions in metazoan phylogeny, providing direct evidence for slow rate of evolution in both "basal" and "apical" metazoan genome lineages. We propose that the ancestor of all metazoans had already had an unusually complex genome, thereby shifting the origins of genome complexity from Urbilateria to Urmetazoa.


Subject(s)
Evolution, Molecular , Expressed Sequence Tags , Phylogeny , Porifera/genetics , Suberites/genetics , Animals , Base Sequence , Comparative Genomic Hybridization , Gene Expression Regulation , Genome , Molecular Sequence Data , Sequence Alignment , Sequence Homology, Amino Acid