Results 1 - 4 of 4
1.
Genome Biol ; 25(1): 83, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38566111

ABSTRACT

BACKGROUND: The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution. RESULTS: Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery. CONCLUSIONS: Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.


Subject(s)
DNA; Regulatory Sequences, Nucleic Acid; Binding Sites; Sequence Alignment; Algorithms; Conserved Sequence/genetics; Evolution, Molecular
2.
Nat Biotechnol ; 41(12): 1734-1745, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37069313

ABSTRACT

While genetically encoded reporters are common for fluorescence microscopy, equivalent multiplexable gene reporters for electron microscopy (EM) are still scarce. Here, by installing a variable number of fixation-stable metal-interacting moieties in the lumen of encapsulin nanocompartments of different sizes, we developed a suite of spherically symmetric and concentric barcodes (EMcapsulins) that are readable by standard EM techniques. Six classes of EMcapsulins could be automatically segmented and differentiated. The coding capacity was further increased by arranging several EMcapsulins into distinct patterns via a set of rigid spacers of variable length. Fluorescent EMcapsulins were expressed to monitor subcellular structures in light and EM. Neuronal expression in Drosophila and mouse brains enabled the automatic identification of genetically defined cells in EM. EMcapsulins are compatible with transmission EM, scanning EM and focused ion beam scanning EM. The expandable palette of genetically controlled EM-readable barcodes can augment anatomical EM images with multiplexed gene expression maps.


Subject(s)
Drosophila; Volume Electron Microscopy; Animals; Mice; Microscopy, Electron, Scanning; Drosophila/genetics; Neurons; Microscopy, Fluorescence/methods
3.
J Comput Biol ; 29(1): 74-89, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34986031

ABSTRACT

Deep neural networks (DNNs) have been recently proposed for quartet tree phylogeny estimation. Here, we present a study evaluating recently trained DNNs in comparison to a collection of standard phylogeny estimation methods on a heterogeneous collection of datasets simulated under the same models that were used to train the DNNs, and also under similar conditions but with higher rates of evolution. Our study shows that using DNNs with quartet amalgamation is less accurate than several standard phylogeny estimation methods we explore (e.g., maximum likelihood and maximum parsimony). We further find that simple standard phylogeny estimation methods match or improve on DNNs for quartet accuracy, especially, but not exclusively, when used in a global manner (i.e., the tree on the full dataset is computed and then the induced quartet trees are extracted from the full tree). Thus, our study provides evidence that a major challenge impacting the utility of current DNNs for phylogeny estimation is their restriction to estimating quartet trees that must subsequently be combined into a tree on the full dataset. In contrast, global methods (i.e., those that estimate trees from the full set of sequences) are able to benefit from taxon sampling, and hence have higher accuracy on large datasets.
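
The abstract above describes using a method "in a global manner": a tree is computed on the full dataset, and the induced quartet trees are then extracted from it. The following is only a minimal sketch of that extraction step, not the authors' pipeline. It assumes an unrooted tree stored as an adjacency dictionary and picks the quartet split whose pairwise path lengths (counted in edges) have the smallest sum, which is the four-point condition. The tree `TREE`, the taxon names, and all function names are hypothetical and chosen for illustration.

```python
from collections import deque
from itertools import combinations


def edge_distances(adj, source):
    """Breadth-first search: number of edges from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def induced_quartet(adj, a, b, c, d):
    """Return the split the tree induces on {a, b, c, d}.

    The result is a pair of pairs such as (("A", "B"), ("C", "D")),
    or None when the tree does not resolve the four taxa (star shape).
    """
    dist = {x: edge_distances(adj, x) for x in (a, b, c, d)}
    sums = {
        ((a, b), (c, d)): dist[a][b] + dist[c][d],
        ((a, c), (b, d)): dist[a][c] + dist[b][d],
        ((a, d), (b, c)): dist[a][d] + dist[b][c],
    }
    best = min(sums, key=sums.get)
    # Four-point condition: a resolved quartet has a unique smallest sum.
    if sum(1 for v in sums.values() if v == sums[best]) > 1:
        return None
    return best


def all_induced_quartets(adj, leaves):
    """Map every 4-subset of `leaves` (sorted) to its induced split."""
    return {q: induced_quartet(adj, *q) for q in combinations(sorted(leaves), 4)}


# Hypothetical 5-taxon unrooted tree ((A,B),C,(D,E)); u, v, w are internal nodes.
TREE = {
    "A": ["u"], "B": ["u"], "C": ["v"], "D": ["w"], "E": ["w"],
    "u": ["A", "B", "v"], "v": ["u", "C", "w"], "w": ["v", "D", "E"],
}

quartets = all_induced_quartets(TREE, ["A", "B", "C", "D", "E"])
for taxa, split in quartets.items():
    print(taxa, "->", split)
```

Quartet accuracy, as used in the abstract, could then be measured by comparing these induced splits with the splits induced by the true tree. Scanning all 4-subsets grows as the fourth power of the number of taxa, so this sketch is only practical for small trees.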


Subject(s)
Deep Learning; Neural Networks, Computer; Phylogeny; Amino Acid Sequence; Classification/methods; Computational Biology; Computer Simulation; Databases, Genetic/statistics & numerical data; Evolution, Molecular
4.
Proc Natl Acad Sci U S A ; 107(47): 20223-7, 2010 Nov 23.
Article in English | MEDLINE | ID: mdl-21059938

ABSTRACT

Although reliable figures are often missing, considerable detrimental changes due to shrinking glaciers are universally expected for water availability in river systems under the influence of ongoing global climate change. We estimate the contribution potential of seasonally delayed glacier melt water to total water availability in large river systems. We find that the seasonally delayed glacier contribution is largest where rivers enter seasonally arid regions and negligible in the lowlands of river basins governed by monsoon climates. By comparing monthly glacier melt contributions with population densities in different altitude bands within each river basin, we demonstrate that strong human dependence on glacier melt is not collocated with highest population densities in most basins.


Subject(s)
Climate Change; Ice Cover; Models, Theoretical; Rivers; Water Supply; Seasons