2.
bioRxiv ; 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-38014075

ABSTRACT

Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication, including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.

3.
bioRxiv ; 2023 Jul 28.
Article in English | MEDLINE | ID: mdl-37546906

ABSTRACT

The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.

4.
Genome Biol ; 24(1): 134, 2023 06 06.
Article in English | MEDLINE | ID: mdl-37280678

ABSTRACT

Recent deep learning models that predict the Hi-C contact map from DNA sequence achieve promising accuracy but cannot generalize to new cell types or even capture differences among training cell types. We propose Epiphany, a neural network to predict cell-type-specific Hi-C contact maps from widely available epigenomic tracks. Epiphany uses bidirectional long short-term memory layers to capture long-range dependencies and optionally a generative adversarial network architecture to encourage contact map realism. Epiphany shows excellent generalization to held-out chromosomes within and across cell types, yields accurate TAD and interaction calls, and predicts structural changes caused by perturbations of epigenomic signals.


Subjects
Chromosomes; Epigenomics; Neural Networks, Computer; Chromatin
5.
Genome Res ; 32(5): 930-944, 2022 05.
Article in English | MEDLINE | ID: mdl-35396274

ABSTRACT

Linking distal enhancers to genes and modeling their impact on target gene expression are longstanding unresolved problems in regulatory genomics and critical for interpreting noncoding genetic variation. Here, we present a new deep learning approach called GraphReg that exploits 3D interactions from chromosome conformation capture assays to predict gene expression from 1D epigenomic data or genomic DNA sequence. By using graph attention networks to exploit the connectivity of distal elements up to 2 Mb away in the genome, GraphReg more faithfully models gene regulation and more accurately predicts gene expression levels than the state-of-the-art deep learning methods for this task. Feature attribution used with GraphReg accurately identifies functional enhancers of genes, as validated by CRISPRi-FlowFISH and TAP-seq assays, outperforming both convolutional neural networks (CNNs) and the recently proposed activity-by-contact model. Sequence-based GraphReg also accurately predicts direct transcription factor (TF) targets as validated by CRISPRi TF knockout experiments via in silico ablation of TF binding motifs. GraphReg therefore represents an important advance in modeling the regulatory impact of epigenomic and sequence elements.


Subjects
Gene Expression Regulation; Neural Networks, Computer; Binding Sites/genetics; Chromatin/genetics; Gene Regulatory Networks; Regulatory Sequences, Nucleic Acid; Transcription Factors/metabolism
6.
Article in English | MEDLINE | ID: mdl-31180899

ABSTRACT

There is often a limited amount of omics data to design predictive models in biomedicine. Because these omics data come from underlying processes that may share common pathways and disease mechanisms, leveraging available data from relevant source domains may help in designing a more accurate and reliable predictor in a target domain of interest where labeled data are lacking. Here, we focus on developing Bayesian transfer learning methods for analyzing next-generation sequencing (NGS) data to help improve predictions in the target domain. We formulate transfer learning in a fully Bayesian framework and define the relatedness by a joint prior distribution of the model parameters of the source and target domains. Defining joint priors acts as a bridge across domains, through which the related knowledge of source data is transferred to the target domain. We focus on RNA-seq discrete count data, which are often overdispersed. To appropriately model them, we consider the Negative Binomial model and propose an Optimal Bayesian Transfer Learning (OBTL) classifier that minimizes the expected classification error in the target domain. We evaluate the performance of the OBTL classifier via both synthetic and cancer data from The Cancer Genome Atlas (TCGA).


Subjects
Bayes Theorem; Computational Biology/methods; Machine Learning; Databases, Genetic; High-Throughput Nucleotide Sequencing; Humans; Models, Statistical; Neoplasms/genetics; Neoplasms/metabolism
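
The abstract of record 6 motivates the Negative Binomial model by noting that RNA-seq counts are often overdispersed. The following is a minimal sketch of that idea only, not the OBTL classifier itself: it assumes the common mean/dispersion parameterization (variance = mean + dispersion × mean²), and the function names `nb_log_pmf` and `nb_variance` are hypothetical.

```python
import math


def nb_log_pmf(k: int, mean: float, dispersion: float) -> float:
    """Log-probability of count k under a Negative Binomial distribution.

    Assumed parameterization (not taken from the paper): 'mean' and
    'dispersion', with variance = mean + dispersion * mean**2.
    """
    r = 1.0 / dispersion      # "size" parameter
    p = r / (r + mean)        # success probability
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p)
        + k * math.log(1.0 - p)
    )


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance implied by the mean/dispersion parameterization."""
    return mean + dispersion * mean ** 2


if __name__ == "__main__":
    mean, dispersion = 10.0, 0.5
    # A Poisson model would force variance == mean; the Negative Binomial
    # allows the variance to exceed the mean (overdispersion).
    print("Poisson variance:", mean)
    print("Negative Binomial variance:", nb_variance(mean, dispersion))
    print("log P(count = 3):", nb_log_pmf(3, mean, dispersion))
```

As the dispersion approaches zero, the variance approaches the mean and the model behaves like a Poisson model, which is why a single extra parameter is enough to accommodate overdispersed counts.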
7.
Article in English | MEDLINE | ID: mdl-29053466

ABSTRACT

This paper studies classification of gene-expression trajectories coming from two classes, healthy and mutated (cancerous), using Boolean networks with perturbation (BNps) to model the dynamics of each class at the state level. Each class has its own BNp, which is partially known based on gene pathways. We employ a Gaussian model at the observation level to show the expression values of the genes given the hidden binary states at each time point. We use expectation maximization (EM) to learn the BNps and the unknown model parameters, derive closed-form updates for the parameters, and propose a learning algorithm. After learning, a plug-in Bayes classifier is used to classify unlabeled trajectories, which can have missing data. Measuring gene expression at different times yields trajectories only when measurements come from a single cell. In multiple-cell scenarios, the expression values are averages over many cells with possibly different states. Via the central-limit theorem, we propose another model for expression data in multiple-cell scenarios. Simulations demonstrate that single-cell trajectory data can outperform multiple-cell average expression data relative to classification error, especially in high-noise situations. We also consider data generated via a mammalian cell-cycle network, both the wild-type and with a common mutation affecting p27.


Subjects
Gene Expression Profiling/methods; Gene Regulatory Networks/genetics; Single-Cell Analysis/methods; Algorithms; Animals; Bayes Theorem; Humans; Models, Genetic; Models, Statistical; Neoplasms/genetics; Neoplasms/metabolism
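
The setup described in record 7, a Boolean network with perturbation (BNp) generating hidden binary states that are observed through a Gaussian model, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's model: it uses the common BNp definition in which each gene flips independently with probability `p`, and the Boolean rules apply only when no gene flips. The three-gene rule set `toy_update` and the observation parameters `mu0`, `mu1`, and `sigma` are hypothetical.

```python
import random


def bnp_step(state, update, p, rng):
    """One step of a Boolean network with perturbation (assumed definition).

    Each gene flips independently with probability p. If at least one gene
    flips, the flipped state is the next state; otherwise the Boolean
    update rules are applied.
    """
    flips = [rng.random() < p for _ in state]
    if any(flips):
        return tuple(s ^ int(f) for s, f in zip(state, flips))
    return update(state)


def toy_update(state):
    """Hypothetical three-gene regulatory logic, for illustration only."""
    a, b, c = state
    return (b & c, a, 1 - a)


def simulate(update, p, length, rng, start):
    """Generate a hidden-state trajectory with 'length' time points."""
    traj = [start]
    for _ in range(length - 1):
        traj.append(bnp_step(traj[-1], update, p, rng))
    return traj


def observe(traj, rng, mu0=0.0, mu1=1.0, sigma=0.2):
    """Gaussian observation of each gene given its hidden binary state."""
    return [[rng.gauss(mu1 if s else mu0, sigma) for s in state] for state in traj]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = simulate(toy_update, p=0.05, length=6, rng=rng, start=(1, 1, 1))
    for state, values in zip(hidden, observe(hidden, rng)):
        print(state, [round(v, 2) for v in values])
```

Each printed row pairs a hidden binary state with its noisy expression values, which is the single-cell trajectory setting the abstract contrasts with multiple-cell averages.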
8.
BMC Syst Biol ; 12(Suppl 3): 23, 2018 03 21.
Article in English | MEDLINE | ID: mdl-29589564

ABSTRACT

BACKGROUND: Expression-based phenotype classification using either microarray or RNA-Seq measurements suffers from a lack of specificity because pathway timing is not revealed and expressions are averaged across groups of cells. This paper studies expression-based classification under the assumption that single-cell measurements are sampled at a sufficient rate to detect regulatory timing. Thus, observations are expression trajectories. In effect, classification is performed on data generated by an underlying gene regulatory network. RESULTS: Network regulation is modeled via a Boolean network with perturbation, with regulation not fully determined owing to inherent biological randomness. The binary assumption is not critical because the resulting Markov chain characterizes expression trajectories. We assume a partially known Gaussian observation model belonging to an uncertainty class of models. We derive the intrinsically Bayesian robust classifier to discriminate between wild-type and mutated networks based on expression trajectories. The classifier minimizes the expected error across the uncertainty class relative to the prior distribution. We test it using a mammalian cell-cycle model, discriminating between the normal network and one in which gene p27 is mutated, thereby producing a cancerous phenotype. Tests examine all model aspects, including trajectory length, perturbation probability, and the hyperparameters governing the prior distribution over the uncertainty class. CONCLUSIONS: Simulations show the rates at which the expected error is diminished by smaller perturbation probability, longer trajectories, and hyperparameters that tighten the prior distribution relative to the unknown true network. For average-expression measurement, methods have been proposed to obtain prior distributions. These should be extended to the more mathematically difficult, but more informative, expression trajectories.


Subjects
Gene Expression Profiling; Gene Regulatory Networks; Single-Cell Analysis; Bayes Theorem; Models, Genetic; Stochastic Processes
9.
Article in English | MEDLINE | ID: mdl-27740496

ABSTRACT

Gene-expression-based phenotype classification is used for disease diagnosis and prognosis relating to treatment strategies. The present paper considers classification based on sequential measurements of multiple genes using gene regulatory network (GRN) modeling. There are two networks, original and mutated, and observations consist of trajectories of network states. The problem is to classify an observation trajectory as coming from either the original or mutated network. GRNs are modeled via probabilistic Boolean networks, which incorporate stochasticity at both the gene and network levels. Mutation affects the regulatory logic. Classification is based upon observing a trajectory of states of some given length. We characterize the Bayes classifier and find the Bayes error for a general PBN and the special case of a single Boolean network affected by random perturbations (BNp). The Bayes error is related to network sensitivity, meaning the extent of alteration in the steady-state distribution of the original network owing to mutation. Using standard methods to calculate steady-state distributions is cumbersome and sometimes impossible, so we provide an efficient algorithm and approximations. Extensive simulations are performed to study the effects of various factors, including approximation accuracy. We apply the classification procedure to a p53 BNp and a mammalian cell cycle PBN.


Subjects
Gene Regulatory Networks/genetics; Models, Statistical; Algorithms; Computational Biology; Gene Expression Profiling; Genes, p53/genetics; Humans; Models, Genetic; Neoplasms/genetics; Transcriptome
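
Record 9 classifies an observed state trajectory as coming from either the original or the mutated network. A minimal sketch of that decision for two fully known BNps follows. It is not the paper's algorithm: it assumes the common BNp definition (independent gene flips with probability `p`, Boolean rules applied only when no gene flips), and it assumes the initial-state distribution is the same for both classes so that it cancels, whereas the abstract relates the Bayes error to the steady-state distributions. The two three-gene rule sets are hypothetical.

```python
import math


def transition_prob(x, y, update, p):
    """P(next state = y | current state = x) under the assumed BNp definition."""
    n = len(x)
    h = sum(a != b for a, b in zip(x, y))  # Hamming distance
    prob = 0.0
    if h > 0:
        # Exactly the h differing genes were flipped by perturbation.
        prob += (p ** h) * ((1 - p) ** (n - h))
    if update(x) == y:
        # No perturbation occurred and the Boolean rules produced y.
        prob += (1 - p) ** n
    return prob


def log_likelihood(traj, update, p):
    """Sum of log transition probabilities along a state trajectory."""
    total = 0.0
    for x, y in zip(traj, traj[1:]):
        q = transition_prob(x, y, update, p)
        total += math.log(q) if q > 0 else float("-inf")
    return total


def classify(traj, update_original, update_mutated, p, prior_original=0.5):
    """Choose the class with the larger posterior score for the trajectory."""
    s0 = log_likelihood(traj, update_original, p) + math.log(prior_original)
    s1 = log_likelihood(traj, update_mutated, p) + math.log(1 - prior_original)
    return "original" if s0 >= s1 else "mutated"


def original_update(state):
    """Hypothetical three-gene logic for the original network."""
    a, b, c = state
    return (b & c, a, 1 - a)


def mutated_update(state):
    """Hypothetical mutation: the first gene is stuck at 0."""
    a, b, c = state
    return (0, a, 1 - a)


if __name__ == "__main__":
    traj = [(1, 1, 1), (1, 1, 0), (0, 1, 0), (0, 0, 1)]
    print(classify(traj, original_update, mutated_update, p=0.05))
```

In this toy case the first transition follows the original rules but requires a perturbation under the mutated rules, so the original network receives the higher score; longer trajectories accumulate more such evidence.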