Search | BVS Regional Portal (test)

1.

CT-Based AI model for predicting therapeutic outcomes in ureteral stones after single extracorporeal shock wave lithotripsy through a cohort study.

Yang, Huancheng; Wu, Xiang; Liu, Weihao; Yang, Zhong; Wang, Tianyu; You, Weifan; Ye, Baiwei; Wu, Bingni; Wu, Kai; Zeng, Haoyang; Liu, Hanlin.

Int J Surg ; 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38884274

ABSTRACT

OBJECTIVES: To explore the efficacy of an artificial intelligence (AI) model based on CT images in predicting the therapeutic outcome of single-session extracorporeal shock wave lithotripsy (ESWL) for ureteral stones. METHODS: A total of 317 patients clinically diagnosed with ureteral stones were included in this study. All participants underwent unenhanced CT within two weeks before their first ESWL session. The internal cohort consisted of 250 patients from one local hospital, and the external cohort comprised 67 patients from another local hospital. The proposed framework has three main components: an automated semantic segmentation model built with 3D U-Net, a feature extractor that integrates radiomics and autoencoder techniques, and an ESWL efficacy prediction model trained with several machine learning algorithms. All participants underwent a follow-up examination four weeks after ESWL. ESWL was considered effective when KUB X-ray showed no stones or only residual fragments ≤2 mm. Model stability and generalizability were validated with fivefold cross-validation and a multi-center external test. In addition, Shapley Additive Explanations (SHAP) values were computed for individual features to quantify each feature's contribution to the model's decisions. RESULTS: The semantic segmentation model achieved an average Dice coefficient of 0.88 ± 0.08 on the external testing set. ESWL classifiers built with Support Vector Machine (SVM), Random Forest (RF), XGBoost (XB), and CatBoost (CB) achieved AUROC values of 0.78, 0.84, 0.85, and 0.90, respectively, on the internal validation set. On the external testing set, SVM, RF, XB, and CB achieved AUROC values of 0.68, 0.79, 0.80, and 0.83, respectively, with CatBoost performing best. Both radiomics features and autoencoder features made significant contributions to the classification model's decisions. CONCLUSIONS: This study demonstrates that a carefully developed AI model based on CT images can accurately predict the therapeutic outcome of a single ESWL session for ureteral stones.
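
The two evaluation metrics reported in this abstract can be illustrated with a short sketch. The Dice coefficient compares a predicted segmentation mask A with a reference mask B as 2|A ∩ B| / (|A| + |B|). AUROC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function names and the flattened 0/1 mask representation are assumptions for illustration. This is not the authors' evaluation code.

```python
from itertools import product


def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two equally sized binary masks.

    Masks are assumed to be flattened sequences of 0/1 voxel labels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention chosen here (an assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: 4-voxel masks and 4 hypothetical patients (1 = stone-free after ESWL).
print(round(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.667
print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise AUROC loop is quadratic in cohort size. That is fine for a few hundred patients, but a rank-based formula is the usual choice for larger data.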

2.

A quantitative analysis framework of placenta accreta spectrum: placenta subtype, intraoperative bleeding, and hysterectomy risk evaluation based on magnetic resonance imaging-anatomical-clinical features.

Yang, Huancheng; Wu, Xiang; Liu, Weihao; Yuan, Yangguang; Zeng, Haoyang; Li, Junkai; Ye, Baiwei; Wang, Lei; Luo, Shimei; Li, Zhe; Liu, Hanlin.

Quant Imaging Med Surg ; 13(10): 7105-7116, 2023 Oct 01.

Article in English | MEDLINE | ID: mdl-37869322

ABSTRACT

Background: Placenta accreta spectrum (PAS) is a significant contributor to maternal morbidity and mortality. Our objective was to develop a quantitative analysis framework that uses magnetic resonance imaging (MRI)-anatomical-clinical features to predict 3 clinically significant parameters in patients with PAS: placenta subtype (invasive vs. non-invasive placenta), intraoperative bleeding (≥1,500 vs. <1,500 mL), and hysterectomy risk (hysterectomy vs. non-hysterectomy). Methods: A total of 125 pregnant women with PAS from 2 medical centers were enrolled and assigned to an internal training set and an external testing set. A total of 21 MRI-anatomical-clinical features were used as input to the framework. The proposed framework consists mainly of 3 classifiers built with extreme gradient boosting (XGBoost), which were then tested on an external dataset. We also compared the accuracy of placenta subtype prediction between the proposed model and 4 radiologists. SHapley Additive exPlanations (SHAP), a quantitative model interpretation method, was used to explore the contribution of each feature. Results: The models for placenta subtype (invasive vs. non-invasive), intraoperative bleeding (≥1,500 vs. <1,500 mL), and hysterectomy risk (hysterectomy vs. non-hysterectomy) achieved area under the receiver operating characteristic curve (AUROC) values of 0.93, 0.88, and 0.90, respectively, on the internal validation set. On the external testing set, they achieved AUROC values of 0.91, 0.82, and 0.82, respectively. Compared with the 4 radiologists, our model showed higher accuracy, specificity, and sensitivity in predicting placental subtypes in the external testing cohort. Features related to intraplacental dark T2 bands played a crucial role in the decisions of all 3 prediction models.
Conclusions: The quantitative analysis framework can provide a robust method for classification of placenta subtype (invasive vs. non-invasive placenta), intraoperative bleeding (≥1,500 vs. <1,500 mL), and hysterectomy risk (hysterectomy vs. non-hysterectomy) based on MRI-anatomical-clinical features in PAS.
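
SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory. A minimal sketch computes exact Shapley values for a small model. It averages each feature's marginal contribution over all subsets of the other features, and "absent" features take a baseline value. The toy risk model, its feature names (loosely echoing features such as intraplacental dark T2 bands), and the single-baseline replacement scheme are assumptions for illustration. The SHAP library uses more efficient estimators (for example, TreeSHAP for tree ensembles such as XGBoost), and this is not the authors' pipeline.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside the coalition takes its baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Only features in the coalition keep their observed values.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical risk score with one interaction term (feature names are illustrative).
def toy_model(z):
    dark_bands, thickness, prior_cesarean = z
    return 0.5 * dark_bands + 0.2 * thickness + 0.3 * dark_bands * prior_cesarean


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])  # [0.65, 0.4, 0.15]
```

The interaction term's contribution (0.3) is split equally between the two features involved. The attributions sum to model(x) - model(baseline), which is the additivity property SHAP relies on.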

3.

An automated surgical decision-making framework for partial or radical nephrectomy based on 3D-CT multi-level anatomical features in renal cell carcinoma.

Yang, Huancheng; Wu, Kai; Liu, Hanlin; Wu, Peng; Yuan, Yangguang; Wang, Lei; Liu, Yaru; Zeng, Haoyang; Li, Junkai; Liu, Weihao; Wu, Song.

Eur Radiol ; 33(11): 7532-7541, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37289245

ABSTRACT

OBJECTIVES: To determine whether 3D-CT multi-level anatomical features can provide a more accurate prediction of surgical decision-making for partial or radical nephrectomy in renal cell carcinoma. METHODS: This is a retrospective study based on multi-center cohorts. A total of 473 participants with pathologically proven renal cell carcinoma were split into an internal training set and an external testing set. The training set contains 412 cases from five open-source cohorts and two local hospitals. The external testing set includes 61 participants from another local hospital. The proposed automated analytic framework contains the following modules: a 3D kidney and tumor segmentation model built with 3D-UNet, a multi-level feature extractor based on the region of interest, and a partial or radical nephrectomy prediction classifier built with XGBoost. A fivefold cross-validation strategy was used to obtain a robust model. Shapley Additive Explanations (SHAP), a quantitative model interpretation method, was used to explore the contribution of each feature. RESULTS: In predicting partial versus radical nephrectomy, the combination of multi-level features performed better than any single-level feature. In internal validation, the AUROCs of the five cross-validation folds were 0.93 ± 0.1, 0.94 ± 0.1, 0.93 ± 0.1, 0.93 ± 0.1, and 0.93 ± 0.1, respectively. The AUROC of the optimal model was 0.82 ± 0.1 on the external testing set. The tumor shape feature Maximum 3D Diameter played the most important role in the model's decisions. CONCLUSIONS: The automated surgical decision framework for partial or radical nephrectomy based on 3D-CT multi-level anatomical features shows robust performance in renal cell carcinoma. The framework points the way toward guiding surgery with medical images and machine learning.
CLINICAL RELEVANCE STATEMENT: We propose an automated analytic framework that can assist surgeons in deciding between partial and radical nephrectomy. The framework points the way toward guiding surgery with medical images and machine learning. KEY POINTS: • 3D-CT multi-level anatomical features provide a more accurate prediction of surgical decision-making for partial or radical nephrectomy in renal cell carcinoma. • The approach, built on multicenter data and a strict fivefold cross-validation strategy with both an internal validation set and an external testing set, can be easily transferred to different tasks on new datasets. • The prediction model was quantitatively decomposed to explore the contribution of each extracted feature.

Subjects

Carcinoma, Renal Cell; Kidney Neoplasms; Humans; Carcinoma, Renal Cell/diagnostic imaging; Carcinoma, Renal Cell/surgery; Carcinoma, Renal Cell/pathology; Kidney Neoplasms/diagnostic imaging; Kidney Neoplasms/surgery; Kidney Neoplasms/pathology; Retrospective Studies; Nephrectomy/methods; Tomography, X-Ray Computed/methods
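
The fivefold cross-validation in this study partitions the training set into five folds. Each fold is held out once for internal validation while the classifier trains on the other four. A minimal sketch of the index bookkeeping follows. It assumes a simple shuffled, non-stratified split with a fixed random seed; the function name is hypothetical, and the study may have used stratification or another splitting scheme.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Sample indices are shuffled once with a fixed seed and dealt round-robin
    into k folds whose sizes differ by at most one. Each fold serves once as
    the validation set while the remaining k - 1 folds form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val


# 412 cases, matching the size of the internal training set described above.
splits = list(kfold_indices(412, k=5))
print([len(val) for _, val in splits])  # [83, 83, 82, 82, 82]
print(all(len(train) + len(val) == 412 for train, val in splits))  # True
```

In a full pipeline you would fit the classifier on `train` and compute AUROC on `val` in each of the five iterations. A stratified split that preserves the partial/radical ratio in every fold is a common alternative.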

4.

Machine learning optimization of peptides for presentation by class II MHCs.

Dai, Zheng; Huisman, Brooke D; Zeng, Haoyang; Carter, Brandon; Jain, Siddhartha; Birnbaum, Michael E; Gifford, David K.

Bioinformatics ; 37(19): 3160-3167, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33705522

ABSTRACT

SUMMARY: T cells play a critical role in cellular immune responses to pathogens and cancer and can be activated and expanded by Major Histocompatibility Complex (MHC)-presented antigens contained in peptide vaccines. We present a machine learning method to optimize the presentation of peptides by class II MHCs by modifying their anchor residues. Our method first learns a model of peptide affinity for a class II MHC using an ensemble of deep residual networks, and then uses the model to propose anchor residue changes to improve peptide affinity. We use a high-throughput yeast display assay to show that anchor residue optimization improves peptide binding. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
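
The optimization step can be pictured as a search over substitutions at anchor positions, guided by a learned affinity model. The sketch below is a minimal greedy version. It assumes a scoring function that returns higher values for stronger predicted binding; each anchor position is tried with every standard amino acid, and the best-scoring substitution is kept. The toy scoring function stands in for the authors' ensemble of deep residual networks and their actual design procedure. The anchor positions (P1, P4, P6, P9 of a 9-residue core, commonly cited as class II anchor pockets) are also illustrative assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def optimize_anchors(peptide, anchor_positions, score):
    """Greedy coordinate search over anchor residues.

    Visits each anchor position once, tries all 20 standard amino acids there,
    and keeps the substitution with the highest score (ties keep the earlier
    candidate, so an unimproved position is left unchanged).
    """
    best = peptide
    best_score = score(best)
    for pos in anchor_positions:
        for aa in AMINO_ACIDS:
            candidate = best[:pos] + aa + best[pos + 1:]
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
    return best, best_score


# Hypothetical stand-in for a learned affinity model: rewards hydrophobic
# residues at the first core position and small residues at the fourth.
def toy_score(pep):
    preferred = {0: "FYWLIV", 3: "AGS"}
    return sum(1.0 for pos, good in preferred.items() if pep[pos] in good)


core = "KAEDLQKRN"  # hypothetical 9-residue binding core
anchors = [0, 3, 5, 8]  # zero-based P1, P4, P6, P9
print(optimize_anchors(core, anchors, toy_score))  # ('FAEALQKRN', 2.0)
```

A single greedy pass ignores interactions between anchor positions. With a learned model, one might repeat passes until no substitution improves the score, or enumerate anchor combinations jointly when the search space is small.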

5.

NSUN5 Facilitates Viral RNA Recognition by RIG-I Receptor.

Sun, Boyue; Zeng, Haoyang; Liang, Jiaqian; Zhang, Lele; Hu, Haiyang; Wang, Quanyi; Meng, Wei; Li, Chenhui; Ye, Fuqiang; Wang, Chen; Zhu, Juanjuan.

J Immunol ; 205(12): 3408-3418, 2020 12 15.

Article in English | MEDLINE | ID: mdl-33177158

ABSTRACT

The RIG-I receptor induces innate antiviral responses upon sensing RNA viruses. The mechanisms by which RIG-I optimizes the strength of downstream signaling remain incompletely understood. In this study, we found that NSUN5 can potentiate the RIG-I innate signaling pathway. NSUN5 deficiency enhanced RNA virus proliferation and inhibited the induction of downstream antiviral genes. Consistently, NSUN5-deficient mice were more susceptible to RNA virus infection than their wild-type littermates. Mechanistically, NSUN5 bound directly to both viral RNA and RIG-I, acting synergistically to promote the recognition of dsRNA by RIG-I. Collectively, to our knowledge, this study characterizes NSUN5 as a novel RIG-I coreceptor that plays a vital role in restricting RNA virus infection.

Subjects

DEAD Box Protein 58/immunology; Methyltransferases/immunology; Muscle Proteins/immunology; RNA Virus Infections/immunology; RNA Viruses/immunology; RNA, Double-Stranded/immunology; RNA, Viral/immunology; Receptors, Immunologic/immunology; tRNA Methyltransferases/immunology; Animals; Chlorocebus aethiops; HEK293 Cells; Humans; Immunity, Innate; Mice; Vero Cells

6.

Antibody complementarity determining region design using high-capacity machine learning.

Liu, Ge; Zeng, Haoyang; Mueller, Jonas; Carter, Brandon; Wang, Ziheng; Schilz, Jonas; Horny, Geraldine; Birnbaum, Michael E; Ewert, Stefan; Gifford, David K.

Bioinformatics ; 36(7): 2126-2133, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31778140

ABSTRACT

MOTIVATION: The precise targeting of antibodies and other protein therapeutics is required for their proper function and the elimination of deleterious off-target effects. Often the molecular structure of a therapeutic target is unknown and randomized methods are used to design antibodies without a model that relates antibody sequence to desired properties. RESULTS: Here, we present Ens-Grad, a machine learning method that can design complementarity determining regions of human Immunoglobulin G antibodies with target affinities that are superior to candidates derived from phage display panning experiments. We also demonstrate that machine learning can improve target specificity by the modular composition of models from different experimental campaigns, enabling a new integrative approach to improving target specificity. Our results suggest a new path for the discovery of therapeutic molecules by demonstrating that predictive and differentiable models of antibody binding can be learned from high-throughput experimental data without the need for target structural data. AVAILABILITY AND IMPLEMENTATION: Sequencing data of the phage panning experiment are deposited at NIH's Sequence Read Archive (SRA) under the accession number SRP158510. We make our code available at https://github.com/gifford-lab/antibody-2019. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Complementarity Determining Regions; Machine Learning; Antibodies; Humans

7.

DeepLigand: accurate prediction of MHC class I ligands using peptide embedding.

Zeng, Haoyang; Gifford, David K.

Bioinformatics ; 35(14): i278-i283, 2019 07 15.

Article in English | MEDLINE | ID: mdl-31510651

ABSTRACT

MOTIVATION: The computational modeling of peptide display by class I major histocompatibility complexes (MHCs) is essential for peptide-based therapeutics design. Existing computational methods for peptide display focus on modeling peptide-MHC binding affinity. However, such models cannot characterize the sequence features of the other cellular processes in the peptide display pathway that determines MHC ligand selection. RESULTS: We introduce DeepLigand, a semi-supervised model that outperforms state-of-the-art models in MHC class I ligand prediction. DeepLigand combines a peptide language model and peptide binding affinity prediction to score MHC class I peptide presentation. The peptide language model characterizes sequence features that correspond to secondary factors in MHC ligand selection other than binding affinity. The peptide embedding is learned by pre-training on natural ligands, and can discriminate between ligands and non-ligands in the absence of binding affinity prediction. Although conventional affinity-based models fail to classify peptides with moderate affinities, DeepLigand discriminates ligands from non-ligands with consistently high accuracy. AVAILABILITY AND IMPLEMENTATION: We make DeepLigand available at https://github.com/gifford-lab/DeepLigand. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Peptides/analysis; Histocompatibility Antigens Class I; Ligands; Protein Binding; Software

8.

Visualizing complex feature interactions and feature sharing in genomic deep neural networks.

Liu, Ge; Zeng, Haoyang; Gifford, David K.

BMC Bioinformatics ; 20(1): 401, 2019 Jul 19.

Article in English | MEDLINE | ID: mdl-31324140

ABSTRACT

BACKGROUND: Visualization tools for deep learning models typically focus on discovering key input features without considering how such low-level features are combined in intermediate layers to make decisions. Moreover, many of these methods examine a network's response to specific input examples that may be insufficient to reveal the complexity of model decision making. RESULTS: We present DeepResolve, an analysis framework for deep convolutional models of genome function that visualizes how input features contribute individually and combinatorially to network decisions. Unlike other methods, DeepResolve does not depend upon the analysis of a predefined set of inputs. Rather, it uses gradient ascent to stochastically explore intermediate feature maps to 1) discover important features, 2) visualize their contribution and interaction patterns, and 3) analyze feature sharing across tasks that suggests shared biological mechanisms. We demonstrate the visualization of decision making using our proposed method on deep neural networks trained on both experimental and synthetic data. DeepResolve is competitive with existing visualization tools in discovering key sequence features, and identifies certain negative features and non-additive feature interactions that are not easily observed with existing tools. It also recovers similarities between poorly correlated classes that are not observed by traditional methods. DeepResolve reveals that DeepSEA's learned decision structure is shared across genome annotations including histone marks, DNase hypersensitivity, and transcription factor binding. We identify groups of TFs that suggest known shared biological mechanisms, and recover correlation between DNA hypersensitivities and TF/chromatin marks. CONCLUSIONS: DeepResolve is capable of visualizing complex feature contribution patterns and feature interactions that contribute to decision making in genomic deep convolutional networks. It also recovers feature sharing and class similarities that suggest interesting biological mechanisms. DeepResolve is compatible with existing visualization tools and provides complementary insights.

Subjects

Algorithms; Deep Learning; Genomics; Neural Networks, Computer; Base Sequence; Databases, Genetic; Histone Code; Histones/metabolism; Transcription Factors/metabolism

9.

Quantification of Uncertainty in Peptide-MHC Binding Prediction Improves High-Affinity Peptide Selection for Therapeutic Design.

Zeng, Haoyang; Gifford, David K.

Cell Syst ; 9(2): 159-166.e3, 2019 08 28.

Article in English | MEDLINE | ID: mdl-31176619

ABSTRACT

The computational identification of peptides that can bind the major histocompatibility complex (MHC) with high affinity is an essential step in developing personal immunotherapies and vaccines. We introduce PUFFIN, a deep residual network-based computational approach that quantifies uncertainty in peptide-MHC affinity prediction that arises from observational noise and the lack of relevant training examples. With PUFFIN's uncertainty metrics, we define binding likelihood, the probability that a peptide binds to a given MHC allele at a specified affinity threshold. Compared to affinity point estimates, we find that binding likelihood correlates better with the observed affinity and reduces false positives in high-affinity peptide design. When applied to examine an existing peptide vaccine, PUFFIN identifies an alternative vaccine formulation with higher binding likelihood. PUFFIN is freely available for download at http://github.com/gifford-lab/PUFFIN.

Subjects

Computational Biology/methods; Major Histocompatibility Complex/physiology; Protein Binding/physiology; Algorithms; Databases, Protein; Histocompatibility Antigens Class I/genetics; Humans; Peptides/metabolism; Software; Uncertainty
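
Binding likelihood turns a predicted affinity and its uncertainty into a probability. The sketch below assumes the model outputs a Gaussian predictive distribution over log10 affinity (IC50 in nM, where lower means tighter binding), summarized by a mean μ and a standard deviation σ. The binding likelihood at threshold t is then P(log10 IC50 < log10 t) = Φ((log10 t − μ) / σ). The Gaussian form and the log10 IC50 scale are assumptions for illustration, not necessarily PUFFIN's exact parameterization. The two peptides and the 50 nM threshold are hypothetical.

```python
from math import erf, log10, sqrt


def binding_likelihood(mu_log10_ic50, sigma_log10_ic50, threshold_nm):
    """P(IC50 < threshold) under a Gaussian over log10(IC50 in nM)."""
    if sigma_log10_ic50 <= 0:
        raise ValueError("sigma must be positive")
    z = (log10(threshold_nm) - mu_log10_ic50) / sigma_log10_ic50
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z


# Two hypothetical peptides scored against a 50 nM high-affinity threshold.
# Peptide A has the better point estimate (20 nM) but high uncertainty;
# peptide B has a worse point estimate (30 nM) but low uncertainty.
a = binding_likelihood(log10(20), 1.5, 50)
b = binding_likelihood(log10(30), 0.2, 50)
print(f"A: {a:.2f}  B: {b:.2f}")  # A: 0.60  B: 0.87
```

Ranking by point estimate would prefer peptide A. Ranking by binding likelihood prefers B, whose prediction is more certain to clear the threshold. This illustrates how uncertainty can reduce false positives in high-affinity peptide selection.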

10.

A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

Guo, Yuchun; Tian, Kevin; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David Kenneth.

Genome Res ; 28(6): 891-900, 2018 06.

Article in English | MEDLINE | ID: mdl-29654070

ABSTRACT

The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations.

Subjects

Gene Regulatory Networks/genetics; Protein Binding/genetics; Quantitative Trait Loci/genetics; Transcription Factors/genetics; Algorithms; Binding Sites/genetics; Chromatin Immunoprecipitation/methods; Computational Biology; Humans; Position-Specific Scoring Matrices
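
A KSM is built from k-mers that are overrepresented at TF binding sites. The first ingredient, finding enriched k-mers, can be sketched by counting how many bound and background sequences contain each k-mer. K-mers with a high pseudocount-smoothed enrichment ratio are kept. The function names, pseudocount, cutoff, and toy sequences are assumptions for illustration. This is not KMAC, which also aligns the selected k-mers into a KSM.

```python
from collections import Counter


def kmers(seq, k):
    """Set of distinct k-mers in seq (each sequence counts a k-mer at most once)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def enriched_kmers(bound, background, k, min_ratio=2.0, pseudocount=1.0):
    """Return {k-mer: enrichment ratio} for k-mers overrepresented in bound sequences.

    The ratio compares the pseudocount-smoothed fraction of bound sequences
    containing the k-mer with the same fraction among background sequences.
    """
    pos_counts = Counter(km for s in bound for km in kmers(s, k))
    neg_counts = Counter(km for s in background for km in kmers(s, k))
    enriched = {}
    for km, count in pos_counts.items():
        pos_frac = (count + pseudocount) / (len(bound) + 2 * pseudocount)
        neg_frac = (neg_counts[km] + pseudocount) / (len(background) + 2 * pseudocount)
        ratio = pos_frac / neg_frac
        if ratio >= min_ratio:
            enriched[km] = ratio
    return enriched


# Hypothetical toy data: three "bound" sequences sharing a site, three background.
bound = ["TTGACGTCAA", "CGACGTCAGG", "ATGACGTCAT"]
background = ["TTTTAAAACC", "GGCCTTAAGG", "ACACACACAC"]
print(sorted(enriched_kmers(bound, background, k=6, min_ratio=3.5)))
# ['ACGTCA', 'GACGTC']
```

The two surviving 6-mers overlap by five bases. Grouping and aligning such overlapping k-mers is what turns a flat list of enriched k-mers into a motif representation.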

11.

Predicting the impact of non-coding variants on DNA methylation.

Zeng, Haoyang; Gifford, David K.

Nucleic Acids Res ; 45(11): e99, 2017 Jun 20.

Article in English | MEDLINE | ID: mdl-28334830

ABSTRACT

DNA methylation plays a crucial role in the establishment of tissue-specific gene expression and the regulation of key biological processes. However, our present inability to predict the effect of genome sequence variation on DNA methylation precludes a comprehensive assessment of the consequences of non-coding variation. We introduce CpGenie, a sequence-based framework that learns a regulatory code of DNA methylation using a deep convolutional neural network and uses this network to predict the impact of sequence variation on proximal CpG site DNA methylation. CpGenie produces allele-specific DNA methylation prediction with single-nucleotide sensitivity that enables accurate prediction of methylation quantitative trait loci (meQTL). We demonstrate that CpGenie prioritizes validated GWAS SNPs, and contributes to the prediction of functional non-coding variants, including expression quantitative trait loci (eQTL) and disease-associated mutations. CpGenie is publicly available to assist in identifying and interpreting regulatory non-coding variants.

Subjects

DNA Methylation; DNA, Intergenic/genetics; Sequence Analysis, DNA/methods; Base Sequence; Binding Sites; Consensus Sequence; Epigenesis, Genetic; Genetic Predisposition to Disease; Genome-Wide Association Study; Models, Genetic; Neural Networks, Computer; Polymorphism, Single Nucleotide; Quantitative Trait Loci
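
Sequence-based predictors of this kind take an encoded DNA window as input. They score a variant by comparing predictions for the reference and alternate alleles. A minimal sketch of that allele-specific scoring follows. The `toy_predict` function is a hypothetical stand-in for the trained convolutional network. The A, C, G, T encoding order, the window, and the toy weights are also illustrative assumptions, not CpGenie's actual configuration.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element rows in A, C, G, T order (N or other -> all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def apply_variant(seq, pos, ref, alt):
    """Return seq with the single-nucleotide substitution ref -> alt at pos."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]} != {ref}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(predict, seq, pos, ref, alt):
    """Allele-specific effect: prediction(alt) - prediction(ref)."""
    alt_seq = apply_variant(seq, pos, ref, alt)
    return predict(one_hot(alt_seq)) - predict(one_hot(seq))


# Hypothetical stand-in for a trained network: each C or G adds 0.1 to the score.
def toy_predict(encoded):
    return sum(0.1 * (row[1] + row[2]) for row in encoded)


window = "ATCGATTA"
print(round(variant_effect(toy_predict, window, 5, "T", "G"), 3))  # 0.1
```

Because both alleles go through the same model, the difference isolates the predicted effect of the single substitution. This allele-level comparison is what makes single-nucleotide sensitivity possible.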

12.

Predicting gene expression in massively parallel reporter assays: A comparative study.

Kreimer, Anat; Zeng, Haoyang; Edwards, Matthew D; Guo, Yuchun; Tian, Kevin; Shin, Sunyoung; Welch, Rene; Wainberg, Michael; Mohan, Rahul; Sinnott-Armstrong, Nicholas A; Li, Yue; Eraslan, Gökcen; Amin, Talal Bin; Tewhey, Ryan; Sabeti, Pardis C; Goke, Jonathan; Mueller, Nikola S; Kellis, Manolis; Kundaje, Anshul; Beer, Michael A; Keles, Sunduz; Gifford, David K; Yosef, Nir.

Hum Mutat ; 38(9): 1240-1250, 2017 09.

Article in English | MEDLINE | ID: mdl-28220625

ABSTRACT

In many human diseases, associated genetic changes tend to occur within noncoding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such noncoding regions: given a region that is statistically associated with changes in gene expression (expression quantitative trait locus [eQTL]), does it in fact play a regulatory role? And if so, how is this role "coded" in its sequence? These questions were the subject of the Critical Assessment of Genome Interpretation eQTL challenge. Participants were given a set of sequences that flank eQTLs in humans and were asked to predict whether these are capable of regulating transcription (as evaluated by massively parallel reporter assays), and whether this capability changes between alternative alleles. Here, we report lessons learned from this community effort. By inspecting predictive properties in isolation, and conducting meta-analysis over the competing methods, we find that using chromatin accessibility and transcription factor binding as features in an ensemble of classifiers or regression models leads to the most accurate results. We then characterize the loci that are harder to predict, putting the spotlight on areas of weakness, which we expect to be the subject of future studies.

Subjects

Computational Biology/methods; Gene Expression; Gene Expression Regulation; Genetic Predisposition to Disease; Humans; Quantitative Trait Loci

13.

Accurate eQTL prioritization with an ensemble-based framework.

Zeng, Haoyang; Edwards, Matthew D; Guo, Yuchun; Gifford, David K.

Hum Mutat ; 38(9): 1259-1265, 2017 09.

Article in English | MEDLINE | ID: mdl-28224684

ABSTRACT

We present a novel ensemble-based computational framework, EnsembleExpr, that achieved the best performance in the "eQTL-causal SNPs" challenge of the Fourth Critical Assessment of Genome Interpretation (CAGI) for identifying expression quantitative trait loci (eQTLs) and prioritizing their gene expression effects. eQTLs are genome sequence variants that result in gene expression changes and are thus prime suspects in the search for contributions to the causality of complex traits. When EnsembleExpr is trained on data from massively parallel reporter assays, it accurately predicts reporter expression levels from unseen regulatory sequences and identifies sequence variants that exhibit significant changes in reporter expression. Compared with other state-of-the-art methods, EnsembleExpr achieved competitive performance when applied to eQTL datasets determined by other protocols. We envision EnsembleExpr as a resource to help interpret noncoding regulatory variants and prioritize disease-associated mutations for downstream validation.

Subjects

Computational Biology/methods; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Gene Expression Profiling/methods; Gene Expression Regulation; Genetic Predisposition to Disease; Humans; Models, Genetic; Mutation; Software

14.

A synergistic DNA logic predicts genome-wide chromatin accessibility.

Hashimoto, Tatsunori; Sherwood, Richard I; Kang, Daniel D; Rajagopal, Nisha; Barkal, Amira A; Zeng, Haoyang; Emons, Bart J M; Srinivasan, Sharanya; Jaakkola, Tommi; Gifford, David K.

Genome Res ; 26(10): 1430-1440, 2016 10.

Article in English | MEDLINE | ID: mdl-27456004

ABSTRACT

Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which, when trained with DNase-seq data for a cell type, is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that an SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.

Subjects

Chromatin Assembly and Disassembly; Chromatin/genetics; Models, Genetic; Animals; Chromatin/metabolism; Genome, Human; Humans; Machine Learning

15.

Convolutional neural network architectures for predicting DNA-protein binding.

Zeng, Haoyang; Edwards, Matthew D; Liu, Ge; Gifford, David K.

Bioinformatics ; 32(12): i121-i127, 2016 06 15.

Article in English | MEDLINE | ID: mdl-27307608

ABSTRACT

MOTIVATION: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA-protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. RESULTS: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. AVAILABILITY AND IMPLEMENTATION: All the models analyzed are available at http://cnn.csail.mit.edu CONTACT: gifford@mit.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Neural Networks, Computer; Algorithms; DNA; Protein Binding; Proteins
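
The core operation these architectures share is a convolutional kernel scanned along a one-hot encoded DNA sequence, followed by pooling. A minimal pure-Python sketch of one kernel with a ReLU and global max pooling follows. It assumes a hand-set kernel that matches the E-box motif CACGTG (+1 for a match, −1 for a mismatch) and a bias of −4, so that only near-perfect matches activate. In a trained CNN the kernel weights and bias are learned, and real implementations use a deep learning framework rather than Python loops.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_kernel(motif, match=1.0, mismatch=-1.0):
    """Kernel of shape (len(motif), 4) that rewards the given motif."""
    return [[match if base == b else mismatch for b in BASES] for base in motif]


def conv1d_relu(encoded, kernel, bias=0.0):
    """Valid (unpadded) 1D convolution of one kernel, followed by ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(encoded) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(x * w for x, w in zip(encoded[start + offset], kernel[offset]))
        out.append(max(0.0, s))  # ReLU
    return out


def global_max_pool(activations):
    """Strongest activation anywhere in the sequence (position-invariant)."""
    return max(activations) if activations else 0.0


kernel = motif_kernel("CACGTG")
for seq in ["TTCACGTGAA", "TTCACGTTAA", "TTTTTTTTTT"]:
    print(seq, global_max_pool(conv1d_relu(one_hot(seq), kernel, bias=-4.0)))
# TTCACGTGAA 2.0
# TTCACGTTAA 0.0
# TTTTTTTTTT 0.0
```

Global max pooling makes the output independent of where the motif occurs. That suits motif-presence tasks. Architectures that must capture positional context or secondary motifs typically use local pooling and additional layers instead.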

16.

GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding.

Zeng, Haoyang; Hashimoto, Tatsunori; Kang, Daniel D; Gifford, David K.

Bioinformatics ; 32(4): 490-6, 2016 Feb 15.

Article in English | MEDLINE | ID: mdl-26476779

ABSTRACT

MOTIVATION: The majority of disease-associated variants identified in genome-wide association studies reside in noncoding regions of the genome with regulatory roles. Thus, being able to interpret the functional consequence of a variant is essential for identifying causal variants in the analysis of genome-wide association studies. RESULTS: We present GERV (generative evaluation of regulatory variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer-based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change in predicted ChIP-seq reads between the reference and alternate alleles. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting single-nucleotide polymorphisms associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked single-nucleotide polymorphisms and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis. AVAILABILITY AND IMPLEMENTATION: The implementation of GERV and related data are available at http://gerv.csail.mit.edu/.

Subjects

Algorithms , Computational Biology/methods , Models, Statistical , Polymorphism, Single Nucleotide/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Binding Sites , Chromatin Immunoprecipitation , Genome, Human , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Protein Binding
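
To make the reference-versus-alternate scoring idea in the GERV abstract concrete, here is a deliberately simplified sketch: a toy additive model in which a sequence's predicted binding signal is the sum of per-k-mer weights, and a SNP is scored as the predicted signal with the alternate allele minus that with the reference allele. This is not GERV's model, which is learned from ChIP-seq and DNase-seq data and predicts read counts; the weights, k-mers and function names below are hypothetical.

```python
from typing import Dict


def kmer_signal(seq: str, weights: Dict[str, float], k: int) -> float:
    """Toy additive model: predicted signal is the sum of the weights of all
    k-mers in seq; k-mers absent from `weights` contribute 0."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def score_snp(context: str, pos: int, alt: str,
              weights: Dict[str, float], k: int) -> float:
    """Alt-minus-ref change in predicted signal for a SNP at 0-based `pos`.

    Only k-mers overlapping `pos` differ between the two alleles, so only the
    window context[pos - k + 1 : pos + k] (clipped to the sequence) is scored.
    """
    if len(alt) != 1 or not 0 <= pos < len(context):
        raise ValueError("expected a single-base alt allele inside the context")
    lo, hi = max(0, pos - k + 1), min(len(context), pos + k)
    ref_window = context[lo:hi]
    alt_window = context[lo:pos] + alt + context[pos + 1:hi]
    return kmer_signal(alt_window, weights, k) - kmer_signal(ref_window, weights, k)


if __name__ == "__main__":
    # Hypothetical learned weights for a few 4-mers around a GATA site.
    weights = {"TGAT": 1.0, "GATA": 2.0, "ATAA": 0.5}
    # Changing the A at position 4 to C destroys all three weighted 4-mers.
    print(score_snp("CCTGATAACC", 4, "C", weights, k=4))  # -3.5
```

Restricting the computation to the 2k − 1 bases around the variant gives the same result as scoring the whole sequence, because every k-mer that does not overlap the SNP appears identically in both alleles and cancels.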

17.

Abundant contribution of short tandem repeats to gene expression variation in humans.

Gymrek, Melissa; Willems, Thomas; Guilmatre, Audrey; Zeng, Haoyang; Markus, Barak; Georgiev, Stoyan; Daly, Mark J; Price, Alkes L; Pritchard, Jonathan K; Sharp, Andrew J; Erlich, Yaniv.

Nat Genet ; 48(1): 22-9, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26642241

ABSTRACT

The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10-15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.

Subjects

Gene Expression , Genetic Variation , Genome, Human , Microsatellite Repeats , Crohn Disease/genetics , Genome-Wide Association Study , Histones/genetics , Histones/metabolism , Humans , INDEL Mutation , Models, Genetic , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Regulatory Sequences, Nucleic Acid , Twins/genetics
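
As an illustration of what calling an "expression STR" involves at a single locus, here is a minimal association test: regress a gene's expression on per-sample STR length and report the slope, Pearson r and t statistic. It assumes each sample's genotype has already been summarized as one number (for example, the mean of its two allele lengths); that summary and the function name are assumptions. The survey's actual pipeline, including covariate correction, multiple-testing control and the variance partitioning against linked SNPs and indels, is not reproduced.

```python
import math
from typing import Sequence, Tuple


def estr_association(str_length: Sequence[float],
                     expression: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares regression of expression on STR length at one locus.

    Returns (slope, pearson_r, t_statistic); under the null hypothesis of no
    association the t statistic has n - 2 degrees of freedom.
    """
    n = len(str_length)
    if n != len(expression) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 samples")
    mean_x = sum(str_length) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in str_length)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(str_length, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("STR length or expression does not vary")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect fit: t is unbounded
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return slope, r, t
```

In a genome-wide scan, each t statistic would be converted to a p-value from a t distribution with n − 2 degrees of freedom and then corrected for the number of STR-gene pairs tested.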

18.

Mining TCGA data using Boolean implications.

Sinha, Subarna; Tsang, Emily K; Zeng, Haoyang; Meister, Michela; Dill, David L.

PLoS One ; 9(7): e102119, 2014.

Article in English | MEDLINE | ID: mdl-25054200

ABSTRACT

Boolean implications (if-then rules) provide a conceptually simple, uniform and highly scalable way to find associations between pairs of random variables. In this paper, we propose to use Boolean implications to find relationships between variables of different data types (mutation, copy number alteration, DNA methylation and gene expression) from the glioblastoma (GBM) and ovarian serous cystadenocarcinoma (OV) data sets from The Cancer Genome Atlas (TCGA). We find hundreds of thousands of Boolean implications from these data sets. A direct comparison of the relationships found by Boolean implications and those found by commonly used methods for mining associations shows that existing methods would miss relationships found by Boolean implications. Furthermore, many relationships exposed by Boolean implications reflect important aspects of cancer biology. Examples of our findings include cis relationships between copy number alteration, DNA methylation and expression of genes, a new hierarchy of mutations and recurrent copy number alterations, loss-of-heterozygosity of well-known tumor suppressors, and the hypermethylation phenotype associated with IDH1 mutations in GBM. The Boolean implication results used in the paper can be accessed at http://crookneck.stanford.edu/microarray/TCGANetworks/.

Subjects

Brain Neoplasms/genetics , Computational Biology/methods , Cystadenoma, Serous/genetics , Data Mining/methods , Glioblastoma/genetics , Ovarian Neoplasms/genetics , DNA Copy Number Variations , DNA Methylation , Female , Gene Expression Regulation, Neoplastic , Humans , Internet , Mutation , Reproducibility of Results
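
A Boolean implication such as "if A is high then B is high" corresponds to one nearly empty quadrant (A high, B low) in the 2×2 table of discretized samples. The sketch below checks each quadrant with a simple sparse-quadrant test: the observed count must fall well below its expected count under independence, and an error rate (the fraction of samples in the quadrant relative to the two margins) must be small. The exact statistic, the thresholds `s_min = 3.0` and `err_max = 0.1`, and the function name are assumptions for illustration, not necessarily the settings used in this paper. Continuous variables such as expression or methylation would first need to be thresholded into high/low, which is a further assumption here.

```python
import math
from itertools import product
from typing import List, Sequence


def boolean_implications(a: Sequence[bool], b: Sequence[bool],
                         s_min: float = 3.0, err_max: float = 0.1) -> List[str]:
    """Return the implications 'A x => B y' whose opposing quadrant is sparse."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty sequences of equal length")
    counts = {q: 0 for q in product((False, True), repeat=2)}
    for x, y in zip(a, b):
        counts[(bool(x), bool(y))] += 1

    found = []
    for x, y in product((False, True), repeat=2):
        observed = counts[(x, y)]
        n_a = counts[(x, False)] + counts[(x, True)]  # samples with A == x
        n_b = counts[(False, y)] + counts[(True, y)]  # samples with B == y
        expected = n_a * n_b / n                      # count under independence
        if expected == 0:
            continue
        stat = (expected - observed) / math.sqrt(expected)
        error = 0.5 * (observed / n_a + observed / n_b)
        if stat >= s_min and error <= err_max:
            # A sparse (A == x, B == y) quadrant means: if A == x, then B != y.
            found.append(f"A {'high' if x else 'low'} => B {'low' if y else 'high'}")
    return found
```

For example, 40 samples with both variables high, 20 with both low, 20 with only B high, and none with only A high leave the (A high, B low) quadrant empty, and the function reports `A high => B high`.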
