Search | BVS Regional Portal (test)

Learning epistatic polygenic phenotypes with Boolean interactions.

Behr, Merle; Kumbier, Karl; Cordova-Palomera, Aldo; Aguirre, Matthew; Ronen, Omer; Ye, Chengzhong; Ashley, Euan; Butte, Atul J; Arnaout, Rima; Brown, Ben; Priest, James; Yu, Bin.

PLoS One ; 19(4): e0298906, 2024.

Article in English | MEDLINE | ID: mdl-38625909

ABSTRACT

Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.

Subjects

Genetic Epistasis, Genome-Wide Association Study, Humans, Genome-Wide Association Study/methods, Phenotype, Multifactorial Inheritance/genetics, Logistic Models, Single Nucleotide Polymorphism
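The epiTree abstract describes PCS epistasis p-values computed by bootstrap sampling on the test set. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation: it assumes each model's test-set predictions have already been scored as correct (1) or incorrect (0), and the function name `bootstrap_improvement_pvalue` and its parameters are invented for this example. It reports the fraction of paired bootstrap resamples in which the interaction model fails to improve accuracy over the baseline (no-epistasis) model.

```python
import random
from typing import Sequence


def bootstrap_improvement_pvalue(
    baseline_correct: Sequence[int],
    interaction_correct: Sequence[int],
    n_boot: int = 1000,
    seed: int = 0,
) -> float:
    """Hypothetical sketch, not the epiTree implementation.

    Each input holds one 0/1 entry per test sample (1 = correct prediction).
    Returns the fraction of paired bootstrap resamples of the test set in
    which the interaction model does not beat the baseline model.
    """
    if len(baseline_correct) != len(interaction_correct):
        raise ValueError("both models must be scored on the same test samples")
    n = len(baseline_correct)
    if n == 0:
        raise ValueError("test set is empty")
    rng = random.Random(seed)
    no_gain = 0
    for _ in range(n_boot):
        # Resample test-sample indices with replacement, keeping the two
        # models paired on the same samples.
        idx = [rng.randrange(n) for _ in range(n)]
        gain = sum(interaction_correct[i] - baseline_correct[i] for i in idx)
        if gain <= 0:
            no_gain += 1
    return no_gain / n_boot


# Usage: the interaction model is correct on 2 extra samples out of 8.
baseline = [1, 0, 1, 0, 1, 1, 0, 1]
interaction = [1, 1, 1, 1, 1, 1, 0, 1]
print(bootstrap_improvement_pvalue(baseline, interaction))
```

A small value means the accuracy gain persists across most resamples of the test set. The paper's significance test also relies on simulated Boolean tree-structured null and alternative distributions, which this sketch does not model.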

GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction.

Benegas, Gonzalo; Albors, Carlos; Aw, Alan J; Ye, Chengzhong; Song, Yun S.

bioRxiv ; 2024 Apr 06.

Article in English | MEDLINE | ID: mdl-37873118

ABSTRACT

Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this challenge, we here introduce GPN-MSA, a novel framework for DNA language models that leverages whole-genome sequence alignments across multiple species and takes only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC, OMIM), experimental functional assays (DMS, DepMap), and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and non-coding variants.

Cross-protein transfer learning substantially improves disease variant prediction.

Jagota, Milind; Ye, Chengzhong; Albors, Carlos; Rastogi, Ruchir; Koehl, Antoine; Ioannidis, Nilah; Song, Yun S.

Genome Biol ; 24(1): 182, 2023 08 07.

Article in English | MEDLINE | ID: mdl-37550700

ABSTRACT

BACKGROUND: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. RESULTS: We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications, and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes. CONCLUSIONS: Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.

Subjects

Machine Learning, Proteome, Humans, Proteome/genetics, Amino Acid Sequence, Mutation, Missense Mutation, Computational Biology/methods
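The CPT-1 result above is stated as specificity at a fixed 95% sensitivity. A minimal sketch of how such a figure can be computed from variant scores, assuming higher scores mean "more likely pathogenic" and that labeled variants are split into a pathogenic (positive) and a benign (negative) set; the function `specificity_at_sensitivity` is hypothetical and not part of any released CPT code.

```python
import math
from typing import Sequence


def specificity_at_sensitivity(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    target_sensitivity: float = 0.95,
) -> float:
    """Hypothetical helper: specificity at the highest score threshold whose
    sensitivity on the positive (pathogenic) set is at least the target.
    Higher scores are assumed to mean 'more likely pathogenic'."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must be called pathogenic; the small tolerance
    # guards against floating-point round-up (e.g. 0.95 * 20).
    k = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    # Variants scoring >= threshold are called pathogenic, so true negatives
    # are benign variants scoring strictly below it.
    true_neg = sum(1 for s in neg_scores if s < threshold)
    return true_neg / len(neg_scores)


pathogenic = [0.9, 0.8, 0.7, 0.6]
benign = [0.1, 0.65, 0.95, 0.2]
print(specificity_at_sensitivity(pathogenic, benign, 0.75))  # 0.75
```

Choosing the highest threshold that still meets the sensitivity target gives the best specificity attainable at that operating point for a given score.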

Surface protein imputation from single cell transcriptomes by deep neural networks.

Zhou, Zilu; Ye, Chengzhong; Wang, Jingshu; Zhang, Nancy R.

Nat Commun ; 11(1): 651, 2020 01 31.

Article in English | MEDLINE | ID: mdl-32005835

ABSTRACT

While single cell RNA sequencing (scRNA-seq) is invaluable for studying cell populations, cell-surface proteins are often integral markers of cellular function and serve as primary targets for therapeutic intervention. Here we propose a transfer learning framework, single cell Transcriptome to Protein prediction with deep neural network (cTP-net), to impute surface protein abundances from scRNA-seq data by learning from existing single-cell multi-omic resources.

Subjects

Cells/metabolism, Gene Expression Profiling/methods, Membrane Proteins/genetics, Single-Cell Analysis/methods, Transcriptome, Cells/cytology, Humans, Membrane Proteins/metabolism, Computer Neural Networks, RNA Sequence Analysis

Data denoising with transfer learning in single-cell transcriptomics.

Wang, Jingshu; Agarwal, Divyansh; Huang, Mo; Hu, Gang; Zhou, Zilu; Ye, Chengzhong; Zhang, Nancy R.

Nat Methods ; 16(9): 875-878, 2019 09.

Article in English | MEDLINE | ID: mdl-31471617

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) data are noisy and sparse. Here, we show that transfer learning across datasets remarkably improves data quality. By coupling a deep autoencoder with a Bayesian model, SAVER-X extracts transferable gene-gene relationships across data from different labs, varying conditions and divergent species, to denoise new target datasets.

Subjects

Breast Neoplasms/metabolism, Computational Biology/methods, Mononuclear Leukocytes/metabolism, RNA Sequence Analysis/standards, Single-Cell Analysis/methods, T-Lymphocytes/metabolism, Transcriptome, Animals, Bayes Theorem, Female, Gene Expression Profiling, Gene Expression Regulation, Humans, Mice, RNA Sequence Analysis/methods

DECENT: differential expression with capture efficiency adjustmeNT for single-cell RNA-seq data.

Ye, Chengzhong; Speed, Terence P; Salim, Agus.

Bioinformatics ; 35(24): 5155-5162, 2019 12 15.

Article in English | MEDLINE | ID: mdl-31197307

ABSTRACT

MOTIVATION: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. RESULTS: We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. AVAILABILITY AND IMPLEMENTATION: The method is implemented as an R package, publicly available from https://github.com/cz-ye/DECENT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Single-Cell Analysis, Software, Gene Expression Profiling, RNA-Seq, RNA Sequence Analysis
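DECENT, described above, models the molecule capture process that gives rise to dropouts. As an illustrative sketch only, assume capture is binomial (each molecule is captured independently with a fixed efficiency) or, for the overdispersed case, beta-binomial (the efficiency itself varies between cells). The abstract does not give DECENT's exact parametrization, and these function names are hypothetical. Each function returns the probability that a gene with n molecules in a cell is observed as a zero count.

```python
import math


def dropout_prob_binomial(n_molecules: int, efficiency: float) -> float:
    """P(observed count = 0) if each of n molecules is captured
    independently with probability `efficiency` (binomial capture)."""
    return (1.0 - efficiency) ** n_molecules


def _log_beta(a: float, b: float) -> float:
    # log of the Beta function B(a, b) via log-gamma, for numerical stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def dropout_prob_beta_binomial(
    n_molecules: int, mean_efficiency: float, rho: float
) -> float:
    """P(observed count = 0) if the capture efficiency is itself drawn from
    Beta(a, b) with mean `mean_efficiency` and overdispersion `rho`
    (rho = 1 / (a + b + 1)); then P(0) = B(a, b + n) / B(a, b)."""
    if not 0.0 < mean_efficiency < 1.0 or not 0.0 < rho < 1.0:
        raise ValueError("mean_efficiency and rho must lie in (0, 1)")
    scale = (1.0 - rho) / rho  # equals a + b
    a = mean_efficiency * scale
    b = (1.0 - mean_efficiency) * scale
    return math.exp(_log_beta(a, b + n_molecules) - _log_beta(a, b))


# Same mean efficiency, with and without overdispersion.
print(dropout_prob_binomial(5, 0.1))
print(dropout_prob_beta_binomial(5, 0.1, 0.2))
```

With the same mean efficiency, the overdispersed model predicts more zeros than the binomial one: about 0.69 versus 0.59 for five molecules at 10% mean efficiency and rho = 0.2.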

Publisher Correction: Single-cell profiling of breast cancer T cells reveals a tissue-resident memory subset associated with improved prognosis.

Savas, Peter; Virassamy, Balaji; Ye, Chengzhong; Salim, Agus; Mintoff, Christopher P; Caramia, Franco; Salgado, Roberto; Byrne, David J; Teo, Zhi L; Dushyanthen, Sathana; Byrne, Ann; Wein, Lironne; Luen, Stephen J; Poliness, Catherine; Nightingale, Sophie S; Skandarajah, Anita S; Gyorki, David E; Thornton, Chantel M; Beavis, Paul A; Fox, Stephen B; Darcy, Phillip K; Speed, Terence P; Mackay, Laura K; Neeson, Paul J; Loi, Sherene.

Nat Med ; 24(12): 1941, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30135555

ABSTRACT

In the version of this article originally published, the institution in affiliation 10 was missing. Affiliation 10 was originally listed as Department of Surgery, Royal Melbourne Hospital and Royal Women's Hospital, Melbourne, Victoria, Australia. It should have been Department of Surgery, Royal Melbourne Hospital and Royal Women's Hospital, University of Melbourne, Melbourne, Victoria, Australia. The error has been corrected in the HTML and PDF versions of this article.

Single-cell profiling of breast cancer T cells reveals a tissue-resident memory subset associated with improved prognosis.

Nat Med ; 24(7): 986-993, 2018 07.

Article in English | MEDLINE | ID: mdl-29942092

ABSTRACT

The quantity of tumor-infiltrating lymphocytes (TILs) in breast cancer (BC) is a robust prognostic factor for improved patient survival, particularly in triple-negative and HER2-overexpressing BC subtypes [1]. Although T cells are the predominant TIL population [2], the relationship between quantitative and qualitative differences in T cell subpopulations and patient prognosis remains unknown. We performed single-cell RNA sequencing (scRNA-seq) of 6,311 T cells isolated from human BCs and show that significant heterogeneity exists in the infiltrating T cell population. We demonstrate that BCs with a high number of TILs contained CD8+ T cells with features of tissue-resident memory T (TRM) cell differentiation and that these CD8+ TRM cells expressed high levels of immune checkpoint molecules and effector proteins. A CD8+ TRM gene signature developed from the scRNA-seq data was significantly associated with improved patient survival in early-stage triple-negative breast cancer (TNBC) and provided better prognostication than CD8 expression alone. Our data suggest that CD8+ TRM cells contribute to BC immunosurveillance and are the key targets of modulation by immune checkpoint inhibition. Further understanding of the development, maintenance and regulation of TRM cells will be crucial for successful immunotherapeutic development in BC.

Subjects

Breast Neoplasms/immunology, Immunologic Memory, Single-Cell Analysis/methods, Breast Neoplasms/pathology, CD3 Complex/metabolism, CD8 Antigens/metabolism, Disease-Free Survival, Female, Humans, Kaplan-Meier Estimate, Tumor-Infiltrating Lymphocytes/immunology, Prognosis, RNA Sequence Analysis, Triple Negative Breast Neoplasms/immunology, Triple Negative Breast Neoplasms/pathology
