Results 1 - 8 of 8
1.
PLoS One ; 19(4): e0298906, 2024.
Article in English | MEDLINE | ID: mdl-38625909

ABSTRACT

Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Phenotype , Multifactorial Inheritance/genetics , Logistic Models , Polymorphism, Single Nucleotide
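
The epiTree abstract describes PCS epistasis p-values that quantify, via bootstrap sampling on the test set, whether an interaction model improves prediction accuracy. The abstract does not give the exact definition, so the following is only a minimal sketch under a stated assumption: each model is scored per test sample as correct (1) or incorrect (0), and the "p-value" is the fraction of bootstrap resamples in which the epistatic model is not more accurate. The function name `bootstrap_improvement_pvalue` is hypothetical and is not part of epiTree.

```python
import random
from typing import Sequence


def bootstrap_improvement_pvalue(
    correct_null: Sequence[int],
    correct_alt: Sequence[int],
    n_boot: int = 1000,
    seed: int = 0,
) -> float:
    """Hypothetical sketch: share of bootstrap resamples of the test set in
    which the alternative (epistatic) model is NOT more accurate than the
    null (no-epistasis) model.

    correct_null[i] and correct_alt[i] are 1 if that model classified test
    sample i correctly, else 0 (an assumed scoring, not epiTree's own).
    """
    if len(correct_null) != len(correct_alt):
        raise ValueError("both models must be scored on the same test samples")
    n = len(correct_null)
    if n == 0:
        raise ValueError("test set is empty")
    rng = random.Random(seed)  # fixed seed for reproducibility
    not_better = 0
    for _ in range(n_boot):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Accuracy gain of the alternative model on this resample (in counts).
        gain = sum(correct_alt[i] - correct_null[i] for i in idx)
        if gain <= 0:
            not_better += 1
    return not_better / n_boot


# Toy usage with made-up correctness vectors:
null_hits = [1, 0, 1, 0, 1, 0, 1, 0]
alt_hits = [1, 1, 1, 0, 1, 1, 1, 0]
print(bootstrap_improvement_pvalue(null_hits, alt_hits))
```

Small values mean the interaction model rarely fails to improve accuracy across resamples; the real pipeline additionally relies on the stabilized likelihood ratio test described above.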
2.
bioRxiv ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-37873118

ABSTRACT

Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this challenge, we here introduce GPN-MSA, a novel framework for DNA language models that leverages whole-genome sequence alignments across multiple species and takes only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC, OMIM), experimental functional assays (DMS, DepMap), and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and non-coding variants.
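
The abstract does not state how GPN-MSA turns model output into a deleteriousness score. A common zero-shot scheme for masked DNA language models, assumed here, is the log-likelihood ratio between the alternate and reference alleles at a masked position. The sketch below takes a predicted nucleotide distribution as given; the function name and the probabilities are illustrative only.

```python
import math
from typing import Mapping


def llr_variant_score(probs: Mapping[str, float], ref: str, alt: str) -> float:
    """Assumed scoring: log P(alt) - log P(ref) at one masked site.

    probs is the model's predicted distribution over A/C/G/T at that site.
    More negative scores mean the alternate allele is judged less likely
    than the reference, which is typically read as more deleterious.
    """
    if not math.isclose(sum(probs.values()), 1.0, rel_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    if probs[ref] <= 0 or probs[alt] <= 0:
        raise ValueError("probabilities must be positive to take logs")
    return math.log(probs[alt]) - math.log(probs[ref])


# Illustrative (made-up) prediction at a conserved site favoring "A":
site_probs = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
print(llr_variant_score(site_probs, ref="A", alt="G"))
```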

3.
Genome Biol ; 24(1): 182, 2023 08 07.
Article in English | MEDLINE | ID: mdl-37550700

ABSTRACT

BACKGROUND: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. RESULTS: We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications, and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes. CONCLUSIONS: Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.


Subject(s)
Machine Learning , Proteome , Humans , Proteome/genetics , Amino Acid Sequence , Mutation , Mutation, Missense , Computational Biology/methods
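
The CPT abstract reports specificity at a fixed 95% sensitivity. The metric itself can be sketched as follows, assuming higher scores mean "more likely pathogenic": choose the threshold that flags at least the requested fraction of pathogenic variants, then measure the fraction of benign variants below it. The function name and toy scores are hypothetical and do not reproduce the paper's data.

```python
import math
from typing import Sequence


def specificity_at_sensitivity(
    pathogenic_scores: Sequence[float],
    benign_scores: Sequence[float],
    sensitivity: float = 0.95,
) -> float:
    """Specificity at the threshold that flags at least `sensitivity` of
    pathogenic variants (assumes higher score = more likely pathogenic)."""
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    if not pathogenic_scores or not benign_scores:
        raise ValueError("need both pathogenic and benign scores")
    ranked = sorted(pathogenic_scores, reverse=True)
    # Number of pathogenic variants that must be flagged.
    k = math.ceil(sensitivity * len(ranked))
    threshold = ranked[k - 1]  # flag every variant scoring >= threshold
    true_negatives = sum(1 for s in benign_scores if s < threshold)
    return true_negatives / len(benign_scores)


# Toy scores (made up):
path = [0.9, 0.8, 0.7, 0.6]
benign = [0.1, 0.65, 0.75, 0.2]
print(specificity_at_sensitivity(path, benign, sensitivity=0.75))
```

Raising the required sensitivity lowers the threshold, so more benign variants are flagged and specificity drops; this trade-off is why the abstract compares methods at a fixed, clinically relevant sensitivity.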
4.
Nat Commun ; 11(1): 651, 2020 01 31.
Article in English | MEDLINE | ID: mdl-32005835

ABSTRACT

While single-cell RNA sequencing (scRNA-seq) is invaluable for studying cell populations, cell-surface proteins are often integral markers of cellular function and serve as primary targets for therapeutic intervention. Here we propose a transfer learning framework, single cell Transcriptome to Protein prediction with deep neural network (cTP-net), to impute surface protein abundances from scRNA-seq data by learning from existing single-cell multi-omic resources.


Subject(s)
Cells/metabolism , Gene Expression Profiling/methods , Membrane Proteins/genetics , Single-Cell Analysis/methods , Transcriptome , Cells/cytology , Humans , Membrane Proteins/metabolism , Neural Networks, Computer , Sequence Analysis, RNA
5.
Nat Methods ; 16(9): 875-878, 2019 09.
Article in English | MEDLINE | ID: mdl-31471617

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) data are noisy and sparse. Here, we show that transfer learning across datasets remarkably improves data quality. By coupling a deep autoencoder with a Bayesian model, SAVER-X extracts transferable gene-gene relationships across data from different labs, varying conditions and divergent species, to denoise new target datasets.


Subject(s)
Breast Neoplasms/metabolism , Computational Biology/methods , Leukocytes, Mononuclear/metabolism , Sequence Analysis, RNA/standards , Single-Cell Analysis/methods , T-Lymphocytes/metabolism , Transcriptome , Animals , Bayes Theorem , Female , Gene Expression Profiling , Gene Expression Regulation , Humans , Mice , Sequence Analysis, RNA/methods
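
The SAVER-X abstract says a deep autoencoder is coupled with a Bayesian model to denoise counts, but does not give the formula. A minimal sketch, assuming the Poisson-Gamma empirical Bayes form used by the earlier SAVER method: the observed count is Y ~ Poisson(s·λ), the true expression has prior λ ~ Gamma(α, β) whose mean α/β would come from a predictor such as the autoencoder, and the denoised value is the posterior mean (α + Y)/(β + s). All names and numbers below are illustrative.

```python
def poisson_gamma_posterior_mean(
    count: float, size_factor: float, prior_mean: float, prior_shape: float
) -> float:
    """Posterior mean of lambda for Y ~ Poisson(s * lambda),
    lambda ~ Gamma(shape=alpha, rate=beta), parameterized by the prior
    mean mu = alpha / beta (assumed model, not SAVER-X's exact code)."""
    if size_factor <= 0 or prior_mean <= 0 or prior_shape <= 0:
        raise ValueError("size factor and prior parameters must be positive")
    prior_rate = prior_shape / prior_mean  # beta = alpha / mu
    # Conjugacy: lambda | Y ~ Gamma(alpha + Y, beta + s).
    return (prior_shape + count) / (prior_rate + size_factor)


# A zero count (possible dropout) is pulled toward the prior mean of 2.0:
print(poisson_gamma_posterior_mean(count=0, size_factor=1.0, prior_mean=2.0, prior_shape=2.0))
```

The prior shape controls how strongly observations are shrunk: a large shape trusts the predicted prior mean, while a small shape lets the observed count dominate.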
6.
Bioinformatics ; 35(24): 5155-5162, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31197307

ABSTRACT

MOTIVATION: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. RESULTS: We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. AVAILABILITY AND IMPLEMENTATION: The method is implemented as a publicly available R package at https://github.com/cz-ye/DECENT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Single-Cell Analysis , Software , Gene Expression Profiling , RNA-Seq , Sequence Analysis, RNA
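
The DECENT abstract notes that its gains are largest when the molecule capture process is overdispersed. The abstract does not specify the capture model, so this sketch assumes two textbook forms: binomial capture with a fixed efficiency p, where a gene with n true molecules drops out with probability (1 - p)^n, and beta-binomial capture, where the efficiency varies across cells as Beta(a, b) and the dropout probability becomes B(a, b + n) / B(a, b). The function names are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def dropout_prob_binomial(n_true: int, p: float) -> float:
    """P(0 molecules observed) when each of n_true molecules is captured
    independently with probability p."""
    return (1.0 - p) ** n_true


def dropout_prob_beta_binomial(n_true: int, mean_p: float, concentration: float) -> float:
    """P(0 observed) when the capture efficiency is Beta-distributed with the
    given mean and concentration a + b (smaller = more overdispersed)."""
    a = mean_p * concentration
    b = (1.0 - mean_p) * concentration
    return math.exp(log_beta(a, b + n_true) - log_beta(a, b))


# Same mean efficiency (0.5), five true molecules:
print(dropout_prob_binomial(5, 0.5))            # 1/32
print(dropout_prob_beta_binomial(5, 0.5, 2.0))  # 1/6: overdispersion inflates dropout
```

Overdispersed capture produces more zeros than a fixed-efficiency model with the same mean, which is one reason a model that ignores it can misattribute dropout to true low expression.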
7.
Nat Med ; 24(12): 1941, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30135555

ABSTRACT

In the version of this article originally published, the institution in affiliation 10 was missing. Affiliation 10 was originally listed as Department of Surgery, Royal Melbourne Hospital and Royal Womens' Hospital, Melbourne, Victoria, Australia. It should have been Department of Surgery, Royal Melbourne Hospital and Royal Womens' Hospital, University of Melbourne, Melbourne, Victoria, Australia. The error has been corrected in the HTML and PDF versions of this article.

8.
Nat Med ; 24(7): 986-993, 2018 07.
Article in English | MEDLINE | ID: mdl-29942092

ABSTRACT

The quantity of tumor-infiltrating lymphocytes (TILs) in breast cancer (BC) is a robust prognostic factor for improved patient survival, particularly in triple-negative and HER2-overexpressing BC subtypes [1]. Although T cells are the predominant TIL population [2], the relationship between quantitative and qualitative differences in T cell subpopulations and patient prognosis remains unknown. We performed single-cell RNA sequencing (scRNA-seq) of 6,311 T cells isolated from human BCs and show that significant heterogeneity exists in the infiltrating T cell population. We demonstrate that BCs with a high number of TILs contained CD8+ T cells with features of tissue-resident memory T (TRM) cell differentiation and that these CD8+ TRM cells expressed high levels of immune checkpoint molecules and effector proteins. A CD8+ TRM gene signature developed from the scRNA-seq data was significantly associated with improved patient survival in early-stage triple-negative breast cancer (TNBC) and provided better prognostication than CD8 expression alone. Our data suggest that CD8+ TRM cells contribute to BC immunosurveillance and are the key targets of modulation by immune checkpoint inhibition. Further understanding of the development, maintenance and regulation of TRM cells will be crucial for successful immunotherapeutic development in BC.


Subject(s)
Breast Neoplasms/immunology , Immunologic Memory , Single-Cell Analysis/methods , Breast Neoplasms/pathology , CD3 Complex/metabolism , CD8 Antigens/metabolism , Disease-Free Survival , Female , Humans , Kaplan-Meier Estimate , Lymphocytes, Tumor-Infiltrating/immunology , Prognosis , Sequence Analysis, RNA , Triple Negative Breast Neoplasms/immunology , Triple Negative Breast Neoplasms/pathology
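
The survival association above is indexed under Kaplan-Meier Estimate. As a generic illustration of that estimator (not the authors' analysis code), the sketch below computes a Kaplan-Meier curve from follow-up times and event indicators; at each distinct event time, survival is multiplied by 1 - d/n, where d is the number of events and n the number still at risk. The data are made up.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True if the event (e.g. relapse) was observed at times[i],
    False if that patient was censored at times[i].
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects (events and censored) sharing this time point;
        # censored subjects still count as at risk at this time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_removed
    return curve


# Toy cohort: an event at month 1, censoring at month 2, events at months 3 and 4.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

In a study like the one above, one curve would be computed per group (for example, high versus low signature score) and the curves compared, typically with a log-rank test.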