1.
bioRxiv ; 2024 Jun 23.
Article in English | MEDLINE | ID: mdl-38948874

ABSTRACT

Gene therapies have the potential to treat disease by delivering therapeutic genetic cargo to disease-associated cells. One limitation to their widespread use is the lack of short regulatory sequences, or promoters, that differentially induce the expression of delivered genetic cargo in target cells, minimizing side effects in other cell types. Such cell-type-specific promoters are difficult to discover using existing methods, requiring either manual curation or access to large datasets of promoter-driven expression from both targeted and untargeted cells. Model-based optimization (MBO) has emerged as an effective method to design biological sequences in an automated manner, and has recently been used in promoter design methods. However, these methods have only been tested using large training datasets that are expensive to collect, and focus on designing promoters for markedly different cell types, overlooking the complexities associated with designing promoters for closely related cell types that share similar regulatory features. Therefore, we introduce a comprehensive framework for utilizing MBO to design promoters in a data-efficient manner, with an emphasis on discovering promoters for similar cell types. We use conservative objective models (COMs) for MBO and highlight practical considerations such as best practices for improving sequence diversity, getting estimates of model uncertainty, and choosing the optimal set of sequences for experimental validation. Using three relatively similar blood cancer cell lines (Jurkat, K562, and THP1), we show that our approach discovers many novel cell-type-specific promoters after experimentally validating the designed sequences. For K562 cells, in particular, we discover a promoter that has 75.85% higher cell-type-specificity than the best promoter from the initial dataset used to train our models.

2.
bioRxiv ; 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38948875

ABSTRACT

Kidney disease is highly heritable; however, the causal genetic variants, the cell types in which these variants function, and the molecular mechanisms underlying kidney disease remain largely unknown. To identify genetic loci affecting kidney function, we performed a GWAS using multiple kidney function biomarkers and identified 462 loci. To begin to investigate how these loci affect kidney function, we generated single-cell chromatin accessibility (scATAC-seq) maps of the human kidney and identified candidate cis-regulatory elements (cCREs) for kidney podocytes, tubule epithelial cells, and kidney endothelial, stromal, and immune cells. Kidney tubule epithelial cCREs explained 58% of kidney function SNP-heritability and kidney podocyte cCREs explained an additional 6.5% of SNP-heritability. In contrast, little kidney function heritability was explained by kidney endothelial, stromal, or immune cell-specific cCREs. Through functionally informed fine-mapping, we identified putative causal kidney function variants and their corresponding cCREs. Using kidney scATAC-seq data, we created a deep learning model (which we named ChromKid) to predict kidney cell type-specific chromatin accessibility from sequence. ChromKid and allele specific kidney scATAC-seq revealed that many fine-mapped kidney function variants locally change chromatin accessibility in tubule epithelial cells. Enhancer assays confirmed that fine-mapped kidney function variants alter tubule epithelial regulatory element function. To map the genes which these regulatory elements control, we used CRISPR interference (CRISPRi) to target these regulatory elements in tubule epithelial cells and assessed changes in gene expression. CRISPRi of enhancers harboring kidney function variants regulated NDRG1 and RBPMS expression. Thus, inherited differences in tubule epithelial NDRG1 and RBPMS expression may predispose to kidney disease in humans.
We conclude that genetic variants affecting tubule epithelial regulatory element function account for most SNP-heritability of human kidney function. This work provides an experimental approach to identify the variants, regulatory elements, and genes involved in polygenic disease.

3.
bioRxiv ; 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38895200

ABSTRACT

Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.

4.
bioRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-37904945

ABSTRACT

Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields - including benchmarking, auditing, and algorithmic fairness - are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks. Compared to existing task formulations in computational genomics, GUANinE is large-scale, de-noised, and suitable for evaluating pretrained models. GUANinE v1.0 primarily focuses on functional genomics tasks such as functional element annotation and gene expression prediction, and it also draws upon connections to evolutionary biology through sequence conservation tasks. The current GUANinE tasks provide insight into the performance of existing genomic AI models and non-neural baselines, with opportunities to be refined, revisited, and broadened as the field matures. Finally, the GUANinE benchmark allows us to evaluate new self-supervised T5 models and explore the tradeoffs between tokenization and model performance, while showcasing the potential for self-supervision to complement existing pretraining procedures.

5.
Nat Genet ; 55(12): 2056-2059, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38036790

ABSTRACT

Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In addition, models often fail to predict the correct direction of effect of cis-regulatory genetic variation on expression.


Subjects
Deep Learning , Transcriptome , Humans , Transcriptome/genetics , Genetic Variation/genetics , Genome , Genomics
6.
Genome Biol ; 24(1): 182, 2023 08 07.
Article in English | MEDLINE | ID: mdl-37550700

ABSTRACT

BACKGROUND: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. RESULTS: We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes. CONCLUSIONS: Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.


Subjects
Machine Learning , Proteome , Humans , Proteome/genetics , Amino Acid Sequence , Mutation , Missense Mutation , Computational Biology/methods
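
The CPT-1 abstract above reports specificity at a fixed 95% sensitivity. The following is a minimal Python sketch of how such a figure can be computed from predictor scores and binary labels. The function name, the toy scores, and the convention that higher scores mean "more likely pathogenic" are assumptions made for illustration; this is not the authors' evaluation code.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Specificity at the strictest threshold that still reaches the target sensitivity.

    scores: predictor outputs, higher = more likely pathogenic (assumed convention).
    labels: 1 for pathogenic, 0 for benign.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one pathogenic and one benign variant")
    # Number of pathogenic variants that must be called positive.
    needed = max(1, math.ceil(target_sensitivity * len(positives)))
    # A variant is called positive when its score is >= threshold.
    threshold = positives[needed - 1]
    true_negatives = sum(1 for s in negatives if s < threshold)
    return true_negatives / len(negatives)


# Toy data (hypothetical): four pathogenic and four benign variants.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.75, 0.65, 0.3, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(toy_scores, toy_labels, target_sensitivity=0.75))
```

Fixing sensitivity first and then reading off specificity matches the high-sensitivity regime the abstract emphasizes, where missing a disease variant is assumed to be the costlier error.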
7.
bioRxiv ; 2023 Apr 13.
Article in English | MEDLINE | ID: mdl-37090514

ABSTRACT

Fine-mapping methods, which aim to identify genetic variants responsible for complex traits following genetic association studies, typically assume that sufficient adjustments for confounding within the association study cohort have been made, e.g., through regressing out the top principal components (i.e., residualization). Despite its widespread use, however, residualization may not completely remove all sources of confounding. Here, we propose a complementary stability-guided approach that does not rely on residualization, which identifies consistently fine-mapped variants across different genetic backgrounds or environments. We demonstrate the utility of this approach by applying it to fine-map eQTLs in the GEUVADIS data. Using 378 different functional annotations of the human genome, including recent deep learning-based annotations (e.g., Enformer), we compare enrichments of these annotations among variants for which the stability and traditional residualization-based fine-mapping approaches agree against those for which they disagree, and find that the stability approach enhances the power of traditional fine-mapping methods in identifying variants with functional impact. Finally, in cases where the two approaches report distinct variants, our approach identifies variants comparably enriched for functional annotations. Our findings suggest that the stability principle, as a conceptually simple device, complements existing approaches to fine-mapping, reinforcing recent advocacy of evaluating cross-population and cross-environment portability of biological findings. To support visualization and interpretation of our results, we provide a Shiny app, available at: https://alan-aw.shinyapps.io/stability_v0/.

8.
bioRxiv ; 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36909524

ABSTRACT

Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models.

9.
bioRxiv ; 2023 Dec 23.
Article in English | MEDLINE | ID: mdl-38187742

ABSTRACT

Genomic sequence-to-activity models are increasingly utilized to understand gene regulatory syntax and probe the functional consequences of regulatory variation. Current models make accurate predictions of relative activity levels across the human reference genome, but their performance is more limited for predicting the effects of genetic variants, such as explaining gene expression variation across individuals. To better understand the causes of these shortcomings, we examine the uncertainty in predictions of genomic sequence-to-activity models using an ensemble of Basenji2 model replicates. We characterize prediction consistency on four types of sequences: reference genome sequences, reference genome sequences perturbed with TF motifs, eQTLs, and personal genome sequences. We observe that models tend to make high-confidence predictions on reference sequences, even when incorrect, and low-confidence predictions on sequences with variants. For eQTLs and personal genome sequences, we find that model replicates make inconsistent predictions in >50% of cases. Our findings suggest strategies to improve performance of these models.
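
The abstract above describes measuring how often an ensemble of model replicates makes inconsistent predictions. The abstract does not state the exact consistency criterion, so the sketch below assumes one simple definition: a variant is "consistent" only if every replicate predicts the same direction of effect (the sign of the alternate-minus-reference prediction). The function names and toy numbers are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def inconsistent_fraction(effects_by_variant):
    """Fraction of variants on which replicates disagree about effect direction.

    effects_by_variant: one list per variant, holding each replicate's
    predicted effect (e.g. alt prediction minus ref prediction).
    """
    if not effects_by_variant:
        raise ValueError("no variants supplied")
    inconsistent = 0
    for effects in effects_by_variant:
        # More than one distinct sign means the replicates disagree.
        if len({sign(e) for e in effects}) > 1:
            inconsistent += 1
    return inconsistent / len(effects_by_variant)


# Toy data (hypothetical): four variants scored by three replicates each.
toy_effects = [
    [0.1, 0.2, 0.3],     # all replicates predict an increase
    [0.1, -0.2, 0.3],    # replicates disagree
    [-0.5, -0.1, -0.2],  # all replicates predict a decrease
    [0.0, 0.1, 0.1],     # one replicate predicts no change, counted as disagreement
]
print(inconsistent_fraction(toy_effects))
```

Treating an exact zero as its own direction is a deliberately strict choice; a tolerance band around zero would be a reasonable alternative.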

10.
Nat Commun ; 13(1): 5803, 2022 10 03.
Article in English | MEDLINE | ID: mdl-36192477

ABSTRACT

Age is the primary risk factor for many common human diseases. Here, we quantify the relative contributions of genetics and aging to gene expression patterns across 27 tissues from 948 humans. We show that the predictive power of expression quantitative trait loci is impacted by age in many tissues. Jointly modelling the contributions of age and genetics to transcript level variation we find expression heritability (h²) is consistent among tissues while the contribution of aging varies by >20-fold with [Formula: see text] in 5 tissues. We find that while the force of purifying selection is stronger on genes expressed early versus late in life (Medawar's hypothesis), several highly proliferative tissues exhibit the opposite pattern. These non-Medawarian tissues exhibit high rates of cancer and age-of-expression-associated somatic mutations. In contrast, genes under genetic control are under relaxed constraint. Together, we demonstrate the distinct roles of aging and genetics on expression phenotypes.


Subjects
Aging , Quantitative Trait Loci , Aging/genetics , Gene Expression , Gene Expression Regulation , Humans , Phenotype , Quantitative Trait Loci/genetics
11.
Eur Urol ; 79(3): 353-361, 2021 03.
Article in English | MEDLINE | ID: mdl-32800727

ABSTRACT

BACKGROUND: Family history of prostate cancer (PCa) is a well-known risk factor, and both common and rare genetic variants are associated with the disease. OBJECTIVE: To detect new genetic variants associated with PCa, capitalizing on the role of family history and more aggressive PCa. DESIGN, SETTING, AND PARTICIPANTS: A two-stage design was used. In stage one, whole-exome sequencing was used to identify potential risk alleles among affected men with a strong family history of disease or with more aggressive disease (491 cases and 429 controls). Aggressive disease was based on a sum of scores for Gleason score, node status, metastasis, tumor stage, prostate-specific antigen at diagnosis, systemic recurrence, and time to PCa death. Genes identified in stage one were screened in stage two using a custom-capture design in an independent set of 2917 cases and 1899 controls. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Frequencies of genetic variants (singly or jointly in a gene) were compared between cases and controls. RESULTS AND LIMITATIONS: Eleven genes previously reported to be associated with PCa were detected (ATM, BRCA2, HOXB13, FAM111A, EMSY, HNF1B, KLK3, MSMB, PCAT1, PRSS3, and TERT), as well as an additional 10 novel genes (PABPC1, QK1, FAM114A1, MUC6, MYCBP2, RAPGEF4, RNASEH2B, ULK4, XPO7, and THAP3). Of these 10 novel genes, all but PABPC1 and ULK4 were primarily associated with the risk of aggressive PCa. CONCLUSIONS: Our approach demonstrates the advantage of gene sequencing in the search for genetic variants associated with PCa and the benefits of sampling patients with a strong family history of disease or an aggressive form of disease. PATIENT SUMMARY: Multiple genes are associated with prostate cancer (PCa) among men with a strong family history of this disease or among men with an aggressive form of PCa.


Subjects
Prostatic Neoplasms , BRCA2 Genes , Guanine Nucleotide Exchange Factors , Humans , Male , Neoplasm Grading , Prostatic Neoplasms/genetics , Protein Serine-Threonine Kinases , Trypsin , Exome Sequencing
12.
Bioinformatics ; 36(16): 4440-4448, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32330225

ABSTRACT

SUMMARY: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS) analyses. Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions are further analyzed and visualized to illustrate their influences on predictions. IRT can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of regulatory effect on the expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies. AVAILABILITY AND IMPLEMENTATION: Codes and data used in this work are available at https://github.com/miaecle/eQTL_Trees. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Chromosome Mapping , Histone Code , Humans , Phenotype , Single Nucleotide Polymorphism , Quantitative Trait Loci/genetics
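
The IRT abstract above reports top-1 and top-3 accuracy in gene-ranking experiments. As a rough illustration of that metric, the Python sketch below ranks each variant's candidate genes by score and checks whether the true target falls in the first k positions. The data layout, function name, and gene labels are hypothetical and are not taken from the IRT code base.

```python
def top_k_accuracy(cases, k):
    """Fraction of cases whose true gene is among the k highest-scoring candidates.

    cases: list of (gene_scores, true_gene) pairs, where gene_scores maps each
    candidate gene name to a score (higher = more likely regulatory target).
    """
    if not cases:
        raise ValueError("no cases supplied")
    hits = 0
    for gene_scores, true_gene in cases:
        # Candidate genes ordered from highest to lowest score.
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_gene in ranked[:k]:
            hits += 1
    return hits / len(cases)


# Toy data (hypothetical gene labels and scores).
toy_cases = [
    ({"GENE_A": 0.9, "GENE_B": 0.5, "GENE_C": 0.1}, "GENE_A"),
    ({"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.6}, "GENE_C"),
    ({"GENE_A": 0.3, "GENE_B": 0.2, "GENE_C": 0.1, "GENE_D": 0.05}, "GENE_D"),
    ({"GENE_A": 0.1, "GENE_B": 0.8}, "GENE_B"),
]
print(top_k_accuracy(toy_cases, 1), top_k_accuracy(toy_cases, 3))
```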
13.
Genet Med ; 21(11): 2512-2520, 2019 11.
Article in English | MEDLINE | ID: mdl-31105274

ABSTRACT

PURPOSE: Limb-girdle muscular dystrophies (LGMD) are a genetically heterogeneous category of autosomal inherited muscle diseases. Many genes causing LGMD have been identified, and clinical trials are beginning for treatment of some genetic subtypes. However, even with the gene-level mechanisms known, it is still difficult to get a robust and generalizable prevalence estimation for each subtype due to the limited amount of epidemiology data and the low incidence of LGMDs. METHODS: Taking advantage of recently published exome and genome sequencing data from the general population, we used a Bayesian method to develop a robust disease prevalence estimator. RESULTS: This method was applied to nine recessive LGMD subtypes. The estimated disease prevalence calculated by this method was largely comparable with published estimates from epidemiological studies; however, it highlighted instances of possible underdiagnosis for LGMD2B and 2L. CONCLUSION: The increasing size of aggregated population variant databases will allow for robust and reproducible prevalence estimates of recessive disease, which is critical for the strategic design and prioritization of clinical trials.


Subjects
Limb-Girdle Muscular Dystrophies/epidemiology , Limb-Girdle Muscular Dystrophies/genetics , Bayes Theorem , Chromosome Mapping , Genetic Databases , Exome , Female , Humans , Male , Mutation , Prevalence
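
The LGMD abstract above mentions a Bayesian prevalence estimator built on population sequencing data but gives no model details. The sketch below is therefore only a generic illustration, not the published method. It assumes Hardy-Weinberg equilibrium, full penetrance, all pathogenic alleles of one gene pooled into a single frequency q, and a Beta prior on q, so that the prevalence of the recessive disease is approximated by the posterior mean of q². All names and numbers are hypothetical.

```python
def posterior_prevalence(pathogenic_alleles, total_alleles, prior_a=1.0, prior_b=1.0):
    """Posterior mean of q**2 under a Beta-binomial model.

    q is the pooled pathogenic allele frequency. With a Beta(prior_a, prior_b)
    prior and a binomial likelihood, the posterior is Beta(a, b), and
    E[q**2] = a * (a + 1) / ((a + b) * (a + b + 1)).
    """
    if not 0 <= pathogenic_alleles <= total_alleles:
        raise ValueError("allele counts are inconsistent")
    a = prior_a + pathogenic_alleles
    b = prior_b + (total_alleles - pathogenic_alleles)
    return a * (a + 1) / ((a + b) * (a + b + 1))


# Toy input (hypothetical): 100 pathogenic alleles among 200,000 chromosomes.
estimate = posterior_prevalence(100, 200_000)
print(f"estimated prevalence per million: {estimate * 1e6:.3f}")
```

Using the posterior mean of q² rather than squaring the posterior mean of q keeps the allele-frequency uncertainty in the estimate, which matters most when the pathogenic allele count is small.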
14.
J Invest Dermatol ; 138(12): 2589-2594, 2018 12.
Article in English | MEDLINE | ID: mdl-30472995

ABSTRACT

Cutaneous squamous cell cancers (cSCCs) present an under-recognized health issue among non-Hispanic whites, one that is likely to increase as populations age. cSCC risks vary considerably among non-Hispanic whites, and this heterogeneity indicates the need for risk-stratified screening strategies that are guided by patients' personal characteristics and clinical histories. Here we describe cSCCscore, a prediction tool that uses patients' covariates and clinical histories to assign them personal probabilities of developing cSCCs within 3 years after risk assessment. cSCCscore uses a statistical model for the occurrence and timing of a patient's cSCCs, whose parameters we estimated using cohort data from 66,995 patients in the Kaiser Permanente Northern California healthcare system. We found that patients' covariates and histories explained approximately 75% of their interpersonal cSCC risk variation. Using cross-validated performance measures, we also found cSCCscore's predictions to be moderately well calibrated to the patients' observed cSCC incidence. Moreover, cSCCscore discriminated well between patients who subsequently did and did not develop a new primary cSCC within 3 years after risk assignment, with area under the receiver operating characteristic curve of approximately 85%. Thus, cSCCscore can facilitate more informed management of non-Hispanic white patients at cSCC risk. cSCCscore's predictions are available at https://researchapps.github.io/cSCCscore/.


Subjects
Squamous Cell Carcinoma/diagnosis , Early Detection of Cancer/methods , Statistical Models , Skin Neoplasms/diagnosis , White Population , Aged , California/epidemiology , Squamous Cell Carcinoma/epidemiology , Cohort Studies , Delivery of Health Care , Female , Humans , Incidence , Male , Middle Aged , Local Neoplasm Recurrence , Prognosis , Research Design , Risk Factors , Skin Neoplasms/epidemiology
15.
Nat Commun ; 9(1): 4264, 2018 10 15.
Article in English | MEDLINE | ID: mdl-30323283

ABSTRACT

Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing 33 genes whose imputed expression levels are associated with cSCC at false discovery rate < 10% in the GERA cohort and validate 15 of these candidate genes at Bonferroni significance in the 23andMe dataset, including eight genes in five novel susceptibility loci and seven genes in four previously associated loci. These results suggest genetic mechanisms contributing to cSCC risk and illustrate advantages and disadvantages of TWAS as a supplement to traditional GWAS analyses.


Subjects
Squamous Cell Carcinoma/genetics , Neoplastic Gene Expression Regulation , Genetic Loci , Genetic Predisposition to Disease , Skin Neoplasms/genetics , Genetic Databases , Humans , Single Nucleotide Polymorphism/genetics , Reproducibility of Results
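
The TWAS abstract above uses a false discovery rate below 10% for discovery and Bonferroni significance for validation. The abstract does not name the FDR procedure, so the Python sketch below assumes the standard Benjamini-Hochberg step-up rule, and the Bonferroni family-wise level of 0.05 is likewise an assumption. The p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if p_values[i] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def bonferroni(p_values, alpha=0.05):
    """Indices of hypotheses significant after Bonferroni correction."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Toy p-values (hypothetical): five gene-level association tests.
toy_p = [0.001, 0.02, 0.04, 0.5, 0.03]
discovered = benjamini_hochberg(toy_p, fdr=0.10)
# Validation applies Bonferroni only to the genes carried forward from discovery.
validated = bonferroni([toy_p[i] for i in discovered], alpha=0.05)
print(discovered, validated)
```

In a real discovery-validation design the validation p-values would come from the independent cohort; the toy example reuses one list only to keep the sketch short.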
16.
Cancer Immunol Immunother ; 67(7): 1123-1133, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29754218

ABSTRACT

BACKGROUND: The immune system has been implicated in the pathophysiology of cutaneous squamous cell carcinoma (cSCC) as evidenced by the substantially increased risk of cSCC in immunosuppressed individuals. Associations between cSCC risk and single nucleotide polymorphisms (SNPs) in the HLA region have been identified by genome-wide association studies (GWAS). The translation of the associated HLA SNPs to structural amino acid changes in HLA molecules has not been previously elucidated. METHODS: Using data from a GWAS that included 7238 cSCC cases and 56,961 controls of non-Hispanic white ancestry, we imputed classical alleles and corresponding amino acid changes in HLA genes. Logistic regression models were used to examine associations between cSCC risk and genotyped or imputed SNPs, classical HLA alleles, and amino acid changes. RESULTS: Among the genotyped SNPs, cSCC risk was associated with rs28535317 (OR = 1.20, p = 9.88 × 10^-11) corresponding to an amino-acid change from phenylalanine to leucine at codon 26 of HLA-DRB1 (OR = 1.17, p = 2.48 × 10^-10). An additional independent association was observed for a threonine to isoleucine change at codon 107 of HLA-DQA1 (OR = 1.14, p = 2.34 × 10^-9). Among the classical HLA alleles, cSCC was associated with DRB1*01 (OR = 1.18, p = 5.86 × 10^-10). Conditional analyses revealed additional independent cSCC associations with DQA1*05:01 and DQA1*05:05. Extended haplotype analysis was used to complement the imputed haplotypes, which identified three extended haplotypes in the HLA-DR and HLA-DQ regions. CONCLUSIONS: Associations with specific HLA-DR and -DQ alleles are likely to explain previously observed GWAS signals in the HLA region associated with cSCC risk.


Subjects
Squamous Cell Carcinoma/genetics , Squamous Cell Carcinoma/pathology , MHC Class II Genes , Single Nucleotide Polymorphism , Skin Neoplasms/genetics , Skin Neoplasms/pathology , Genome-Wide Association Study , Genotype , Humans , Risk Factors
17.
Bioinformatics ; 33(24): 3895-3901, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28961785

ABSTRACT

MOTIVATION: Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. RESULTS: We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. AVAILABILITY AND IMPLEMENTATION: FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/. CONTACT: nilah@stanford.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Regulation , Genetic Variation , Software , Genomics , Humans , Quantitative Trait Loci
18.
Hum Immunol ; 78(4): 327-335, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28185865

ABSTRACT

Cutaneous squamous cell carcinoma (cSCC) is the second most common cancer among Caucasians in the United States, with rising incidence over the past decade. Treatment for non-melanoma skin cancer, including cSCC, in the United States was estimated to cost $4.8 billion in 2014. Thus, an understanding of cSCC pathogenesis could have important public health implications. Immune function impacts cSCC risk, given that cSCC incidence rates are substantially higher in patients with compromised immune systems. We report a systematic review of published associations between cSCC risk and the human leukocyte antigen (HLA) system. This review includes studies that analyze germline class I and class II HLA allelic variation as well as HLA cell-surface protein expression levels associated with cSCC risk. We propose biological mechanisms for these HLA-cSCC associations based on known mechanisms of HLA involvement in other diseases. The review suggests that immunity regulates the development of cSCC and that HLA-cSCC associations differ between immunocompetent and immunosuppressed patients. This difference may reflect the presence of viral co-factors that affect tumorigenesis in immunosuppressed patients. Finally, we highlight limitations in the literature on HLA-cSCC associations, and suggest directions for future research aimed at understanding, preventing and treating cSCC.


Subjects
Squamous Cell Carcinoma/epidemiology , DNA Virus Infections/epidemiology , HLA Antigens/genetics , Papillomaviridae/physiology , Skin Neoplasms/epidemiology , Squamous Cell Carcinoma/genetics , Squamous Cell Carcinoma/immunology , Gene Frequency , Genome-Wide Association Study , Humans , Immunity , Immunosuppression Therapy , Genetic Polymorphism , Risk Factors , Skin Neoplasms/genetics , Skin Neoplasms/immunology , United States , White People
19.
Am J Hum Genet ; 99(4): 877-885, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27666373

ABSTRACT

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.


Subjects
Disease/genetics , Missense Mutation/genetics , Software , Area Under the Curve , DNA Mutational Analysis , Exome/genetics , Gene Frequency , Humans , ROC Curve
20.
J Invest Dermatol ; 136(5): 930-937, 2016 May.
Article in English | MEDLINE | ID: mdl-26829030

ABSTRACT

We report a genome-wide association study of cutaneous squamous cell carcinoma conducted among non-Hispanic white members of the Kaiser Permanente Northern California health care system. The study includes a genome-wide screen of 61,457 members (6,891 cases and 54,566 controls) genotyped on the Affymetrix Axiom European array and a replication phase involving an independent set of 6,410 additional members (810 cases and 5,600 controls). Combined analysis of screening and replication phases identified 10 loci containing single-nucleotide polymorphisms (SNPs) with P-values < 5 × 10⁻⁸. Six loci contain genes in the pigmentation pathway; SNPs at these loci appear to modulate squamous cell carcinoma risk independently of the pigmentation phenotypes. Another locus contains HLA class II genes studied in relation to elevated squamous cell carcinoma risk following immunosuppression. SNPs at the remaining three loci include an intronic SNP in FOXP1 at locus 3p13, an intergenic SNP at 3q28 near TP63, and an intergenic SNP at 9p22 near BNC2. These findings provide insights into the genetic factors accounting for inherited squamous cell carcinoma susceptibility.
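The significance cutoff used in the abstract above is the conventional genome-wide threshold, which is commonly motivated as a Bonferroni correction of a 0.05 error rate over roughly one million independent tests. The sketch below illustrates that arithmetic and checks the reported cohort totals. It is not the study's analysis code: the helper names, the assumed count of one million tests, and the toy SNP identifiers and P-values are hypothetical.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # cutoff quoted in the abstract


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def genome_wide_hits(pvalues, threshold=GENOME_WIDE_THRESHOLD):
    """Return the SNP identifiers whose P-value is strictly below the threshold."""
    return sorted(snp for snp, p in pvalues.items() if p < threshold)


# Cohort sizes as reported in the abstract.
screen_cases, screen_controls = 6891, 54566
replication_cases, replication_controls = 810, 5600
screen_total = screen_cases + screen_controls
replication_total = replication_cases + replication_controls

# Assumption: about one million independent tests (not stated in the abstract).
approx_threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical P-values, for illustration only.
toy_pvalues = {"snp_a": 3.2e-9, "snp_b": 4.9e-8, "snp_c": 5.0e-8, "snp_d": 1.0e-3}
hits = genome_wide_hits(toy_pvalues)

print(screen_total, replication_total)
print(math.isclose(approx_threshold, GENOME_WIDE_THRESHOLD))
print(hits)
```

The comparison is strict, so a P-value exactly equal to the threshold is not counted as a hit, matching the "<" in the abstract.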


Subjects
Squamous Cell Carcinoma/genetics , Genetic Loci , Genetic Predisposition to Disease/epidemiology , Genome-Wide Association Study , Skin Neoplasms/genetics , Adult , California , Squamous Cell Carcinoma/epidemiology , Squamous Cell Carcinoma/pathology , Case-Control Studies , Cohort Studies , Female , Genotype , Humans , Male , Middle Aged , Single Nucleotide Polymorphism , Skin Neoplasms/epidemiology , Skin Neoplasms/pathology , White People/genetics