Results 1 - 17 of 17
1.
bioRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-37904945

ABSTRACT

Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields - including benchmarking, auditing, and algorithmic fairness - are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks. Compared to existing task formulations in computational genomics, GUANinE is large-scale, de-noised, and suitable for evaluating pretrained models. GUANinE v1.0 primarily focuses on functional genomics tasks such as functional element annotation and gene expression prediction, and it also draws upon connections to evolutionary biology through sequence conservation tasks. The current GUANinE tasks provide insight into the performance of existing genomic AI models and non-neural baselines, with opportunities to be refined, revisited, and broadened as the field matures. Finally, the GUANinE benchmark allows us to evaluate new self-supervised T5 models and explore the tradeoffs between tokenization and model performance, while showcasing the potential for self-supervision to complement existing pretraining procedures.

2.
Nat Genet ; 55(12): 2056-2059, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38036790

ABSTRACT

Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In addition, models often fail to predict the correct direction of effect of cis-regulatory genetic variation on expression.


Subject(s)
Deep Learning , Transcriptome , Humans , Transcriptome/genetics , Genetic Variation/genetics , Genome , Genomics
3.
Genome Biol ; 24(1): 182, 2023 Aug 07.
Article in English | MEDLINE | ID: mdl-37550700

ABSTRACT

BACKGROUND: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. RESULTS: We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes. CONCLUSIONS: Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.
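The "specificity at 95% sensitivity" figure quoted above can be computed from any set of scored, labelled variants. The following is a minimal sketch of that metric only, not the CPT-1 evaluation code; the function name and the toy scores are hypothetical, and it assumes higher scores mean "more likely pathogenic".

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Return (threshold, specificity) at the highest score threshold whose
    sensitivity is at least target_sensitivity.

    scores: predictor outputs, higher = more likely pathogenic (assumption).
    labels: 1 for pathogenic, 0 for benign.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one pathogenic and one benign example")
    # Number of pathogenic variants that must be called to reach the target.
    needed = max(math.ceil(target_sensitivity * len(positives)), 1)
    # A variant is called pathogenic when its score is >= threshold.
    threshold = positives[needed - 1]
    true_negatives = sum(1 for s in negatives if s < threshold)
    return threshold, true_negatives / len(negatives)


# Toy data (hypothetical): four pathogenic and four benign variants.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.5, 0.4, 0.3]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(toy_scores, toy_labels, target_sensitivity=1.0))
```

Fixing sensitivity first and then reading off specificity mirrors the clinical setting described in the abstract, where missing a disease variant is the costlier error.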


Subject(s)
Machine Learning , Proteome , Humans , Proteome/genetics , Amino Acid Sequence , Mutation , Mutation, Missense , Computational Biology/methods
4.
bioRxiv ; 2023 Apr 13.
Article in English | MEDLINE | ID: mdl-37090514

ABSTRACT

Fine-mapping methods, which aim to identify genetic variants responsible for complex traits following genetic association studies, typically assume that sufficient adjustments for confounding within the association study cohort have been made, e.g., through regressing out the top principal components (i.e., residualization). Despite its widespread use, however, residualization may not completely remove all sources of confounding. Here, we propose a complementary stability-guided approach that does not rely on residualization, which identifies consistently fine-mapped variants across different genetic backgrounds or environments. We demonstrate the utility of this approach by applying it to fine-map eQTLs in the GEUVADIS data. Using 378 different functional annotations of the human genome, including recent deep learning-based annotations (e.g., Enformer), we compare enrichments of these annotations among variants for which the stability and traditional residualization-based fine-mapping approaches agree against those for which they disagree, and find that the stability approach enhances the power of traditional fine-mapping methods in identifying variants with functional impact. Finally, in cases where the two approaches report distinct variants, our approach identifies variants comparably enriched for functional annotations. Our findings suggest that the stability principle, as a conceptually simple device, complements existing approaches to fine-mapping, reinforcing recent advocacy of evaluating cross-population and cross-environment portability of biological findings. To support visualization and interpretation of our results, we provide a Shiny app, available at: https://alan-aw.shinyapps.io/stability_v0/.

5.
bioRxiv ; 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36909524

ABSTRACT

Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models.

6.
bioRxiv ; 2023 Dec 23.
Article in English | MEDLINE | ID: mdl-38187742

ABSTRACT

Genomic sequence-to-activity models are increasingly utilized to understand gene regulatory syntax and probe the functional consequences of regulatory variation. Current models make accurate predictions of relative activity levels across the human reference genome, but their performance is more limited for predicting the effects of genetic variants, such as explaining gene expression variation across individuals. To better understand the causes of these shortcomings, we examine the uncertainty in predictions of genomic sequence-to-activity models using an ensemble of Basenji2 model replicates. We characterize prediction consistency on four types of sequences: reference genome sequences, reference genome sequences perturbed with TF motifs, eQTLs, and personal genome sequences. We observe that models tend to make high-confidence predictions on reference sequences, even when incorrect, and low-confidence predictions on sequences with variants. For eQTLs and personal genome sequences, we find that model replicates make inconsistent predictions in >50% of cases. Our findings suggest strategies to improve performance of these models.

7.
Nat Commun ; 13(1): 5803, 2022 Oct 03.
Article in English | MEDLINE | ID: mdl-36192477

ABSTRACT

Age is the primary risk factor for many common human diseases. Here, we quantify the relative contributions of genetics and aging to gene expression patterns across 27 tissues from 948 humans. We show that the predictive power of expression quantitative trait loci is impacted by age in many tissues. Jointly modelling the contributions of age and genetics to transcript level variation, we find expression heritability (h²) is consistent among tissues while the contribution of aging varies by >20-fold with [Formula: see text] in 5 tissues. We find that while the force of purifying selection is stronger on genes expressed early versus late in life (Medawar's hypothesis), several highly proliferative tissues exhibit the opposite pattern. These non-Medawarian tissues exhibit high rates of cancer and age-of-expression-associated somatic mutations. In contrast, genes under genetic control are under relaxed constraint. Together, we demonstrate the distinct roles of aging and genetics on expression phenotypes.


Subject(s)
Aging , Quantitative Trait Loci , Aging/genetics , Gene Expression , Gene Expression Regulation , Humans , Phenotype , Quantitative Trait Loci/genetics
8.
Eur Urol ; 79(3): 353-361, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32800727

ABSTRACT

BACKGROUND: Family history of prostate cancer (PCa) is a well-known risk factor, and both common and rare genetic variants are associated with the disease. OBJECTIVE: To detect new genetic variants associated with PCa, capitalizing on the role of family history and more aggressive PCa. DESIGN, SETTING, AND PARTICIPANTS: A two-stage design was used. In stage one, whole-exome sequencing was used to identify potential risk alleles among affected men with a strong family history of disease or with more aggressive disease (491 cases and 429 controls). Aggressive disease was based on a sum of scores for Gleason score, node status, metastasis, tumor stage, prostate-specific antigen at diagnosis, systemic recurrence, and time to PCa death. Genes identified in stage one were screened in stage two using a custom-capture design in an independent set of 2917 cases and 1899 controls. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Frequencies of genetic variants (singly or jointly in a gene) were compared between cases and controls. RESULTS AND LIMITATIONS: Eleven genes previously reported to be associated with PCa were detected (ATM, BRCA2, HOXB13, FAM111A, EMSY, HNF1B, KLK3, MSMB, PCAT1, PRSS3, and TERT), as well as an additional 10 novel genes (PABPC1, QK1, FAM114A1, MUC6, MYCBP2, RAPGEF4, RNASEH2B, ULK4, XPO7, and THAP3). Of these 10 novel genes, all but PABPC1 and ULK4 were primarily associated with the risk of aggressive PCa. CONCLUSIONS: Our approach demonstrates the advantage of gene sequencing in the search for genetic variants associated with PCa and the benefits of sampling patients with a strong family history of disease or an aggressive form of disease. PATIENT SUMMARY: Multiple genes are associated with prostate cancer (PCa) among men with a strong family history of this disease or among men with an aggressive form of PCa.


Subject(s)
Prostatic Neoplasms , Genes, BRCA2 , Guanine Nucleotide Exchange Factors , Humans , Male , Neoplasm Grading , Prostatic Neoplasms/genetics , Protein Serine-Threonine Kinases , Trypsin , Exome Sequencing
9.
Bioinformatics ; 36(16): 4440-4448, 2020 Aug 15.
Article in English | MEDLINE | ID: mdl-32330225

ABSTRACT

SUMMARY: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS) analyses. Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions are further analyzed and visualized to illustrate their influences on predictions. IRT can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of regulatory effect on the expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies. AVAILABILITY AND IMPLEMENTATION: Codes and data used in this work are available at https://github.com/miaecle/eQTL_Trees. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
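The top-1 and top-3 accuracies reported for the gene-ranking experiments can be illustrated with a short sketch. This shows only the metric, not IRT itself; the function name, variant IDs, and gene names below are hypothetical placeholders.

```python
def top_k_accuracy(ranked_candidates, true_targets, k):
    """Fraction of variants whose true target gene is among the k
    highest-ranked candidate genes.

    ranked_candidates: dict mapping variant ID -> list of candidate genes,
        ordered from highest to lowest predicted score.
    true_targets: dict mapping variant ID -> true target gene.
    """
    if not ranked_candidates:
        raise ValueError("no variants to evaluate")
    hits = sum(
        1
        for variant, genes in ranked_candidates.items()
        if true_targets[variant] in genes[:k]
    )
    return hits / len(ranked_candidates)


# Hypothetical rankings for two variants.
rankings = {
    "variant_1": ["GENE_A", "GENE_B", "GENE_C"],
    "variant_2": ["GENE_D", "GENE_E", "GENE_F"],
}
truth = {"variant_1": "GENE_A", "variant_2": "GENE_F"}
print(top_k_accuracy(rankings, truth, k=1), top_k_accuracy(rankings, truth, k=3))
```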


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Chromosome Mapping , Histone Code , Humans , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics
10.
Genet Med ; 21(11): 2512-2520, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31105274

ABSTRACT

PURPOSE: Limb-girdle muscular dystrophies (LGMD) are a genetically heterogeneous category of autosomal inherited muscle diseases. Many genes causing LGMD have been identified, and clinical trials are beginning for treatment of some genetic subtypes. However, even with the gene-level mechanisms known, it is still difficult to get a robust and generalizable prevalence estimation for each subtype due to the limited amount of epidemiology data and the low incidence of LGMDs. METHODS: Taking advantage of recently published exome and genome sequencing data from the general population, we used a Bayesian method to develop a robust disease prevalence estimator. RESULTS: This method was applied to nine recessive LGMD subtypes. The estimated disease prevalence calculated by this method was largely comparable with published estimates from epidemiological studies; however, it highlighted instances of possible underdiagnosis for LGMD2B and 2L. CONCLUSION: The increasing size of aggregated population variant databases will allow for robust and reproducible prevalence estimates of recessive disease, which is critical for the strategic design and prioritization of clinical trials.
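The abstract does not spell out the Bayesian estimator, so the following is only a simplified sketch of the general idea of deriving recessive-disease prevalence from population allele counts. It assumes Hardy-Weinberg equilibrium, a Beta prior on each pathogenic allele frequency, and a plug-in point estimate; the function names, prior parameters, and counts are all hypothetical and are not taken from the paper.

```python
def posterior_mean_frequency(allele_count, allele_number, prior_a=0.5, prior_b=0.5):
    """Posterior mean of an allele frequency under a Beta(prior_a, prior_b)
    prior and a binomial likelihood (assumed model, not the authors')."""
    return (allele_count + prior_a) / (allele_number + prior_a + prior_b)


def recessive_prevalence(variants, prior_a=0.5, prior_b=0.5):
    """Plug-in prevalence estimate for a recessive disease.

    variants: iterable of (allele_count, allele_number) pairs, one per
        pathogenic variant in the gene.
    Under Hardy-Weinberg equilibrium, affected individuals carry two
    pathogenic alleles, so prevalence is approximately q**2, where q is
    the combined pathogenic allele frequency.
    """
    q = sum(posterior_mean_frequency(ac, an, prior_a, prior_b) for ac, an in variants)
    return q * q


# Hypothetical counts: two pathogenic variants seen in a reference database.
print(recessive_prevalence([(9, 9999), (19, 19999)]))
```

A full Bayesian treatment would propagate the posterior uncertainty in q through to the prevalence rather than squaring a point estimate; the sketch keeps only the allele-frequency-to-prevalence step.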


Subject(s)
Muscular Dystrophies, Limb-Girdle/epidemiology , Muscular Dystrophies, Limb-Girdle/genetics , Bayes Theorem , Chromosome Mapping , Databases, Genetic , Exome , Female , Humans , Male , Mutation , Prevalence
11.
J Invest Dermatol ; 138(12): 2589-2594, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30472995

ABSTRACT

Cutaneous squamous cell cancers (cSCCs) present an under-recognized health issue among non-Hispanic whites, one that is likely to increase as populations age. cSCC risks vary considerably among non-Hispanic whites, and this heterogeneity indicates the need for risk-stratified screening strategies that are guided by patients' personal characteristics and clinical histories. Here we describe cSCCscore, a prediction tool that uses patients' covariates and clinical histories to assign them personal probabilities of developing cSCCs within 3 years after risk assessment. cSCCscore uses a statistical model for the occurrence and timing of a patient's cSCCs, whose parameters we estimated using cohort data from 66,995 patients in the Kaiser Permanente Northern California healthcare system. We found that patients' covariates and histories explained approximately 75% of their interpersonal cSCC risk variation. Using cross-validated performance measures, we also found cSCCscore's predictions to be moderately well calibrated to the patients' observed cSCC incidence. Moreover, cSCCscore discriminated well between patients who subsequently did and did not develop a new primary cSCC within 3 years after risk assignment, with area under the receiver operating characteristic curve of approximately 85%. Thus, cSCCscore can facilitate more informed management of non-Hispanic white patients at cSCC risk. cSCCscore's predictions are available at https://researchapps.github.io/cSCCscore/.


Subject(s)
Carcinoma, Squamous Cell/diagnosis , Early Detection of Cancer/methods , Models, Statistical , Skin Neoplasms/diagnosis , White People , Aged , California/epidemiology , Carcinoma, Squamous Cell/epidemiology , Cohort Studies , Delivery of Health Care , Female , Humans , Incidence , Male , Middle Aged , Neoplasm Recurrence, Local , Prognosis , Research Design , Risk Factors , Skin Neoplasms/epidemiology
12.
Nat Commun ; 9(1): 4264, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30323283

ABSTRACT

Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing 33 genes whose imputed expression levels are associated with cSCC at false discovery rate < 10% in the GERA cohort and validate 15 of these candidate genes at Bonferroni significance in the 23andMe dataset, including eight genes in five novel susceptibility loci and seven genes in four previously associated loci. These results suggest genetic mechanisms contributing to cSCC risk and illustrate advantages and disadvantages of TWAS as a supplement to traditional GWAS analyses.
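The discovery-validation design above combines two multiple-testing thresholds: a false discovery rate cut in the discovery cohort and a Bonferroni cut in the validation cohort. A minimal sketch of both rules follows, using the standard Benjamini-Hochberg step-up procedure as an assumed FDR method (the abstract does not name one); function names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at the given false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def bonferroni(p_values, alpha=0.05):
    """Return indices of hypotheses with p <= alpha / number of tests."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Hypothetical discovery-cohort p-values for four genes.
discovery_p = [0.001, 0.02, 0.03, 0.5]
candidates = benjamini_hochberg(discovery_p, fdr=0.10)

# Hypothetical validation-cohort p-values, tested only for the candidates,
# so the Bonferroni denominator is the number of candidates.
validation_p = [0.0004, 0.2, 0.01]
validated = [candidates[i] for i in bonferroni(validation_p, alpha=0.05)]
print(candidates, validated)
```

Testing only the discovery-stage candidates in the second cohort keeps the Bonferroni correction small, which is one reason two-stage designs retain power.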


Subject(s)
Carcinoma, Squamous Cell/genetics , Gene Expression Regulation, Neoplastic , Genetic Loci , Genetic Predisposition to Disease , Skin Neoplasms/genetics , Databases, Genetic , Humans , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results
13.
Cancer Immunol Immunother ; 67(7): 1123-1133, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29754218

ABSTRACT

BACKGROUND: The immune system has been implicated in the pathophysiology of cutaneous squamous cell carcinoma (cSCC) as evidenced by the substantially increased risk of cSCC in immunosuppressed individuals. Associations between cSCC risk and single nucleotide polymorphisms (SNPs) in the HLA region have been identified by genome-wide association studies (GWAS). The translation of the associated HLA SNPs to structural amino acid changes in HLA molecules has not been previously elucidated. METHODS: Using data from a GWAS that included 7238 cSCC cases and 56,961 controls of non-Hispanic white ancestry, we imputed classical alleles and corresponding amino acid changes in HLA genes. Logistic regression models were used to examine associations between cSCC risk and genotyped or imputed SNPs, classical HLA alleles, and amino acid changes. RESULTS: Among the genotyped SNPs, cSCC risk was associated with rs28535317 (OR = 1.20, p = 9.88 × 10⁻¹¹) corresponding to an amino-acid change from phenylalanine to leucine at codon 26 of HLA-DRB1 (OR = 1.17, p = 2.48 × 10⁻¹⁰). An additional independent association was observed for a threonine to isoleucine change at codon 107 of HLA-DQA1 (OR = 1.14, p = 2.34 × 10⁻⁹). Among the classical HLA alleles, cSCC was associated with DRB1*01 (OR = 1.18, p = 5.86 × 10⁻¹⁰). Conditional analyses revealed additional independent cSCC associations with DQA1*05:01 and DQA1*05:05. Extended haplotype analysis was used to complement the imputed haplotypes, which identified three extended haplotypes in the HLA-DR and HLA-DQ regions. CONCLUSIONS: Associations with specific HLA-DR and -DQ alleles are likely to explain previously observed GWAS signals in the HLA region associated with cSCC risk.


Subject(s)
Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/pathology , Genes, MHC Class II , Polymorphism, Single Nucleotide , Skin Neoplasms/genetics , Skin Neoplasms/pathology , Genome-Wide Association Study , Genotype , Humans , Risk Factors
14.
Bioinformatics ; 33(24): 3895-3901, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28961785

ABSTRACT

MOTIVATION: Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. RESULTS: We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. AVAILABILITY AND IMPLEMENTATION: FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/. CONTACT: nilah@stanford.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
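The AUC values quoted above have a direct probabilistic reading: the AUC is the chance that a randomly chosen positive (here, a cis-eQTL SNV) scores higher than a randomly chosen negative. The sketch below computes it that way by pairwise comparison; it illustrates the metric only, is not the FIRE code, and uses hypothetical names and toy scores.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    scores higher; ties count as half. This is O(n_pos * n_neg), which is
    fine for small illustrations but slow for genome-wide data.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores (hypothetical): two eQTL SNVs and two non-eQTL SNVs.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```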


Subject(s)
Gene Expression Regulation , Genetic Variation , Software , Genomics , Humans , Quantitative Trait Loci
15.
Hum Immunol ; 78(4): 327-335, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28185865

ABSTRACT

Cutaneous squamous cell carcinoma (cSCC) is the second most common cancer among Caucasians in the United States, with rising incidence over the past decade. Treatment for non-melanoma skin cancer, including cSCC, in the United States was estimated to cost $4.8 billion in 2014. Thus, an understanding of cSCC pathogenesis could have important public health implications. Immune function impacts cSCC risk, given that cSCC incidence rates are substantially higher in patients with compromised immune systems. We report a systematic review of published associations between cSCC risk and the human leukocyte antigen (HLA) system. This review includes studies that analyze germline class I and class II HLA allelic variation as well as HLA cell-surface protein expression levels associated with cSCC risk. We propose biological mechanisms for these HLA-cSCC associations based on known mechanisms of HLA involvement in other diseases. The review suggests that immunity regulates the development of cSCC and that HLA-cSCC associations differ between immunocompetent and immunosuppressed patients. This difference may reflect the presence of viral co-factors that affect tumorigenesis in immunosuppressed patients. Finally, we highlight limitations in the literature on HLA-cSCC associations, and suggest directions for future research aimed at understanding, preventing and treating cSCC.


Subject(s)
Carcinoma, Squamous Cell/epidemiology , DNA Virus Infections/epidemiology , HLA Antigens/genetics , Papillomaviridae/physiology , Skin Neoplasms/epidemiology , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/immunology , Gene Frequency , Genome-Wide Association Study , Humans , Immunity , Immunosuppression Therapy , Polymorphism, Genetic , Risk Factors , Skin Neoplasms/genetics , Skin Neoplasms/immunology , United States , White People
16.
Am J Hum Genet ; 99(4): 877-885, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27666373

ABSTRACT

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.


Subject(s)
Disease/genetics , Mutation, Missense/genetics , Software , Area Under Curve , DNA Mutational Analysis , Exome/genetics , Gene Frequency , Humans , ROC Curve
17.
J Invest Dermatol ; 136(5): 930-937, 2016 May.
Article in English | MEDLINE | ID: mdl-26829030

ABSTRACT

We report a genome-wide association study of cutaneous squamous cell carcinoma conducted among non-Hispanic white members of the Kaiser Permanente Northern California health care system. The study includes a genome-wide screen of 61,457 members (6,891 cases and 54,566 controls) genotyped on the Affymetrix Axiom European array and a replication phase involving an independent set of 6,410 additional members (810 cases and 5,600 controls). Combined analysis of screening and replication phases identified 10 loci containing single-nucleotide polymorphisms (SNPs) with P-values < 5 × 10⁻⁸. Six loci contain genes in the pigmentation pathway; SNPs at these loci appear to modulate squamous cell carcinoma risk independently of the pigmentation phenotypes. Another locus contains HLA class II genes studied in relation to elevated squamous cell carcinoma risk following immunosuppression. SNPs at the remaining three loci include an intronic SNP in FOXP1 at locus 3p13, an intergenic SNP at 3q28 near TP63, and an intergenic SNP at 9p22 near BNC2. These findings provide insights into the genetic factors accounting for inherited squamous cell carcinoma susceptibility.


Subject(s)
Carcinoma, Squamous Cell/genetics , Genetic Loci , Genetic Predisposition to Disease/epidemiology , Genome-Wide Association Study , Skin Neoplasms/genetics , Adult , California , Carcinoma, Squamous Cell/epidemiology , Carcinoma, Squamous Cell/pathology , Case-Control Studies , Cohort Studies , Female , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Skin Neoplasms/epidemiology , Skin Neoplasms/pathology , White People/genetics