Results 1 - 20 of 155
1.
Sci Transl Med ; 16(745): eade4510, 2024 May.
Article in English | MEDLINE | ID: mdl-38691621

ABSTRACT

Human inborn errors of immunity include rare disorders entailing functional and quantitative antibody deficiencies due to impaired B cells called the common variable immunodeficiency (CVID) phenotype. Patients with CVID face delayed diagnoses and treatments for 5 to 15 years after symptom onset because the disorders are rare (prevalence of ~1/25,000), and there is extensive heterogeneity in CVID phenotypes, ranging from infections to autoimmunity to inflammatory conditions, overlapping with other more common disorders. The prolonged diagnostic odyssey drives excessive system-wide costs before diagnosis. Because there is no single causal mechanism, there are no genetic tests to definitively diagnose CVID. Here, we present PheNet, a machine learning algorithm that identifies patients with CVID from their electronic health records (EHRs). PheNet learns phenotypic patterns from verified CVID cases and uses this knowledge to rank patients by likelihood of having CVID. PheNet could have diagnosed more than half of our patients with CVID 1 or more years earlier than they had been diagnosed. When applied to a large EHR dataset, followed by blinded chart review of the top 100 patients ranked by PheNet, we found that 74% were highly probable to have CVID. We externally validated PheNet using >6 million records from disparate medical systems in California and Tennessee. As artificial intelligence and machine learning make their way into health care, we show that algorithms such as PheNet can offer clinical benefits by expediting the diagnosis of rare diseases.


Subject(s)
Common Variable Immunodeficiency , Electronic Health Records , Humans , Common Variable Immunodeficiency/diagnosis , Machine Learning , Algorithms , Male , Female , Phenotype , Adult , Undiagnosed Diseases/diagnosis
2.
Sci Adv ; 10(21): eadn7655, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781333

ABSTRACT

Few neuropsychiatric disorders have replicable biomarkers, prompting high-resolution and large-scale molecular studies. However, we still lack consensus on a more foundational question: whether quantitative shifts in cell types, the functional unit of life, contribute to neuropsychiatric disorders. Leveraging advances in human brain single-cell methylomics, we deconvolve seven major cell types using bulk DNA methylation profiling across 1270 postmortem brains, including from individuals diagnosed with Alzheimer's disease, schizophrenia, and autism. We observe and replicate cell-type compositional shifts for Alzheimer's disease (endothelial cell loss), autism (increased microglia), and schizophrenia (decreased oligodendrocytes), and find age- and sex-related changes. Multiple layers of evidence indicate that endothelial cell loss contributes to Alzheimer's disease, with comparable effect size to APOE genotype among older people. Genome-wide association identified five genetic loci related to cell-type composition, involving plausible genes for the neurovascular unit (P2RX5 and TRPV3) and excitatory neurons (DPY30 and MEMO1). These results implicate specific cell-type shifts in the pathophysiology of neuropsychiatric disorders.


Subject(s)
Alzheimer Disease , Autistic Disorder , Brain , DNA Methylation , Schizophrenia , Humans , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Alzheimer Disease/metabolism , Schizophrenia/genetics , Schizophrenia/pathology , Brain/metabolism , Brain/pathology , Autistic Disorder/genetics , Autistic Disorder/pathology , Male , Female , Genome-Wide Association Study , Aged , Endothelial Cells/metabolism , Endothelial Cells/pathology , Epigenomics/methods , Middle Aged , Aged, 80 and over
3.
HGG Adv ; 5(3): 100302, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38704641

ABSTRACT

Polygenic scores (PGSs) summarize the combined effect of common risk variants and are associated with breast cancer risk in patients without identifiable monogenic risk factors. One of the most well-validated PGSs in breast cancer to date is PGS313, which was developed from a Northern European biobank but has shown attenuated performance in non-European ancestries. We further investigate the generalizability of the PGS313 for American women of European (EA), African (AFR), Asian (EAA), and Latinx (HL) ancestry within one institution with a singular electronic health record (EHR) system, genotyping platform, and quality control process. We found that the PGS313 achieved overlapping areas under the receiver operating characteristic (ROC) curve (AUCs) in females of HL (AUC = 0.68, 95% confidence interval [CI] = 0.65-0.71) and EA ancestry (AUC = 0.70, 95% CI = 0.69-0.71) but lower AUCs for the AFR and EAA populations (AFR: AUC = 0.61, 95% CI = 0.56-0.65; EAA: AUC = 0.64, 95% CI = 0.60-0.68). While PGS313 is associated with hormone-receptor-positive (HR+) disease in EA Americans (odds ratio [OR] = 1.42, 95% CI = 1.16-1.64), this association is lost in African, Latinx, and Asian Americans. In summary, we found that PGS313 was significantly associated with breast cancer but with attenuated accuracy in women of AFR and EAA descent within a singular health system in Los Angeles. Our work further highlights the need for additional validation in diverse cohorts prior to the clinical implementation of PGSs.
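
As a reading aid for the AUC values reported in this abstract, here is a minimal sketch of what an AUC measures: the probability that a randomly chosen case has a higher score than a randomly chosen control. The function name and all score values are hypothetical and are not taken from the study; the confidence intervals in the abstract would need an additional procedure that is not shown here.

```python
# Illustrative sketch only: AUC as the fraction of (case, control) pairs
# in which the case has the higher score. Ties count as half a win.
# The scores below are invented for illustration.

def auc(case_scores, control_scores):
    """Return the area under the ROC curve via pairwise comparison."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical polygenic scores for three cases and three controls.
cases = [0.9, 0.4, 0.7]
controls = [0.1, 0.5, 0.3]

example_auc = auc(cases, controls)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, which is why values such as 0.61 and 0.70 are read as modest and moderate discrimination, respectively.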

4.
Science ; 384(6698): eadh7688, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781356

ABSTRACT

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific splicing and transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.6% were novel (not previously annotated in Gencode version 33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to reprioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.


Subject(s)
Mental Disorders , Neocortex , Neurogenesis , Protein Isoforms , RNA Splicing , Single-Cell Analysis , Transcriptome , Humans , Alternative Splicing , Genetic Predisposition to Disease , Mental Disorders/genetics , Molecular Sequence Annotation , Neocortex/metabolism , Neocortex/embryology , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Neurogenesis/genetics
5.
Science ; 384(6698): eadh0829, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781368

ABSTRACT

Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.


Subject(s)
Alternative Splicing , Brain , Gene Expression Regulation, Developmental , Mental Disorders , Humans , Atlases as Topic , Autism Spectrum Disorder/genetics , Brain/metabolism , Brain/growth & development , Brain/embryology , Gene Regulatory Networks , Genome-Wide Association Study , Protein Isoforms/genetics , Protein Isoforms/metabolism , Quantitative Trait Loci , Schizophrenia/genetics , Transcriptome , Mental Disorders/genetics
6.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38490256

ABSTRACT

SUMMARY: Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic studies of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations. AVAILABILITY AND IMPLEMENTATION: Admix-kit package is open-source and available at https://github.com/KangchengHou/admix-kit. Additionally, users can use the pipeline designed for admixed genotype simulation available at https://github.com/UW-GAC/admix-kit_workflow.


Subject(s)
Software , Genotype , Phenotype
7.
Am J Hum Genet ; 111(2): 323-337, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38306997

ABSTRACT

Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms linking genetic variation to disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.


Subject(s)
Genome-Wide Association Study , Lithium , Humans , Genome-Wide Association Study/methods , RNA-Seq , Quantitative Trait Loci/genetics , Phenotype , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
8.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370649

ABSTRACT

BACKGROUND: Genetic risk modeling for dementia offers significant benefits, but studies based on real-world data, particularly for underrepresented populations, are limited. METHODS: We employed an Elastic Net model for dementia risk prediction using single-nucleotide polymorphisms prioritized by functional genomic data from multiple neurodegenerative disease genome-wide association studies. We compared this model with APOE and polygenic risk score models across genetic ancestry groups, using electronic health records from UCLA Health for discovery and All of Us cohort for validation. RESULTS: Our model significantly outperforms other models across multiple ancestries, improving the area-under-precision-recall curve by 21-61% and the area-under-the-receiver-operating characteristic by 10-21% compared to the APOE and the polygenic risk score models. We identified shared and ancestry-specific risk genes and biological pathways, reinforcing and adding to existing knowledge. CONCLUSIONS: Our study highlights benefits of integrating functional mapping, multiple neurodegenerative diseases, and machine learning for genetic risk models in diverse populations. Our findings hold potential for refining precision medicine strategies in dementia diagnosis.

9.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370677

ABSTRACT

Background: Previous studies have established a strong link between late-onset epilepsy (LOE) and Alzheimer's disease (AD). However, their shared genetic risk beyond the APOE gene remains unclear. Our study sought to examine the shared genetic factors of AD and LOE, interpret the biological pathways involved, and evaluate how AD onset may be mediated by LOE and shared genetic risks. Methods: We defined phenotypes using phecodes mapped from diagnosis codes, using records from patients aged 60-90. A two-step Least Absolute Shrinkage and Selection Operator (LASSO) workflow was used to identify shared genetic variants based on prior AD GWAS integrated with functional genomic data. We calculated an AD-LOE shared risk score and used it as a proxy in a causal mediation analysis. We used electronic health records from an academic health center (UCLA Health) for discovery analyses and validated our findings in a multi-institutional EHR database (All of Us). Results: The two-step LASSO method identified 34 shared genetic loci between AD and LOE, including the APOE region. These loci were mapped to 65 genes, which showed enrichment in molecular functions and pathways such as tau protein binding and lipoprotein metabolism. Individuals with high predicted shared risk scores have a higher risk of developing AD, LOE, or both in later life than those with low risk scores. LOE partially mediates the effect of AD-LOE shared genetic risk on AD (15% proportion mediated on average). Validation results from All of Us were consistent with findings from the UCLA sample. Conclusions: We employed a machine learning approach to identify shared genetic risks of AD and LOE. In addition to providing substantial evidence for the significant contribution of the APOE-TOMM40-APOC1 gene cluster to shared risk, we uncovered novel genes that may contribute.
Our study is one of the first to utilize All of Us genetic data to investigate AD, and provides valuable insights into the potential common and disease-specific mechanisms underlying AD and LOE, which could have profound implications for the future of disease prevention and the development of targeted treatment strategies to combat the co-occurrence of these two diseases.
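
To make the "proportion mediated" quantity in this abstract concrete, the sketch below uses the textbook decomposition for linear models, in which the total effect is the sum of a direct effect and an indirect effect through the mediator. This is an assumption chosen for illustration; the abstract does not say which estimator the authors used, and every coefficient below is hypothetical.

```python
# Illustrative sketch only: proportion mediated under a simple linear
# mediation model (exposure -> mediator -> outcome).
# All coefficients are invented; they are not estimates from the study.

def proportion_mediated(exposure_to_mediator, mediator_to_outcome, direct_effect):
    """Return indirect / (direct + indirect), with indirect = a * b."""
    indirect_effect = exposure_to_mediator * mediator_to_outcome
    total_effect = direct_effect + indirect_effect
    return indirect_effect / total_effect

# Hypothetical path coefficients: risk score -> LOE, LOE -> AD, risk score -> AD.
a_path = 0.3
b_path = 0.5
direct = 0.85

pm = proportion_mediated(a_path, b_path, direct)
print(f"proportion mediated = {pm:.2f}")
```

With these made-up coefficients, the indirect path accounts for a minority of the total effect, which is the situation the abstract describes as partial mediation.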

10.
Res Sq ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38410460

ABSTRACT

BACKGROUND: Genetic risk modeling for dementia offers significant benefits, but studies based on real-world data, particularly for underrepresented populations, are limited. METHODS: We employed an Elastic Net model for dementia risk prediction using single-nucleotide polymorphisms prioritized by functional genomic data from multiple neurodegenerative disease genome-wide association studies. We compared this model with APOE and polygenic risk score models across genetic ancestry groups, using electronic health records from UCLA Health for discovery and All of Us cohort for validation. RESULTS: Our model significantly outperforms other models across multiple ancestries, improving the area-under-precision-recall curve by 21-61% and the area-under-the-receiver-operating characteristic by 10-21% compared to the APOE and the polygenic risk score models. We identified shared and ancestry-specific risk genes and biological pathways, reinforcing and adding to existing knowledge. CONCLUSIONS: Our study highlights benefits of integrating functional mapping, multiple neurodegenerative diseases, and machine learning for genetic risk models in diverse populations. Our findings hold potential for refining precision medicine strategies in dementia diagnosis.

11.
medRxiv ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38260294

ABSTRACT

Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRS constructed using more sophisticated methods and more diverse training data can enhance the predictive ability and their utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium GWAS meta-analyses of European- (71,771 cases and 1,059,740 controls) and African-ancestry samples (7,482 cases and 129,975 controls). We used LDpred2 and PRSCSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in an independent European- (6,261 cases and 88,238 controls) and African-ancestry sample (1,385 cases and 12,569 controls). Multi-ancestry PRSs with weights tuned in European- and African-ancestry samples, respectively, outperformed ancestry-specific PRSs in European- (PRSCSx_EUR: AUC=0.61 (0.60, 0.61), PRSCSx_combined_EUR: AUC=0.61 (0.60, 0.62)) and African-ancestry test samples (PRSCSx_AFR: AUC=0.58 (0.57, 0.6), PRSCSx_combined_AFR: AUC=0.59 (0.57, 0.60)). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk for VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum. These findings suggest that the multi-ancestry PRS may be used to identify individuals at highest risk for VTE and provide guidance for the most effective treatment strategy across diverse populations.

12.
Transl Psychiatry ; 14(1): 38, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38238290

ABSTRACT

Tobacco use is a major risk factor for many diseases and is heavily influenced by environmental factors with significant underlying genetic contributions. Here, we evaluated the predictive performance, risk stratification, and potential systemic health effects of tobacco use disorder (TUD) predisposing germline variants using a European-ancestry-derived polygenic score (PGS) in 24,202 participants from the multi-ancestry, hospital-based UCLA ATLAS biobank. Among genetically inferred ancestry groups (GIAs), TUD-PGS was significantly associated with TUD in European American (EA) (OR: 1.20, CI: [1.16, 1.24]), Hispanic/Latin American (HL) (OR: 1.19, CI: [1.11, 1.28]), and East Asian American (EAA) (OR: 1.18, CI: [1.06, 1.31]) GIAs but not in African American (AA) GIA (OR: 1.04, CI: [0.93, 1.17]). Similarly, TUD-PGS offered strong risk stratification across PGS quantiles in EA and HL GIAs and inconsistently in EAA and AA GIAs. In a cross-ancestry phenome-wide association meta-analysis, TUD-PGS was associated with cardiometabolic, respiratory, and psychiatric phecodes (17 phecodes at P < 2.7E-05). In individuals with no history of smoking, the top TUD-PGS associations with obesity and alcohol-related disorders (P = 3.54E-07, 1.61E-06) persist. Mendelian Randomization (MR) analysis provides evidence of a causal association between adiposity measures and tobacco use. Inconsistent predictive performance of the TUD-PGS across GIAs motivates the inclusion of multiple ancestry populations at all levels of genetic research of tobacco use for equitable clinical translation of TUD-PGS. Phenome associations suggest that TUD-predisposed individuals may require comprehensive tobacco use prevention and management approaches to address underlying addictive tendencies.


Subject(s)
Biological Specimen Banks , Tobacco Use Disorder , Humans , Los Angeles , Tobacco Use , Tobacco Use Disorder/genetics , Risk Factors , Obesity , Genome-Wide Association Study
13.
Nat Rev Genet ; 25(1): 8-25, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37620596

ABSTRACT

Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.


Subject(s)
Genetic Predisposition to Disease , Humans , Risk Factors , Multifactorial Inheritance , Precision Medicine , Genome-Wide Association Study
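
For readers new to the term, the basic additive form of a polygenic risk score can be sketched as a weighted sum of effect-allele dosages. This is the generic textbook form, not a method taken from the review; the variant identifiers, weights, and dosages below are all hypothetical.

```python
# Illustrative sketch only: a polygenic score as a weighted sum of
# effect-allele dosages. Variant IDs, weights, and dosages are invented.

def polygenic_score(dosages, weights):
    """Sum weight * dosage over the variants present in both mappings."""
    shared_variants = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared_variants)

# Hypothetical per-variant effect sizes (for example, GWAS log odds ratios).
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}

# Hypothetical dosages (0, 1, or 2 copies of the effect allele) for one person.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(person, weights)
print(f"score = {score:.2f}")
```

Because the weights are usually estimated in one population, the same sum can rank individuals less accurately in another population, which is the transferability problem the review discusses.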
14.
Pac Symp Biocomput ; 29: 322-326, 2024.
Article in English | MEDLINE | ID: mdl-38160289

ABSTRACT

The following sections are included: Overview; Dealing with the lack of diversity in current research datasets; Development of fair machine learning algorithms; Race, genetic ancestry, and population structure; Conclusion; Acknowledgments.


Subject(s)
Computational Biology , Precision Medicine , Humans , Machine Learning , Health Inequities
15.
Res Sq ; 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37961486

ABSTRACT

Background: Bilirubin is a potent antioxidant with a protective role in many diseases. We examined the relationships between serum bilirubin (SB) levels, tobacco smoking (a known cause of low SB), and aerodigestive cancers, grouped as lung cancers (LC) and head and neck cancers (HNC). Methods: We examined the associations between SB, LC, and HNC using data from 393,210 participants from a real-world, diverse, de-identified data repository and biobank linked to the UCLA Health system. We employed regression models, propensity score matching, and polygenic scores to investigate the associations and interactions between SB, tobacco smoking, LC, and HNC. Results: Current tobacco smokers showed lower SB (-0.04 mg/dL, 95% CI: [-0.04, -0.03]), compared to never-smokers. Lower SB levels were observed in HNC and LC cases (-0.10 mg/dL, [-0.13, -0.09] and -0.09 mg/dL, CI [-0.1, -0.07] respectively) compared to cancer-free controls with the effect persisting after adjusting for smoking. SB levels were inversely associated with HNC and LC risk (ORs per SD change in SB: 0.64, CI [0.59,0.69] and 0.57, CI [0.43,0.75], respectively). Lastly, a polygenic score (PGS) for SB was associated with LC (OR per SD change of SB-PGS: 0.71, CI [0.67, 0.76]). Conclusions: Low SB levels are associated with an increased risk of both HNC and LC, independent of the effect of tobacco smoking. Additionally, tobacco smoking demonstrated a strong interaction with SB on LC risk. Lastly, genetically predicted low SB (using a polygenic score) is negatively associated with LC. These findings suggest that SB could serve as a potential early and low-cost biomarker for LC and HNC. The interaction with tobacco smoking suggests that smokers with lower bilirubin could likely be at higher risk for LC compared to never smokers, suggesting the utility of SB in risk stratification for patients at risk for LC.
Lastly, the results of the polygenic score analyses suggest potential shared biological pathways between the genetic control of SB and the risk of LC development.

16.
Am J Hum Genet ; 110(12): 2042-2055, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-37944514

ABSTRACT

LDpred2 is a widely used Bayesian method for building polygenic scores (PGSs). LDpred2-auto can infer the two parameters from the LDpred model, the SNP heritability h² and polygenicity p, so that it does not require an additional validation dataset to choose best-performing parameters. The main aim of this paper is to properly validate the use of LDpred2-auto for inferring multiple genetic parameters. Here, we present a new version of LDpred2-auto that adds an optional third parameter α to its model, for modeling negative selection. We then validate the inference of these three parameters (or two, when using the previous model). We also show that LDpred2-auto provides per-variant probabilities of being causal that are well calibrated and can therefore be used for fine-mapping purposes. We also introduce a formula to infer the out-of-sample predictive performance r² of the resulting PGS directly from the Gibbs sampler of LDpred2-auto. Finally, we extend the set of HapMap3 variants recommended to use with LDpred2 with 37% more variants to improve the coverage of this set, and we show that this new set of variants captures 12% more heritability and provides 6% more predictive performance, on average, in UK Biobank analyses.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Bayes Theorem , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics
17.
Nat Genet ; 55(12): 2117-2128, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38036788

ABSTRACT

Methods integrating genetics with transcriptomic reference panels prioritize risk genes and mechanisms at only a fraction of trait-associated genetic loci, due in part to an overreliance on total gene expression as a molecular outcome measure. This challenge is particularly relevant for the brain, in which extensive splicing generates multiple distinct transcript-isoforms per gene. Due to complex correlation structures, isoform-level modeling from cis-window variants requires methodological innovation. Here we introduce isoTWAS, a multivariate, stepwise framework integrating genetics, isoform-level expression and phenotypic associations. Compared to gene-level methods, isoTWAS improves both isoform and gene expression prediction, yielding more testable genes, and increased power for discovery of trait associations within genome-wide association study loci across 15 neuropsychiatric traits. We illustrate multiple isoTWAS associations undetectable at the gene-level, prioritizing isoforms of AKT3, CUL3 and HSPD1 in schizophrenia and PCLO with multiple disorders. Results highlight the importance of incorporating isoform-level resolution within integrative approaches to increase discovery of trait associations, especially for brain-relevant traits.


Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Quantitative Trait Loci/genetics , Genetic Predisposition to Disease , Brain/metabolism , Protein Isoforms/metabolism , Polymorphism, Single Nucleotide
18.
bioRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873338

ABSTRACT

Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic study of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations.

19.
medRxiv ; 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37873378

ABSTRACT

Background: Bilirubin is a potent antioxidant with a protective role in many diseases. We examined the relationships between serum bilirubin (SB) levels, tobacco smoking (a known cause of low SB), and aerodigestive cancers, grouped as lung (LC) and head and neck (HNC). Methods: We examined the associations between SB, LC and HNC using data from 393,210 participants from UCLA Health, employing regression models, propensity score matching, and polygenic scores. Results: Current tobacco smokers showed lower SB (-0.04 mg/dL, 95% CI: [-0.04, -0.03]), compared to never-smokers. Lower SB levels were observed in HNC and LC cases (-0.10 mg/dL, [-0.13, -0.09] and -0.09 mg/dL, CI [-0.1, -0.07] respectively) compared to cancer-free controls with the effect persisting after adjusting for smoking. SB levels were inversely associated with HNC and LC risk (ORs per SD change in SB: 0.64, CI [0.59,0.69] and 0.57, CI [0.43,0.75], respectively). Lastly, a polygenic score (PGS) for SB was associated with LC (OR per SD change of SB-PGS: 0.71, CI [0.67, 0.76]). Conclusions: Low SB levels are associated with an increased risk of both HNC and LC, independent of the effect of tobacco smoking, with tobacco smoking demonstrating a strong interaction with SB on LC risk. Additionally, genetically predicted low SB (from polygenic scores) is negatively associated with LC. Impact: These findings suggest that SB could serve as a potential early biomarker for LC and HNC.

20.
Stat Med ; 42(26): 4867-4885, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37643728

ABSTRACT

Polygenicity refers to the phenomenon that multiple genetic variants have a nonzero effect on a complex trait. It is defined as the proportion of genetic variants with a nonzero effect on the trait. Evaluation of polygenicity can provide valuable insights into the genetic architecture of the trait. Several recent works have attempted to estimate polygenicity at the single nucleotide polymorphism level. However, evaluating polygenicity at the gene level can be biologically more meaningful. We propose the notion of gene-level polygenicity, defined as the proportion of genes having a nonzero effect on the trait under the framework of a transcriptome-wide association study. We introduce a Bayesian approach genepoly to estimate this quantity for a trait. The method is based on spike and slab prior and simultaneously estimates the subset of non-null genes. Our simulation study shows that genepoly efficiently estimates gene-level polygenicity. The method produces a downward bias for small choices of trait heritability due to a non-null gene, which diminishes rapidly with an increase in the genome-wide association study (GWAS) sample size. While identifying the subset of non-null genes, genepoly offers a high level of specificity and an overall good level of sensitivity; the sensitivity increases as the sample size of the reference panel expression and GWAS data increase. We applied the method to seven phenotypes in the UK Biobank, integrating expression data. We find height to be the most polygenic and asthma to be the least polygenic.
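
The definition of gene-level polygenicity above can be illustrated with a small simulation. The sketch below draws gene effects from a generic spike-and-slab mixture (zero with some probability, otherwise Gaussian) and then computes the proportion of nonzero effects. It is not the genepoly method, which infers this proportion from data; the function names, the Gaussian slab, and all parameter values are assumptions made for illustration.

```python
# Illustrative sketch only: simulate gene effects from a spike-and-slab
# mixture and compute the realized gene-level polygenicity.
# This does not implement genepoly; all parameter values are invented.
import random

def simulate_gene_effects(n_genes, polygenicity, slab_sd, seed=0):
    """Each gene is non-null with probability `polygenicity`, else exactly 0."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_genes):
        if rng.random() < polygenicity:
            effects.append(rng.gauss(0.0, slab_sd))  # slab: nonzero effect
        else:
            effects.append(0.0)                      # spike: no effect
    return effects

def gene_level_polygenicity(effects):
    """Proportion of genes with a nonzero effect on the trait."""
    return sum(1 for effect in effects if effect != 0.0) / len(effects)

effects = simulate_gene_effects(n_genes=10_000, polygenicity=0.05, slab_sd=0.1)
realized = gene_level_polygenicity(effects)
print(f"realized gene-level polygenicity = {realized:.3f}")
```

In real data the nonzero effects are not observed directly, which is why the abstract describes a Bayesian procedure that estimates the proportion and the set of non-null genes jointly.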
