Search | VHL Regional Portal

1.

Efficient and accurate mixed model association tool for single-cell eQTL analysis.

Zhou, Wei; Cuomo, Anna S E; Xue, Angli; Kanai, Masahiro; Chau, Grant; Krishna, Chirag; Xavier, Ramnik J; MacArthur, Daniel G; Powell, Joseph E; Daly, Mark J; Neale, Benjamin M.

medRxiv ; 2024 May 16.

Article in English | MEDLINE | ID: mdl-38798318

ABSTRACT

Understanding the genetic basis of gene expression can help us understand the molecular underpinnings of human traits and disease. Expression quantitative trait locus (eQTL) mapping can help in studying this relationship, but eQTLs have been shown to be very cell-type specific, motivating the use of single-cell RNA sequencing and single-cell eQTLs to obtain a more granular view of genetic regulation. Current methods for single-cell eQTL mapping either rely on the "pseudobulk" approach and traditional pipelines for bulk transcriptomics or do not scale well to large datasets. Here, we propose SAIGE-QTL, a robust and scalable tool that can directly map eQTLs using single-cell profiles without needing aggregation at the pseudobulk level. Additionally, SAIGE-QTL allows for testing the effects of less frequent/rare genetic variation, which is traditionally excluded from eQTL mapping studies, through set-based tests. We evaluate the performance of SAIGE-QTL on both real and simulated data and demonstrate the improved power for eQTL mapping over existing pipelines.

2.

Rare genetic variation in VE-PTP is associated with central serous chorioretinopathy, venous dysfunction and glaucoma.

Rämö, Joel T; Gorman, Bryan; Weng, Lu-Chen; Jurgens, Sean J; Singhanetr, Panisa; Tieger, Marisa G; van Dijk, Elon Hc; Halladay, Christopher W; Wang, Xin; Brinks, Joost; Choi, Seung Hoan; Luo, Yuyang; Pyarajan, Saiju; Nealon, Cari L; Gorin, Michael B; Wu, Wen-Chih; Sobrin, Lucia; Kaarniranta, Kai; Yzer, Suzanne; Palotie, Aarno; Peachey, Neal S; Turunen, Joni A; Boon, Camiel Jf; Ellinor, Patrick T; Iyengar, Sudha K; Daly, Mark J; Rossin, Elizabeth J.

medRxiv ; 2024 May 09.

Article in English | MEDLINE | ID: mdl-38766240

ABSTRACT

Central serous chorioretinopathy (CSC) is a fluid maculopathy whose etiology is not well understood. Abnormal choroidal veins in CSC patients have been shown to have similarities with varicose veins. To identify potential mechanisms, we analyzed genotype data from 1,477 CSC patients and 455,449 controls in FinnGen. We identified an association for a low-frequency (AF=0.5%) missense variant (rs113791087) in the gene encoding vascular endothelial protein tyrosine phosphatase (VE-PTP) (OR=2.85, P=4.5×10⁻⁹). This was confirmed in a meta-analysis of 2,452 CSC patients and 865,767 controls from 4 studies (OR=3.06, P=7.4×10⁻¹⁵). The variant rs113791087 was associated with a 56% higher prevalence of retinal abnormalities (35.3% vs 22.6%, P=8.0×10⁻⁴) in 708 UK Biobank participants and, surprisingly, with varicose veins (OR=1.31, P=2.3×10⁻¹¹) and glaucoma (OR=0.82, P=6.9×10⁻⁹). Predicted loss-of-function variants in VE-PTP, though rare in number, were associated with CSC in All of Us (OR=17.10, P=0.018). These findings highlight the significance of VE-PTP in diverse ocular and systemic vascular diseases.

3.

A harmonized public resource of deeply sequenced diverse human genomes.

Koenig, Zan; Yohannes, Mary T; Nkambule, Lethukuthula L; Zhao, Xuefang; Goodrich, Julia K; Kim, Heesu Ally; Wilson, Michael W; Tiao, Grace; Hao, Stephanie P; Sahakian, Nareh; Chao, Katherine R; Walker, Mark A; Lyu, Yunfei; Rehm, Heidi; Neale, Benjamin M; Talkowski, Michael E; Daly, Mark J; Brand, Harrison; Karczewski, Konrad J; Atkinson, Elizabeth G; Martin, Alicia R.

Genome Res ; 2024 May 15.

Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

4.

Oral and non-oral lichen planus show genetic heterogeneity and differential risk for autoimmune disease and oral cancer.

Reeve, Mary Pat; Vehviläinen, Mari; Luo, Shuang; Ritari, Jarmo; Karjalainen, Juha; Gracia-Tabuenca, Javier; Mehtonen, Juha; Padmanabhuni, Shanmukha Sampath; Kolosov, Nikita; Artomov, Mykyta; Siirtola, Harri; Ollila, Hanna M; Graham, Daniel; Partanen, Jukka; Xavier, Ramnik J; Daly, Mark J; Ripatti, Samuli; Salo, Tuula; Siponen, Maria.

Am J Hum Genet ; 111(6): 1047-1060, 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38776927

ABSTRACT

Lichen planus (LP) is a T-cell-mediated inflammatory disease affecting squamous epithelia in many parts of the body, most often the skin and oral mucosa. Cutaneous LP is usually transient and oral LP (OLP) is most often chronic, so we performed a large-scale genetic and epidemiological study of LP to address whether the oral and non-oral subgroups have shared or distinct underlying pathologies and their overlap with autoimmune disease. Using lifelong records covering diagnoses, procedures, and clinic identity from 473,580 individuals in the FinnGen study, genome-wide association analyses were conducted on carefully constructed subcategories of OLP (n = 3,323) and non-oral LP (n = 4,356) and on the combined group. We identified 15 genome-wide significant associations in FinnGen and an additional 12 when meta-analyzed with UKBB (27 independent associations at 25 distinct genomic locations), most of which are shared between oral and non-oral LP. Many associations coincide with known autoimmune disease loci, consistent with the epidemiologic enrichment of LP with hypothyroidism and other autoimmune diseases. Notably, a third of the FinnGen associations demonstrate significant differences between OLP and non-OLP. We also observed a 13.6-fold risk for tongue cancer and an elevated risk for other oral cancers in OLP, in agreement with earlier reports that connect LP with higher cancer incidence. In addition to a large-scale dissection of LP genetics and comorbidities, our study demonstrates the use of comprehensive, multidimensional health registry data to address outstanding clinical questions and reveal underlying biological mechanisms in common but understudied diseases.

Subject(s)

Autoimmune Diseases , Genome-Wide Association Study , Lichen Planus, Oral , Mouth Neoplasms , Humans , Autoimmune Diseases/genetics , Lichen Planus, Oral/genetics , Lichen Planus, Oral/pathology , Mouth Neoplasms/genetics , Mouth Neoplasms/pathology , Female , Male , Genetic Heterogeneity , Middle Aged , Lichen Planus/genetics , Lichen Planus/pathology , Genetic Predisposition to Disease , Aged , Adult , Risk Factors , Polymorphism, Single Nucleotide

5.

The landscape of regional missense mutational intolerance quantified from 125,748 exomes.

Chao, Katherine R; Wang, Lily; Panchal, Ruchit; Liao, Calwing; Abderrazzaq, Haneen; Ye, Robert; Schultz, Patrick; Compitello, John; Grant, Riley H; Kosmicki, Jack A; Weisburd, Ben; Phu, William; Wilson, Michael W; Laricchia, Kristen M; Goodrich, Julia K; Goldstein, Daniel; Goldstein, Jacqueline I; Vittal, Christopher; Poterba, Timothy; Baxter, Samantha; Watts, Nicholas A; Solomonson, Matthew; Tiao, Grace; Rehm, Heidi L; Neale, Benjamin M; Talkowski, Michael E; MacArthur, Daniel G; O'Donnell-Luria, Anne; Karczewski, Konrad J; Radivojac, Predrag; Daly, Mark J; Samocha, Kaitlin E.

bioRxiv ; 2024 May 03.

Article in English | MEDLINE | ID: mdl-38645134

ABSTRACT

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.

6.

Rare variants in genes coding for components of the terminal pathway of the complement system in preeclampsia.

Lokki, A; Triebwasser, Michael; Daly, Emma; FINNPEC Cohort; Kurki, Mitja; Perola, Markus; Auro, Kirsi; Salmon, Jane; Java, Anuja; Daly, Mark; Atkinson, John; Laivuori, Hannele; Meri, Seppo.

Res Sq ; 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38645143

ABSTRACT

Preeclampsia is a common multifactorial disease of pregnancy. Dysregulation of complement activation is among the emerging candidates responsible for disease pathogenesis. In a targeted exomic sequencing study we identified 14 variants within nine genes coding for components of the membrane attack complex (MAC, C5b-9) that are associated with preeclampsia. We found two rare missense variants in the C5 gene that predispose to preeclampsia (rs200674959: I1296V, OR (CI95) = 24.13 (1.25-467.43), p-value = 0.01 and rs147430470: I330T, OR (CI95) = 22.75 (1.17-440.78), p-value = 0.01). In addition, one predisposing rare variant and one protective rare variant were discovered in C6 (rs41271067: D396G, OR (CI95) = 2.93 (1.18-7.10), p-value = 0.01 and rs114609505: T190I, OR (CI95) = 0.47 (0.22-0.92), p-value = 0.02). The results suggest that variants in the terminal complement pathway predispose to preeclampsia.

7.

Role of IL-27 in Epstein-Barr virus infection revealed by IL-27RA deficiency.

Martin, Emmanuel; Winter, Sarah; Garcin, Cécile; Tanita, Kay; Hoshino, Akihiro; Lenoir, Christelle; Fournier, Benjamin; Migaud, Mélanie; Boutboul, David; Simonin, Mathieu; Fernandes, Alicia; Bastard, Paul; Le Voyer, Tom; Roupie, Anne-Laure; Ben Ahmed, Yassine; Leruez-Ville, Marianne; Burgard, Marianne; Rao, Geetha; Ma, Cindy S; Masson, Cécile; Soudais, Claire; Picard, Capucine; Bustamante, Jacinta; Tangye, Stuart G; Cheikh, Nathalie; Seppänen, Mikko; Puel, Anne; Daly, Mark; Casanova, Jean-Laurent; Neven, Bénédicte; Fischer, Alain; Latour, Sylvain.

Nature ; 628(8008): 620-629, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38509369

ABSTRACT

Epstein-Barr virus (EBV) infection can engender severe B cell lymphoproliferative diseases [1,2]. The primary infection is often asymptomatic or causes infectious mononucleosis (IM), a self-limiting lymphoproliferative disorder [3]. Selective vulnerability to EBV has been reported in association with inherited mutations impairing T cell immunity to EBV [4]. Here we report biallelic loss-of-function variants in IL27RA that underlie an acute and severe primary EBV infection with a nevertheless favourable outcome requiring minimal treatment. One mutant allele (rs201107107) was enriched in the Finnish population (minor allele frequency = 0.0068) and carried a high risk of severe infectious mononucleosis when homozygous. IL27RA encodes the IL-27 receptor alpha subunit [5,6]. In the absence of IL-27RA, phosphorylation of STAT1 and STAT3 by IL-27 is abolished in T cells. In in vitro studies, IL-27 exerts a synergistic effect on T-cell-receptor-dependent T cell proliferation [7] that is deficient in cells from the patients, leading to impaired expansion of potent anti-EBV effector cytotoxic CD8+ T cells. IL-27 is produced by EBV-infected B lymphocytes and an IL-27RA-IL-27 autocrine loop is required for the maintenance of EBV-transformed B cells. This potentially explains the eventual favourable outcome of the EBV-induced viral disease in patients with IL-27RA deficiency. Furthermore, we identified neutralizing anti-IL-27 autoantibodies in most individuals who developed sporadic infectious mononucleosis and chronic EBV infection. These results demonstrate the critical role of IL-27RA-IL-27 in immunity to EBV, but also the hijacking of this defence by EBV to promote the expansion of infected transformed B cells.

Subject(s)

Epstein-Barr Virus Infections , Interleukin-27 , Receptors, Interleukin , Adolescent , Adult , Child , Child, Preschool , Female , Humans , Infant , Male , Young Adult , Alleles , B-Lymphocytes/pathology , B-Lymphocytes/virology , CD8-Positive T-Lymphocytes/pathology , Epstein-Barr Virus Infections/complications , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/therapy , Finland , Gene Frequency , Herpesvirus 4, Human , Homozygote , Infectious Mononucleosis/complications , Infectious Mononucleosis/genetics , Infectious Mononucleosis/therapy , Interleukin-27/immunology , Interleukin-27/metabolism , Loss of Function Mutation , Receptors, Interleukin/deficiency , Receptors, Interleukin/genetics , Receptors, Interleukin/metabolism , Treatment Outcome

8.

Evidence for the additivity of rare and common variant burden throughout the spectrum of intellectual disability.

Urpa, Lea; Kurki, Mitja I; Rahikkala, Elisa; Hämäläinen, Eija; Salomaa, Veikko; Suvisaari, Jaana; Keski-Filppula, Riikka; Rauhala, Merja; Korpi-Heikkilä, Satu; Komulainen-Ebrahim, Jonna; Helander, Heli; Vieira, Päivi; Uusimaa, Johanna; Moilanen, Jukka S; Körkkö, Jarmo; Singh, Tarjinder; Kuismin, Outi; Pietiläinen, Olli; Palotie, Aarno; Daly, Mark J.

Eur J Hum Genet ; 32(5): 576-583, 2024 May.

Article in English | MEDLINE | ID: mdl-38467730

ABSTRACT

Intellectual disability (ID) is a common disorder, yet there is a wide spectrum of impairment from mild to profoundly affected individuals. Mild ID is seen as the low extreme of the general distribution of intelligence, while severe ID is often seen as a monogenic disorder caused by rare, pathogenic, highly penetrant variants. To investigate the genetic factors influencing mild and severe ID, we evaluated rare and common variation in the Northern Finland Intellectual Disability cohort (n = 1096 ID patients), a cohort with a high percentage of mild ID (n = 550) and from a population bottleneck enriched in rare, damaging variation. Despite this enrichment, we found only a small percentage of ID was due to recessive Finnish-enriched variants (0.5%). A larger proportion was linked to dominant variation, with a significant burden of rare, damaging variation in both mild and severe ID. This rare variant burden was enriched in more severe ID (p = 2.4e-4), patients without a relative with ID (p = 4.76e-4), and in those with features associated with monogenic disorders. We also found a significant burden of common variants associated with decreased cognitive function, with no difference between mild and more severe ID. When we included common and rare variants in a joint model, the rare and common variants had additive effects in both mild and severe ID. A multimodel inference approach also found that common and rare variants together best explained ID status (ΔAIC = 16.8, ΔBIC = 10.2). Overall, we report evidence for the additivity of rare and common variant burden throughout the spectrum of intellectual disability.

Subject(s)

Intellectual Disability , Humans , Intellectual Disability/genetics , Intellectual Disability/pathology , Male , Female , Finland , Adult , Genetic Variation

9.

Thrombosis risk in single- and double-heterozygous carriers of factor V Leiden and prothrombin G20210A in FinnGen and the UK Biobank.

Ryu, Justine; Rämö, Joel T; Jurgens, Sean J; Niiranen, Teemu; Sanna-Cherchi, Simone; Bauer, Kenneth A; Haj, Amelia; Choi, Seung Hoan; Palotie, Aarno; Daly, Mark; Ellinor, Patrick T; Bendapudi, Pavan K.

Blood ; 143(23): 2425-2432, 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38498041

ABSTRACT

The factor V Leiden (FVL; rs6025) and prothrombin G20210A (PTGM; rs1799963) polymorphisms are 2 of the most well-studied genetic risk factors for venous thromboembolism (VTE). However, double heterozygosity (DH) for FVL and PTGM remains poorly understood, with previous studies showing marked disagreement regarding thrombosis risk conferred by the DH genotype. Using multidimensional data from the UK Biobank (UKB) and FinnGen biorepositories, we evaluated the clinical impact of DH carrier status across 937 939 individuals. We found that 662 participants (0.07%) were DH carriers. After adjustment for age, sex, and ancestry, DH individuals experienced a markedly elevated risk of VTE compared with wild-type individuals (odds ratio [OR] = 5.24; 95% confidence interval [CI], 4.01-6.84; P = 4.8 × 10⁻³⁴), which approximated the risk conferred by FVL homozygosity. A secondary analysis restricted to UKB participants (N = 445 144) found that effect size estimates for the DH genotype remained largely unchanged (OR = 4.53; 95% CI, 3.42-5.90; P < 1 × 10⁻¹⁶) after adjustment for commonly cited VTE risk factors, such as body mass index, blood type, and markers of inflammation. In contrast, the DH genotype was not associated with a significantly higher risk of any arterial thrombosis phenotype, including stroke, myocardial infarction, and peripheral artery disease. In summary, we leveraged population-scale genomic data sets to conduct, to our knowledge, the largest study to date on the DH genotype and were able to establish far more precise effect size estimates than previously possible. Our findings indicate that the DH genotype may occur as frequently as FVL homozygosity and may confer a similarly increased risk of VTE.

Subject(s)

Biological Specimen Banks , Factor V , Heterozygote , Prothrombin , Humans , Prothrombin/genetics , Factor V/genetics , Female , Male , Middle Aged , United Kingdom/epidemiology , Aged , Risk Factors , Venous Thromboembolism/genetics , Venous Thromboembolism/epidemiology , Adult , Thrombosis/genetics , Thrombosis/epidemiology , Thrombosis/etiology , Genetic Predisposition to Disease , Genotype , Polymorphism, Single Nucleotide , UK Biobank

10.

Genetic contribution to disease-course severity and progression in the SUPER-Finland study, a cohort of 10,403 individuals with psychotic disorders.

Kämpe, Anders; Suvisaari, Jaana; Lähteenvuo, Markku; Singh, Tarjinder; Ahola-Olli, Ari; Urpa, Lea; Haaki, Willehard; Hietala, Jarmo; Isometsä, Erkki; Jukuri, Tuomas; Kampman, Olli; Kieseppä, Tuula; Lahdensuo, Kaisla; Lönnqvist, Jouko; Männynsalo, Teemu; Paunio, Tiina; Niemi-Pynttäri, Jussi; Suokas, Kimmo; Tuulio-Henriksson, Annamari; Veijola, Juha; Wegelius, Asko; Daly, Mark; Taylor, Jacob; Kendler, Kenneth S; Palotie, Aarno; Pietiläinen, Olli.

Mol Psychiatry ; 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38556557

ABSTRACT

Genetic factors contribute to susceptibility to psychotic disorders, but less is known about how they affect psychotic disease-course development. Utilizing polygenic scores (PGSs) in combination with longitudinal healthcare data with decades of follow-up, we investigated the genetic contribution to psychotic disease-course severity and diagnostic shifts in the SUPER-Finland study, encompassing 10 403 genotyped individuals with a psychotic disorder. To longitudinally track the study participants' past disease-course severity, we created a psychiatric hospitalization burden metric using the full-coverage and nation-wide Finnish in-hospital registry (data from 1969 onwards). Using a hierarchical model, ranking the psychotic diagnoses according to clinical severity, we show that high schizophrenia PGS (SZ-PGS) was associated with progression from lower ranked psychotic disorders to schizophrenia (OR = 1.32 [1.23-1.43], p = 1.26e-12). This development manifested already at psychotic illness onset as a higher psychiatric hospitalization burden, the proxy for disease-course severity. In schizophrenia (n = 5 479), both a high SZ-PGS and a low educational attainment PGS (EA-PGS) were associated with increased psychiatric hospitalization burden (p = 1.00e-04 and p = 4.53e-10). The SZ-PGS and the EA-PGS were associated with distinct patterns of hospital usage. In individuals with high SZ-PGS, the increased hospitalization burden was composed of longer individual hospital stays, while low EA-PGS was associated with shorter but more frequent hospital visits. The negative effect of a low EA-PGS was found to be partly mediated via substance use disorder, a major risk factor for hospitalizations. In conclusion, we show that high SZ-PGS and low EA-PGS both impacted psychotic disease-course development negatively but resulted in different disease-course trajectories.

11.

SCGB1D2 inhibits growth of Borrelia burgdorferi and affects susceptibility to Lyme disease.

Strausz, Satu; Abner, Erik; Blacker, Grace; Galloway, Sarah; Hansen, Paige; Feng, Qingying; Lee, Brandon T; Jones, Samuel E; Haapaniemi, Hele; Raak, Sten; Nahass, George Ronald; Sanders, Erin; Soodla, Pilleriin; Võsa, Urmo; Esko, Tõnu; Sinnott-Armstrong, Nasa; Weissman, Irving L; Daly, Mark; Aivelo, Tuomas; Tal, Michal Caspi; Ollila, Hanna M.

Nat Commun ; 15(1): 2041, 2024 Mar 19.

Article in English | MEDLINE | ID: mdl-38503741

ABSTRACT

Lyme disease is a tick-borne disease caused by bacteria of the genus Borrelia. The host factors that modulate susceptibility to Lyme disease have remained mostly unknown. Using epidemiological and genetic data from FinnGen and Estonian Biobank, we identify two previously known variants and a previously unknown common missense variant in the gene encoding Secretoglobin family 1D member 2 (SCGB1D2) protein that increases susceptibility to Lyme disease. Using live Borrelia burgdorferi (Bb) we find that recombinant reference SCGB1D2 protein inhibits the growth of Bb in vitro more efficiently than the recombinant protein with the deleterious SCGB1D2 P53L missense variant. Finally, using an in vivo murine infection model we show that recombinant SCGB1D2 prevents infection by Borrelia in vivo. Together, these data suggest that SCGB1D2 is a host defense factor present in the skin, sweat, and other secretions that protects against Bb infection, and they open an exciting therapeutic avenue for Lyme disease.

Subject(s)

Borrelia burgdorferi , Ixodes , Lyme Disease , Mice , Animals , Humans , Borrelia burgdorferi/genetics , Lyme Disease/microbiology , Ixodes/microbiology , Secretoglobins

12.

Public platform with 39,472 exome control samples enables association studies without genotype sharing.

Artomov, Mykyta; Loboda, Alexander A; Artyomov, Maxim N; Daly, Mark J.

Nat Genet ; 56(2): 327-335, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38200129

ABSTRACT

Acquiring a sufficiently powered cohort of control samples matched to a case sample can be time-consuming or, in some cases, impossible. Accordingly, an ability to leverage genetic data from control samples that were already collected elsewhere could dramatically improve power in genetic association studies. Sharing of control samples can pose significant challenges, since most human genetic data are subject to strict sharing regulations. Here, using the properties of singular value decomposition and a subsampling algorithm, we developed a method that allows selection of the best-matching controls from an external pool of samples, complies with personal data protection, and eliminates the need for genotype sharing. We provide access to a library of 39,472 exome sequencing controls at http://dnascore.net enabling association studies for case cohorts lacking control subjects. Using this approach, control sets can be selected from this online library with a prespecified matching accuracy, ensuring well-calibrated association analysis for both rare and common variants.

Subject(s)

Algorithms , Exome , Humans , Exome/genetics , Genotype , Genetic Association Studies , Research

13.

Distinct and shared genetic architectures of gestational diabetes mellitus and type 2 diabetes.

Elliott, Amanda; Walters, Raymond K; Pirinen, Matti; Kurki, Mitja; Junna, Nella; Goldstein, Jacqueline I; Reeve, Mary Pat; Siirtola, Harri; Lemmelä, Susanna M; Turley, Patrick; Lahtela, Elisa; Mehtonen, Juha; Reis, Kadri; Elnahas, Abdelrahman G; Reigo, Anu; Palta, Priit; Esko, Tõnu; Mägi, Reedik; Palotie, Aarno; Daly, Mark J; Widén, Elisabeth.

Nat Genet ; 56(3): 377-382, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38182742

ABSTRACT

Gestational diabetes mellitus (GDM) is a common metabolic disorder affecting more than 16 million pregnancies annually worldwide [1,2]. GDM is related to an increased lifetime risk of type 2 diabetes (T2D) [1-3], with over a third of women developing T2D within 15 years of their GDM diagnosis. The diseases are hypothesized to share a genetic predisposition [1-7], but few studies have sought to uncover the genetic underpinnings of GDM. Most studies have evaluated the impact of T2D loci only [8-10], and the three prior genome-wide association studies of GDM [11-13] have identified only five loci, limiting the power to assess to what extent variants or biological pathways are specific to GDM. We conducted the largest genome-wide association study of GDM to date in 12,332 cases and 131,109 parous female controls in the FinnGen study and identified 13 GDM-associated loci, including nine new loci. Genetic features distinct from T2D were identified both at the locus and genomic scale. Our results suggest that the genetics of GDM risk falls into the following two distinct categories: one part conventional T2D polygenic risk and one part predominantly influencing mechanisms disrupted in pregnancy. Loci with GDM-predominant effects map to genes related to islet cells, central glucose homeostasis, steroidogenesis and placental expression.

Subject(s)

Diabetes Mellitus, Type 2 , Diabetes, Gestational , Islets of Langerhans , Pregnancy , Female , Humans , Diabetes Mellitus, Type 2/genetics , Diabetes, Gestational/genetics , Genome-Wide Association Study , Placenta

14.

Author Correction: A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 626(7997): E1, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38225470

15.

A harmonized public resource of deeply sequenced diverse human genomes.

Koenig, Zan; Yohannes, Mary T; Nkambule, Lethukuthula L; Zhao, Xuefang; Goodrich, Julia K; Kim, Heesu Ally; Wilson, Michael W; Tiao, Grace; Hao, Stephanie P; Sahakian, Nareh; Chao, Katherine R; Walker, Mark A; Lyu, Yunfei; Rehm, Heidi L; Neale, Benjamin M; Talkowski, Michael E; Daly, Mark J; Brand, Harrison; Karczewski, Konrad J; Atkinson, Elizabeth G; Martin, Alicia R.

bioRxiv ; 2024 Feb 28.

Article in English | MEDLINE | ID: mdl-36747613

ABSTRACT

Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,094 whole genomes from HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

16.

Inferring compound heterozygosity from large-scale exome sequencing data.

Guo, Michael H; Francioli, Laurent C; Stenton, Sarah L; Goodrich, Julia K; Watts, Nicholas A; Singer-Berk, Moriel; Groopman, Emily; Darnowsky, Philip W; Solomonson, Matthew; Baxter, Samantha; Tiao, Grace; Neale, Benjamin M; Hirschhorn, Joel N; Rehm, Heidi L; Daly, Mark J; O'Donnell-Luria, Anne; Karczewski, Konrad J; MacArthur, Daniel G; Samocha, Kaitlin E.

Nat Genet ; 56(1): 152-161, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38057443

ABSTRACT

Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.

Subject(s)

Exome , High-Throughput Nucleotide Sequencing , Humans , Exome/genetics , Exome Sequencing , Genotype

17.

A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 625(7993): 92-100, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.

Subject(s)

Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic

18.

Improving fine-mapping by modeling infinitesimal effects.

Cui, Ran; Elzur, Roy A; Kanai, Masahiro; Ulirsch, Jacob C; Weissbrod, Omer; Daly, Mark J; Neale, Benjamin M; Fan, Zhou; Finucane, Hilary K.

Nat Genet ; 56(1): 162-169, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38036779

ABSTRACT

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.
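The abstract introduces RFR without giving a formula, so the following is a hedged sketch of one plausible reading: among variants that receive a high posterior inclusion probability (PIP) in the downsampled analysis, count the fraction whose PIP is low in the full-sample analysis. The thresholds 0.9 and 0.1, the function name, and the choice to treat variants missing from the full-sample results as PIP 0 are all assumptions made for illustration.

```python
def replication_failure_rate(pip_down, pip_full, high=0.9, low=0.1):
    """Fraction of confident downsampled calls that fail in the full sample.

    pip_down, pip_full: dicts mapping variant ID -> PIP from the downsampled
    and full-sample fine-mapping runs. Thresholds are illustrative.
    Returns None when the downsampled run has no high-PIP variants.
    """
    confident = [v for v, p in pip_down.items() if p > high]
    if not confident:
        return None
    # Assumption: a variant absent from the full-sample output counts as PIP 0.
    failed = sum(1 for v in confident if pip_full.get(v, 0.0) < low)
    return failed / len(confident)


pip_down = {"v1": 0.95, "v2": 0.99, "v3": 0.50, "v4": 0.92}
pip_full = {"v1": 0.97, "v2": 0.03, "v3": 0.60, "v4": 0.40}
rfr = replication_failure_rate(pip_down, pip_full)
print(rfr)
```

Here three variants are confident in the downsampled run and one of them (`v2`) drops below the low threshold in the full sample, so the toy RFR is one third. A well-calibrated method would be expected to keep this rate low.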

Subject(s)

Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Bayes Theorem , Multifactorial Inheritance , Algorithms

19.

CR1 variants contribute to FSGS susceptibility across multiple populations.

Skitchenko, Rostislav; Modrusan, Zora; Loboda, Alexander; Kopp, Jeffrey B; Winkler, Cheryl A; Sergushichev, Alexey; Gupta, Namrata; Stevens, Christine; Daly, Mark J; Shaw, Andrey; Artomov, Mykyta.

medRxiv ; 2023 Nov 20.

Article in English | MEDLINE | ID: mdl-38076851

ABSTRACT

Focal segmental glomerulosclerosis (FSGS) is a common cause of nephrotic syndrome, with an annual incidence in the United States of 24 cases per million in African-Americans and 5 cases per million in European-Americans. Among glomerular diseases, FSGS was the second most frequent diagnosis in Europe and Latin America and the fifth in Asia. We expand previous efforts to understand the genetics of FSGS by performing a case-control study of ethnically diverse groups of FSGS cases (726) and a pool of controls (13,994), using panel sequencing of approximately 2,500 podocyte-expressed genes. Through rare variant association tests, we replicated known risk genes: KANK1, COL4A4, and APOL1. A novel significant association was observed for the gene encoding complement receptor 1 (CR1). High-risk rare variants in CR1 in the European-American cohort were commonly observed in Latin- and African-Americans. Therefore, a combined rare and common variant analysis was used to replicate the CR1 association in non-European populations. The CR1 risk variant, rs17047661, gives rise to the Sl1/Sl2 (R1601G) allele that was previously associated with protection against cerebral malaria. Pleiotropic effects of rs17047661 may explain the difference in allele frequencies across continental ancestries and suggest a possible role for genetically driven alterations of adaptive immunity in the pathogenesis of FSGS.

20.

Polygenic risk scores as a marker for epilepsy risk across lifetime and after unspecified seizure events.

Heyne, Henrike O; Pajuste, Fanny-Dhelia; Wanner, Julian; Onwuchekwa, Jennifer I Daniel; Mägi, Reedik; Palotie, Aarno; Kälviainen, Reetta; Daly, Mark J.

medRxiv ; 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-38076931

ABSTRACT

A diagnosis of epilepsy has significant consequences for an individual but is often challenging in clinical practice. Novel biomarkers are thus greatly needed. Here, we investigated how common genetic factors (epilepsy polygenic risk scores [PRSs]) influence epilepsy risk in detailed longitudinal electronic health records (EHRs) of more than 360,000 Finns spanning up to 50 years of individuals' lifetimes. Individuals with a high genetic generalized epilepsy PRS (PRSGGE) in FinnGen had an increased risk for genetic generalized epilepsy (GGE) (hazard ratio [HR] 1.55 per PRSGGE standard deviation [SD]) across their lifetime and after unspecified seizure events. Effect sizes of epilepsy PRSs were comparable to effect sizes in clinically curated data, supporting our EHR-derived epilepsy diagnoses. Within 10 years after an unspecified seizure, the GGE rate was 37% when PRSGGE > 2 SD compared to 5.6% when PRSGGE < -2 SD. The effect of PRSGGE was even larger on GGE subtypes of idiopathic generalized epilepsy (IGE) (HR 2.1 per SD PRSGGE). We further report significantly larger effects of PRSGGE on epilepsy in females and in younger age groups. Analogously, we found a significant but more modest focal epilepsy PRS burden associated with non-acquired focal epilepsy (NAFE). We found PRSGGE specifically associated with GGE in comparison with more than 2,000 independent diseases, while PRSNAFE was also associated with diseases other than NAFE, such as back pain. Here, we show that epilepsy-specific PRSs have good discriminative ability after a first seizure event, that is, in circumstances where the prior probability of epilepsy is high, outlining a potential to serve as biomarkers for an epilepsy diagnosis.
