Results 1 - 20 of 65
1.
Am J Hum Genet ; 111(5): 966-978, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subject(s)
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
2.
Elife ; 13, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38334359

ABSTRACT

Genetic variants in gene regulatory sequences can modify gene expression and mediate the molecular response to environmental stimuli. In addition, genotype-environment interactions (GxE) contribute to complex traits such as cardiovascular disease. Caffeine is the most widely consumed stimulant and is known to produce a vascular response. To investigate GxE for caffeine, we treated vascular endothelial cells with caffeine and used a massively parallel reporter assay to measure allelic effects on gene regulation for over 43,000 genetic variants. We identified 665 variants with allelic effects on gene regulation and 6 variants that regulate the gene expression response to caffeine (GxE, false discovery rate [FDR] < 5%). When overlapping our GxE results with expression quantitative trait loci colocalized with coronary artery disease and hypertension, we dissected their regulatory mechanisms and showed a modulatory role for caffeine. Our results demonstrate that massively parallel reporter assay is a powerful approach to identify and molecularly characterize GxE in the specific context of caffeine consumption.


Subject(s)
Endothelial Cells , Gene-Environment Interaction , Caffeine/pharmacology , Gene Expression Regulation , Quantitative Trait Loci
3.
medRxiv ; 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37425837

ABSTRACT

Metabolites are small molecules that are useful for estimating disease risk and elucidating disease biology. Nevertheless, their causal effects on human diseases have not been evaluated comprehensively. We performed two-sample Mendelian randomization to systematically infer the causal effects of 1,099 plasma metabolites measured in 6,136 Finnish men from the METSIM study on risk of 2,099 binary disease endpoints measured in 309,154 Finnish individuals from FinnGen. We identified evidence for 282 causal effects of 70 metabolites on 183 disease endpoints (FDR < 1%). We found 25 metabolites with potential causal effects across multiple disease domains, including ascorbic acid 2-sulfate affecting 26 disease endpoints in 12 disease domains. Our study suggests that N-acetyl-2-aminooctanoate and glycocholenate sulfate affect risk of atrial fibrillation through two distinct metabolic pathways and that N-methylpipecolate may mediate the causal effect of N6,N6-dimethyllysine on anxious personality disorder. This study highlights the broad causal impact of plasma metabolites and widespread metabolic connections across diseases.
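
The abstract above names two-sample Mendelian randomization but does not say which estimator was used. As a minimal illustration only, the sketch below shows two textbook estimators that work from summary statistics: the single-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) combination. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant causal estimate: outcome effect divided by exposure effect.

    The standard error is a first-order approximation that ignores the
    uncertainty in the exposure effect estimate.
    """
    estimate = beta_outcome / beta_exposure
    std_err = abs(se_outcome / beta_exposure)
    return estimate, std_err

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate across several independent variants."""
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    sy = np.asarray(se_outcome, dtype=float)
    weights = bx**2 / sy**2          # inverse variance of each Wald ratio
    ratios = by / bx                 # per-variant Wald ratios
    estimate = np.sum(weights * ratios) / np.sum(weights)
    std_err = np.sqrt(1.0 / np.sum(weights))
    return estimate, std_err

# Toy summary statistics (made up): variant effects on the metabolite
# (exposure) and on the disease endpoint (outcome).
est, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])
```

Both estimators assume valid instruments (no horizontal pleiotropy) and independent variants. A real analysis would add sensitivity checks that this sketch omits.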

4.
Genome Res ; 33(6): 839-856, 2023 06.
Article in English | MEDLINE | ID: mdl-37442575

ABSTRACT

Synthetic glucocorticoids, such as dexamethasone, have been used as a treatment for many immune conditions, such as asthma and, more recently, severe COVID-19. Single-cell data can capture more fine-grained details on transcriptional variability and dynamics to gain a better understanding of the molecular underpinnings of inter-individual variation in drug response. Here, we used single-cell RNA-seq to study the dynamics of the transcriptional response to glucocorticoids in activated peripheral blood mononuclear cells from 96 African American children. We used novel statistical approaches to calculate a mean-independent measure of gene expression variability and a measure of transcriptional response pseudotime. Using these approaches, we showed that glucocorticoids reverse the effects of immune stimulation on both gene expression mean and variability. Our novel measure of gene expression response dynamics, based on the diagonal linear discriminant analysis, separated individual cells by response status on the basis of their transcriptional profiles and allowed us to identify different dynamic patterns of gene expression along the response pseudotime. We identified genetic variants regulating gene expression mean and variability, including treatment-specific effects, and showed widespread genetic regulation of the transcriptional dynamics of the gene expression response.


Subject(s)
COVID-19 , Glucocorticoids , Child , Humans , Glucocorticoids/pharmacology , Glucocorticoids/metabolism , Leukocytes, Mononuclear/metabolism , COVID-19/genetics , Gene Expression Regulation
5.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37279733

ABSTRACT

MOTIVATION: Replicability is the cornerstone of scientific research. The current statistical method for high-dimensional replicability analysis either cannot control the false discovery rate (FDR) or is too conservative. RESULTS: We propose a statistical method, JUMP, for the high-dimensional replicability analysis of two studies. The input is a high-dimensional paired sequence of p-values from two studies and the test statistic is the maximum of p-values of the pair. JUMP uses four states of the p-value pairs to indicate whether they are null or non-null. Conditional on the hidden states, JUMP computes the cumulative distribution function of the maximum of p-values for each state to conservatively approximate the probability of rejection under the composite null of replicability. JUMP estimates unknown parameters and uses a step-up procedure to control FDR. By incorporating different states of composite null, JUMP achieves a substantial power gain over existing methods while controlling the FDR. Analyzing two pairs of spatially resolved transcriptomic datasets, JUMP makes biological discoveries that otherwise cannot be obtained by using existing methods. AVAILABILITY AND IMPLEMENTATION: An R package JUMP implementing the JUMP method is available on CRAN (https://CRAN.R-project.org/package=JUMP).


Subject(s)
Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods
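
The JUMP abstract above uses the maximum of each p-value pair as its test statistic. The sketch below is not the JUMP procedure (which models four hidden states and is available as the CRAN package cited above). It shows only the simpler baseline of applying a Benjamini-Hochberg step-up to the maximum p-values, a baseline that is generally conservative for replicability analysis. Function names and example values are hypothetical.

```python
import numpy as np

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passing = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size > 0:
        # Reject everything up to the largest rank that meets its threshold.
        reject[order[: passing[-1] + 1]] = True
    return reject

def max_p_replicable(p_study1, p_study2, alpha=0.05):
    """Naive replicability call: BH applied to the pairwise maximum p-value."""
    p_max = np.maximum(np.asarray(p_study1, dtype=float),
                       np.asarray(p_study2, dtype=float))
    return p_max, bh_reject(p_max, alpha)

# Only the first feature is small in both studies, so only it is called.
p_max, calls = max_p_replicable([0.001, 0.0005, 0.8, 0.03],
                                [0.002, 0.6, 0.0001, 0.04])
```

A feature is flagged only when its p-values are small in both studies. Features that are significant in just one study have a large maximum and are not rejected.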
6.
Nat Commun ; 14(1): 2229, 2023 04 19.
Article in English | MEDLINE | ID: mdl-37076491

ABSTRACT

Expression quantitative trait locus (eQTL) studies illuminate genomic variants that regulate specific genes and contribute to fine-mapped loci discovered via genome-wide association studies (GWAS). Efforts to maximize their accuracy are ongoing. Using 240 glomerular (GLOM) and 311 tubulointerstitial (TUBE) micro-dissected samples from human kidney biopsies, we discovered 5371 GLOM and 9787 TUBE genes with at least one variant significantly associated with expression (eGene) by incorporating kidney single-nucleus open chromatin data and transcription start site distance as an "integrative prior" for Bayesian statistical fine-mapping. The use of an integrative prior resulted in higher resolution eQTLs illustrated by (1) smaller numbers of variants in credible sets with greater confidence, (2) increased enrichment of partitioned heritability for GWAS of two kidney traits, (3) an increased number of variants colocalized with the GWAS loci, and (4) enrichment of computationally predicted functional regulatory variants. A subset of variants and genes were validated experimentally in vitro and using a Drosophila nephrocyte model. More broadly, this study demonstrates that tissue-specific eQTL maps informed by single-nucleus open chromatin data have enhanced utility for diverse downstream analyses.


Subject(s)
Genome-Wide Association Study , Kidney Diseases , Humans , Genome-Wide Association Study/methods , Bayes Theorem , Kidney Diseases/genetics , Genomics , Chromatin/genetics , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease/genetics
7.
Am J Hum Genet ; 110(1): 44-57, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36608684

ABSTRACT

Integrative genetic association methods have shown great promise in post-GWAS (genome-wide association study) analyses, in which one of the most challenging tasks is identifying putative causal genes and uncovering molecular mechanisms of complex traits. Recent studies suggest that prevailing computational approaches, including transcriptome-wide association studies (TWASs) and colocalization analysis, are individually imperfect, but their joint usage can yield robust and powerful inference results. This paper presents INTACT, a computational framework to integrate probabilistic evidence from these distinct types of analyses and implicate putative causal genes. This procedure is flexible and can work with a wide range of existing integrative analysis approaches. It has the unique ability to quantify the uncertainty of implicated genes, enabling rigorous control of false-positive discoveries. Taking advantage of this highly desirable feature, we further propose an efficient algorithm, INTACT-GSE, for gene set enrichment analysis based on the integrated probabilistic evidence. We examine the proposed computational methods and illustrate their improved performance over the existing approaches through simulation studies. We apply the proposed methods to analyze the multi-tissue eQTL data from the GTEx project and eight large-scale complex- and molecular-trait GWAS datasets from multiple consortia and the UK Biobank. Overall, we find that the proposed methods markedly improve the existing putative gene implication methods and are particularly advantageous in evaluating and identifying key gene sets and biological pathways underlying complex traits.


Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease
8.
Nat Commun ; 14(1): 230, 2023 01 16.
Article in English | MEDLINE | ID: mdl-36646693

ABSTRACT

Puberty is an important developmental period marked by hormonal, metabolic and immune changes. Puberty also marks a shift in sex differences in susceptibility to asthma. Yet, little is known about the gene expression changes in immune cells that occur during pubertal development. Here we assess pubertal development and leukocyte gene expression in a longitudinal cohort of 251 children with asthma. We identify substantial gene expression changes associated with age and pubertal development. Gene expression changes between pre- and post-menarcheal females suggest a shift from predominantly innate to adaptive immunity. We show that genetic effects on gene expression change dynamically during pubertal development. Gene expression changes during puberty are correlated with gene expression changes associated with asthma and may explain sex differences in prevalence. Our results show that molecular data used to study the genetics of early onset diseases should consider pubertal development as an important factor that modifies the transcriptome.


Subject(s)
Asthma , Puberty , Humans , Male , Child , Female , Puberty/genetics , Menarche , Asthma/genetics , Asthma/epidemiology , Leukocytes , Age Factors , Longitudinal Studies
9.
Am J Hum Genet ; 109(10): 1727-1741, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36055244

ABSTRACT

Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.


Subject(s)
Genome-Wide Association Study , Transcriptome , Bilirubin , Carnitine , Glycerophospholipids , Humans , Male , Metabolomics , Quantitative Trait Loci/genetics , Solute Carrier Family 22 Member 5/genetics , Transcriptome/genetics
10.
Am J Hum Genet ; 109(5): 825-837, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35523146

ABSTRACT

Transcriptome-wide association studies and colocalization analysis are popular computational approaches for integrating genetic-association data from molecular and complex traits. They show the unique ability to go beyond variant-level genetic-association evidence and implicate critical functional units, e.g., genes, in disease etiology. However, in practice, when the two approaches are applied to the same molecular and complex-trait data, the inference results can be markedly different. This paper systematically investigates the inferential reproducibility between the two approaches through theoretical derivation, numerical experiments, and analyses of four complex trait GWAS and GTEx eQTL data. We identify two classes of inconsistent inference results. We find that the first class of inconsistent results (i.e., genes with strong colocalization but weak transcriptome-wide association study [TWAS] signals) might suggest an interesting biological phenomenon, i.e., horizontal pleiotropy; thus, the two approaches are truly complementary. The inconsistency in the second class (i.e., genes with weak colocalization but strong TWAS signals) can be understood and effectively reconciled. To this end, we propose a computational approach for locus-level colocalization analysis. We demonstrate that the joint TWAS and locus-level colocalization analysis improves specificity and sensitivity for implicating biologically relevant genes.


Subject(s)
Genome-Wide Association Study , Transcriptome , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Reproducibility of Results , Transcriptome/genetics
11.
Nat Commun ; 13(1): 1644, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35347128

ABSTRACT

Few studies have explored the impact of rare variants (minor allele frequency < 1%) on highly heritable plasma metabolites identified in metabolomic screens. The Finnish population provides an ideal opportunity for such explorations, given the multiple bottlenecks and expansions that have shaped its history, and the enrichment for many otherwise rare alleles that has resulted. Here, we report genetic associations for 1391 plasma metabolites in 6136 men from the late-settlement region of Finland. We identify 303 novel association signals, more than one third at variants rare or enriched in Finns. Many of these signals identify genes not previously implicated in metabolite genome-wide association studies and suggest mechanisms for diseases and disease-related traits.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Alleles , Finland , Gene Frequency , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Male , Phenotype
12.
bioRxiv ; 2022 Oct 15.
Article in English | MEDLINE | ID: mdl-35313584

ABSTRACT

Synthetic glucocorticoids, such as dexamethasone, have been used as treatment for many immune conditions, such as asthma and more recently severe COVID-19. Single cell data can capture more fine-grained details on transcriptional variability and dynamics to gain a better understanding of the molecular underpinnings of inter-individual variation in drug response. Here, we used single cell RNA-seq to study the dynamics of the transcriptional response to glucocorticoids in activated Peripheral Blood Mononuclear Cells from 96 African American children. We employed novel statistical approaches to calculate a mean-independent measure of gene expression variability and a measure of transcriptional response pseudotime. Using these approaches, we demonstrated that glucocorticoids reverse the effects of immune stimulation on both gene expression mean and variability. Our novel measure of gene expression response dynamics, based on the diagonal linear discriminant analysis, separated individual cells by response status on the basis of their transcriptional profiles and allowed us to identify different dynamic patterns of gene expression along the response pseudotime. We identified genetic variants regulating gene expression mean and variability, including treatment-specific effects, and demonstrated widespread genetic regulation of the transcriptional dynamics of the gene expression response.

13.
Cell Rep ; 37(8): 110057, 2021 11 23.
Article in English | MEDLINE | ID: mdl-34818542

ABSTRACT

The gut microbiome exhibits extreme compositional variation between hominid hosts. However, it is unclear how this variation impacts host physiology across species and whether this effect can be mediated through microbial regulation of host gene expression in interacting epithelial cells. Here, we characterize the transcriptional response of human colonic epithelial cells in vitro to live microbial communities extracted from humans, chimpanzees, gorillas, and orangutans. We find that most host genes exhibit a conserved response, whereby they respond similarly to the four hominid microbiomes. However, hundreds of host genes exhibit a divergent response, whereby they respond only to microbiomes from specific host species. Such genes are associated with intestinal diseases in humans, including inflammatory bowel disease and Crohn's disease. Last, we find that inflammation-associated microbial species regulate the expression of host genes previously associated with inflammatory bowel disease, suggesting health-related consequences for species-specific host-microbiome interactions across hominids.


Subject(s)
Gastrointestinal Microbiome/genetics , Gene Expression Regulation/genetics , Hominidae/microbiology , Animals , Bacteria/genetics , Epithelial Cells/metabolism , Feces/microbiology , Gene Expression/genetics , Gorilla gorilla/microbiology , Hominidae/genetics , Humans , Inflammatory Bowel Diseases/genetics , Microbiota/genetics , Pan troglodytes/microbiology , Phylogeny , Pongo/microbiology , RNA, Ribosomal, 16S/genetics , Species Specificity
14.
Elife ; 10, 2021 06 18.
Article in English | MEDLINE | ID: mdl-34142656

ABSTRACT

Social interactions and the overall psychosocial environment have a demonstrated impact on health, particularly for people living in disadvantaged urban areas. Here, we investigated the effect of psychosocial experiences on gene expression in peripheral blood immune cells of children with asthma in Metro Detroit. Using RNA-sequencing and a new machine learning approach, we identified transcriptional signatures of 19 variables including psychosocial factors, blood cell composition, and asthma symptoms. Importantly, we found 169 genes associated with asthma or allergic disease that are regulated by psychosocial factors and 344 significant gene-environment interactions for gene expression levels. These results demonstrate that immune gene expression mediates the link between negative psychosocial experiences and asthma risk.


Subject(s)
Asthma , Gene-Environment Interaction , Adolescent , Asthma/epidemiology , Asthma/genetics , Asthma/metabolism , Asthma/psychology , Child , Female , Genotype , Humans , Longitudinal Studies , Male , Michigan , Transcriptome/genetics
15.
Elife ; 10, 2021 05 14.
Article in English | MEDLINE | ID: mdl-33988505

ABSTRACT

Genetic effects on gene expression and splicing can be modulated by cellular and environmental factors; yet interactions between genotypes, cell type, and treatment have not been comprehensively studied together. We used an induced pluripotent stem cell system to study multiple cell types derived from the same individuals and exposed them to a large panel of treatments. Cellular responses involved different genes and pathways for gene expression and splicing and were highly variable across contexts. For thousands of genes, we identified variable allelic expression across contexts and characterized different types of gene-environment interactions, many of which are associated with complex traits. Promoter functional and evolutionary features distinguished genes with elevated allelic imbalance mean and variance. On average, half of the genes with dynamic regulatory interactions were missed by large eQTL mapping studies, indicating the importance of exploring multiple treatments to reveal previously unrecognized regulatory loci that may be important for disease.


The activity of the genes in a cell depends on the type of cell they are in, the interactions with other genes, the environment and genetics. Active genes produce a greater number of mRNA molecules, which act as messenger molecules to instruct the cell to produce proteins. The amount of mRNA molecules in cells can be measured to assess the levels of gene activity. Genes produce mRNAs through a process called transcription, and the collection of all the mRNA molecules in a cell is called the transcriptome. Cells obtained from human samples can be grown in the lab under different conditions, and this can be used to transform them into different types of cells. These cells can then be exposed to different treatments, such as specific chemicals, to understand how the environment affects them. Cells derived from different people may respond differently to the same treatment based on their unique genetics. Exposing different types of cells from many people to different treatments can help explain how genetics, the environment and cell type affect gene activity. Findley et al. grew three different types of cells from six different people in the lab. The cells were exposed to 28 different treatments, which reflect different environmental changes. Studying all these different factors together allowed Findley et al. to understand how genetics, cell type and environment affect the activity of over 53,000 genes. Around half of the effects were due to an interaction between genetics and the environment and had not been seen in other larger studies of the transcriptome. Many of these newly observed changes are in genes that have connections to different diseases, including heart disease. The results of Findley et al. provide evidence indicating the extent to which lifestyle and the environment can interact with an individual's genetic makeup to impact gene activity and long-term health. The more researchers can understand these factors, the more useful they can be in helping to predict, detect and treat illnesses. The findings also show how genes and the environment interact, which may be relevant to understanding disease development. There is more work to be done to understand a wider range of environmental factors across more cell types. It will also be important to establish how this work on cells grown in the lab translates to human health.


Subject(s)
Gene Expression Regulation/genetics , Induced Pluripotent Stem Cells/metabolism , Lymphocytes/metabolism , Myocytes, Cardiac/metabolism , Alternative Splicing , Cell Differentiation/genetics , Cell Line , Female , Humans , Induced Pluripotent Stem Cells/cytology , Lymphocytes/cytology , Myocytes, Cardiac/cytology , Quantitative Trait Loci , Sequence Analysis, RNA
16.
Cell ; 184(10): 2633-2648.e19, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33864768

ABSTRACT

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.


Subject(s)
Disease/genetics , Multifactorial Inheritance/genetics , Population/genetics , RNA, Long Noncoding/genetics , Transcriptome , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Genetic Variation , Humans , Inflammatory Bowel Diseases/genetics , Organ Specificity/genetics , Quantitative Trait Loci
17.
G3 (Bethesda) ; 11(2), 2021 02 09.
Article in English | MEDLINE | ID: mdl-33585870

ABSTRACT

Over the last decade, GWAS meta-analyses have used a strict P-value threshold of 5 × 10⁻⁸ to classify associations as significant. Here, we use our current understanding of frequently studied traits including lipid levels, height, and BMI to revisit this genome-wide significance threshold. We compare the performance of studies using the P = 5 × 10⁻⁸ threshold in terms of true and false positive rate to other multiple testing strategies: (1) less stringent P-value thresholds, (2) controlling the FDR with the Benjamini-Hochberg and Benjamini-Yekutieli procedures, and (3) controlling the Bayesian FDR with posterior probabilities. We applied these procedures to re-analyze results from the Global Lipids and GIANT GWAS meta-analysis consortia and supported them with extensive simulation that mimics the empirical data. We observe in simulated studies with sample sizes ∼20,000 and >120,000 that relaxing the P-value threshold to 5 × 10⁻⁷ increased discovery at the cost of 18% and 8% of additional loci being false positive results, respectively. FDR and Bayesian FDR are well controlled for both sample sizes with a few exceptions that disappear under a less stringent definition of true positives and the two approaches yield similar results. Our work quantifies the value of using a relaxed P-value threshold in large studies to increase their true positive discovery but also shows the excess false positive rates due to such actions in modest-sized studies. These results may guide investigators considering different thresholds in replication studies and downstream work such as gene-set enrichment or pathway analysis. Finally, we demonstrate the viability of FDR-controlling procedures in GWAS.


Subject(s)
Genome-Wide Association Study , Bayes Theorem , Computer Simulation , Phenotype , Probability
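
The abstract above compares a fixed P-value threshold with the Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) procedures. As an illustration of how the two differ, the sketch below computes adjusted p-values for both using their standard definitions. It is not code from the study, and the function name and toy p-values are hypothetical.

```python
import numpy as np

def fdr_adjust(pvals, method="bh"):
    """Return BH- or BY-adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    scale = m / ranks
    if method == "by":
        # BY multiplies by the harmonic sum, which keeps FDR control
        # valid under arbitrary dependence between tests.
        scale = scale * np.sum(1.0 / ranks)
    elif method != "bh":
        raise ValueError("method must be 'bh' or 'by'")
    adj_sorted = p[order] * scale
    # Enforce monotonicity by taking running minima from the largest p down.
    adj_sorted = np.minimum.accumulate(adj_sorted[::-1])[::-1]
    adj_sorted = np.minimum(adj_sorted, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted
    return adjusted

toy_p = [0.01, 0.04, 0.03, 0.5]
bh = fdr_adjust(toy_p, "bh")
by = fdr_adjust(toy_p, "by")
```

A test is declared significant at FDR level q when its adjusted p-value is at most q. BY-adjusted values are never smaller than BH-adjusted ones, so BY rejects no more hypotheses than BH in exchange for its weaker dependence assumptions.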
18.
Genome Biol ; 22(1): 49, 2021 01 26.
Article in English | MEDLINE | ID: mdl-33499903

ABSTRACT

The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.


Subject(s)
Gene Expression , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genotype , Genes , Humans , Multifactorial Inheritance , Transcriptome
19.
Biometrics ; 77(2): 573-586, 2021 06.
Article in English | MEDLINE | ID: mdl-32627167

ABSTRACT

Directed acyclic mixed graphs (DAMGs) provide a useful representation of network topology with both directed and undirected edges subject to the restriction of no directed cycles in the graph. This graphical framework may arise in many biomedical studies, for example, when a directed acyclic graph (DAG) of interest is contaminated with undirected edges induced by some unobserved confounding factors (eg, unmeasured environmental factors). Directed edges in a DAG are widely used to evaluate causal relationships among variables in a network, but detecting them is challenging when the underlying causality is obscured by some shared latent factors. The objective of this paper is to develop an effective structural equation model (SEM) method to extract reliable causal relationships from a DAMG. The proposed approach, termed structural factor equation model (SFEM), uses the SEM to capture the network topology of the DAG while accounting for the undirected edges in the graph with a factor analysis model. The latent factors in the SFEM enable the identification and removal of undirected edges, leading to a simpler and more interpretable causal network. The proposed method is evaluated and compared to existing methods through extensive simulation studies, and illustrated through the construction of gene regulatory networks related to breast cancer.


Subject(s)
Models, Theoretical , Research Design , Causality , Factor Analysis, Statistical
20.
Am J Hum Genet ; 108(1): 25-35, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33308443

ABSTRACT

Colocalization analysis has emerged as a powerful tool to uncover the overlapping of causal variants responsible for both molecular and complex disease phenotypes. The findings from colocalization analysis yield insights into the molecular pathways of complex diseases. In this paper, we conduct an in-depth investigation of the promise and limitations of the available colocalization analysis approaches. Focusing on variant-level colocalization approaches, we first establish the connections between various existing methods. We proceed to discuss the impacts of various controllable analytical factors and uncontrollable practical factors on outcomes of colocalization analysis through realistic simulations and real data examples. We identify a single analytical factor, the specification of prior enrichment levels, which can lead to severe inflation of false-positive colocalization findings. Meanwhile, the combination of many other analytical and practical factors all lead to diminished power. Consequently, we recommend the following strategies for the best practice of colocalization analysis: (1) estimating prior enrichment level from the observed data and (2) separating fine-mapping and colocalization analysis. Our analysis of 4,091 complex traits and the multi-tissue expression quantitative trait loci (eQTL) data from the GTEx (v.8) suggests that colocalizations of molecular QTLs and causal complex trait associations are widespread. However, only a small proportion can be confidently identified from currently available data due to a lack of power. Our findings set a benchmark for current and future integrative genetic association analysis applications.


Subject(s)
Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Genetic Predisposition to Disease/genetics , Humans , Linkage Disequilibrium/genetics , Phenotype