1.
PLoS Genet ; 19(11): e1010597, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38011285

ABSTRACT

Polygenic risk score (PRS) is a quantity that aggregates the effects of variants across the genome and estimates an individual's genetic predisposition for a given trait. PRS analysis typically involves two input data sets: base data for effect size estimation and target data for individual-level prediction. Given the availability of large-scale base data, it is increasingly common that the ancestral backgrounds of the base and target data do not perfectly match. In this paper, we treat the GWAS summary information obtained from the base data as knowledge learned from a pre-trained model, and adopt a transfer learning framework to effectively leverage this knowledge, whether or not the base data share a similar ancestral background with the target samples, to build prediction models for target individuals. Our proposed transfer learning framework consists of two main steps: (1) conducting false negative control (FNC) marginal screening to extract useful knowledge from the base data; and (2) performing joint model training to integrate the knowledge extracted from the base data with the target training data for accurate trans-data prediction. This new approach can significantly enhance the computational and statistical efficiency of joint-model training, alleviate over-fitting, and facilitate more accurate trans-data prediction whether the heterogeneity level between the target and base data sets is low or high.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease , Phenotype , Multifactorial Inheritance/genetics , Machine Learning , Risk Factors
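
As a rough illustration of the quantity described in entry 1, a PRS is commonly computed as a weighted sum of effect-allele dosages. The sketch below shows only that basic sum; it does not implement the FNC screening or joint training steps of the abstract, and the function name, effect sizes, and genotype values are hypothetical.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Weighted sum of allele dosages: PRS = sum_j beta_j * g_j."""
    if len(effect_sizes) != len(dosages):
        raise ValueError("need exactly one dosage per variant")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


# Hypothetical per-variant effect sizes, standing in for estimates from base data.
betas = [0.12, -0.05, 0.30]
# Hypothetical target individual: copies of the effect allele at each variant (0, 1, or 2).
genotype = [2, 1, 0]

score = polygenic_risk_score(betas, genotype)
print(round(score, 4))
```

In practice the variants entering the sum would first be filtered or reweighted, which is the part the abstract's screening and joint-modeling steps address.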
2.
Res Sq ; 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37886469

ABSTRACT

Structural variations (SVs) are important contributors to the genetics of human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. We analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (N = 16,905) and identified 400,234 (168,223 high-quality) SVs. Laboratory validation yielded a sensitivity of 82% (85% for high-quality). We found a significant burden of deletions and duplications in AD cases, particularly for singletons and homozygous events. In AD genes, we observed ultra-rare SVs associated with the disease, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1. Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, exemplified by a 5 kb deletion in complete LD with rs143080277 in NCK2. We also identified 16 SVs associated with AD and 13 SVs linked to AD-related pathological/cognitive endophenotypes. This study highlights the pivotal role of SVs in shaping our understanding of AD genetics.

3.
medRxiv ; 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37745545

ABSTRACT

Structural variations (SVs) are important contributors to the genetics of numerous human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. Here, we analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP, N=16,905 subjects) and identified 400,234 (168,223 high-quality) SVs. We found a significant burden of deletions and duplications in AD cases (OR=1.05, P=0.03), particularly for singletons (OR=1.12, P=0.0002) and homozygous events (OR=1.10, P<0.0004). In AD genes, ultra-rare SVs, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1, were associated with AD (SKAT-O P=0.004). Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, e.g., a deletion (chr2:105731359-105736864) in complete LD (R²=0.99) with rs143080277 (chr2:105749599) in NCK2. We also identified 16 SVs associated with AD and 13 SVs associated with AD-related pathological/cognitive endophenotypes. Our findings demonstrate the broad impact of SVs on AD genetics.

4.
Front Aging Neurosci ; 15: 1168638, 2023.
Article in English | MEDLINE | ID: mdl-37577355

ABSTRACT

To better capture the polygenic architecture of Alzheimer's disease (AD), we developed a joint genetic score, MetaGRS. We incorporated genetic variants for AD and 24 other traits from two independent cohorts, NACC (n = 3,174, training set) and UPitt (n = 2,053, validation set). One standard deviation increase in the MetaGRS is associated with about 57% increase in the AD risk [hazard ratio (HR) = 1.577, p = 7.17E-56], showing little difference from the HR for AD GRS alone (HR = 1.579, p = 1.20E-56), suggesting similar utility of both models. We also conducted APOE-stratified analyses to assess the role of the ε4 allele on risk prediction. Similar to that of the combined model, our stratified results did not show a considerable improvement of the MetaGRS. Our study showed that the prediction power of the MetaGRS significantly outperformed that of the reference model without any genetic information, but was effectively equivalent to the prediction power of the AD GRS.
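
To make the per-standard-deviation hazard ratio above concrete, the following sketch converts the reported HR into a percent increase and scales it to a larger shift in the score. It assumes a log-linear (Cox-type) model, which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def hazard_ratio_for_shift(hr_per_sd, n_sd):
    """Assuming log-linearity, the HR for a shift of n_sd SDs is hr_per_sd ** n_sd."""
    return math.exp(n_sd * math.log(hr_per_sd))


hr_meta = 1.577  # HR per 1-SD increase in MetaGRS, as reported in the abstract

# Percent increase in hazard for a 1-SD increase: (HR - 1) * 100
pct_increase_1sd = (hazard_ratio_for_shift(hr_meta, 1) - 1) * 100

# Implied HR for a 2-SD increase under the same assumption
hr_2sd = hazard_ratio_for_shift(hr_meta, 2)

print(round(pct_increase_1sd, 1), round(hr_2sd, 3))
```

The same arithmetic applied to the AD GRS alone (HR = 1.579) gives nearly identical values, which matches the abstract's conclusion that the two scores perform equivalently.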

5.
Environ Epigenet ; 9(1): dvac026, 2023.
Article in English | MEDLINE | ID: mdl-36694712

ABSTRACT

Epidural anesthesia is an effective pain relief modality, widely used for labor analgesia. Childhood asthma is one of the commonest chronic medical illnesses in the USA which places a significant burden on the health-care system. We recently demonstrated a negative association between the duration of epidural anesthesia and the development of childhood asthma; however, the underlying molecular mechanisms still remain unclear. In this study of 127 mother-child pairs comprising 75 Non-Hispanic Black (NHB) and 52 Non-Hispanic White (NHW) from the Newborn Epigenetic Study, we tested the hypothesis that umbilical cord blood DNA methylation mediates the association between the duration of exposure to epidural anesthesia at delivery and the development of childhood asthma and whether this differed by race/ethnicity. In the mother-child pairs of NHB ancestry, the duration of exposure to epidural anesthesia was associated with a marginally lower risk of asthma (odds ratio = 0.88, 95% confidence interval = 0.76-1.01) for each 1-h increase in exposure to epidural anesthesia. Of the 20 CpGs in the NHB population showing the strongest mediation effect, 50% demonstrated an average mediation proportion of 52%, with directional consistency of direct and indirect effects. These top 20 CpGs mapped to 21 genes enriched for pathways engaged in antigen processing, antigen presentation, protein ubiquitination and regulatory networks related to the Major Histocompatibility Complex (MHC) class I complex and Nuclear Factor Kappa-B (NF-κB) complex. Our findings suggest that DNA methylation in immune-related pathways contributes to the effects of the duration of exposure to epidural anesthesia on childhood asthma risk in NHB offspring.

6.
Microbiome ; 10(1): 86, 2022 06 07.
Article in English | MEDLINE | ID: mdl-35668471

ABSTRACT

BACKGROUND: The relationship between host conditions and microbiome profiles, typically characterized by operational taxonomic units (OTUs), contains important information about the microbial role in human health. Traditional association testing frameworks are challenged by the high dimensionality and sparsity of typical microbiome profiles. Phylogenetic information is often incorporated to address these challenges with the assumption that evolutionarily similar taxa tend to behave similarly. However, this assumption may not always be valid due to the complex effects of microbes, and phylogenetic information should be incorporated in a data-supervised fashion. RESULTS: In this work, we propose a local collapsing test called phylogeny-guided microbiome OTU-specific association test (POST). In POST, whether or not to borrow information and how much information to borrow from the neighboring OTUs in the phylogenetic tree are supervised by phylogenetic distance and the outcome-OTU association. POST is constructed under the kernel machine framework to accommodate complex OTU effects and extends kernel machine microbiome tests from community level to OTU level. Using simulation studies, we show that when the phylogenetic tree is informative, POST has better performance than existing OTU-level association tests. When the phylogenetic tree is not informative, POST achieves performance similar to that of existing methods. Finally, in real data applications on bacterial vaginosis and on preterm birth, we find that POST can identify similar or more outcome-associated OTUs that are of biological relevance compared to existing methods. CONCLUSIONS: Using POST, we show that adaptively leveraging the phylogenetic information can enhance the selection performance of associated microbiome features by improving the overall true-positive and false-positive detection. We developed a user-friendly R package POSTm which is freely available on CRAN (https://CRAN.R-project.org/package=POSTm).


Subject(s)
Microbiota , Premature Birth , Computational Biology/methods , Computer Simulation , Female , Humans , Infant, Newborn , Microbiota/genetics , Phylogeny
7.
Front Genet ; 12: 752390, 2021.
Article in English | MEDLINE | ID: mdl-34804120

ABSTRACT

Alzheimer's Disease (AD) is a progressive neurologic disease and the most common form of dementia. While the causes of AD are not completely understood, genetics plays a key role in the etiology of AD, and thus finding genetic factors holds the potential to uncover novel AD mechanisms. For this study, we focus on copy number variation (CNV) detection and burden analysis. Leveraging whole-genome sequence (WGS) data released by the Alzheimer's Disease Sequencing Project (ADSP), we developed a scalable bioinformatics pipeline to identify CNVs. This pipeline was applied to 1,737 AD cases and 2,063 cognitively normal controls. As a result, we observed 237,306 and 42,767 deletions and duplications, respectively, with an average of 2,255 deletions and 1,820 duplications per subject. The burden tests show that Non-Hispanic-White cases on average have 16 more duplications than controls do (p-value 2e-6), and Hispanic cases have larger deletions than controls do (p-value 6.8e-5).

8.
Front Genet ; 12: 710055, 2021.
Article in English | MEDLINE | ID: mdl-34795690

ABSTRACT

The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based G×E tests, to permit G×E VC tests for biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations. We illustrate its utility by conducting genome-wide gene-based G×E analysis on the Taiwan Biobank data to explore the interaction of gene and physical activity status on body mass index.

9.
Front Genet ; 12: 709555, 2021.
Article in English | MEDLINE | ID: mdl-34567069

ABSTRACT

Genomic studies have been a major approach to elucidating disease etiology and to exploring potential targets for treatments of many complex diseases. Statistical analyses in these studies often face the challenges of multiplicity, weak signals, and the nature of dependence among genetic markers. This situation becomes even more complicated when multi-omics data are available. To integrate the data from different platforms, various integrative analyses have been adopted, ranging from the direct union or intersection operation on sets derived from different single-platform analysis to complex hierarchical multi-level models. The former ignores the biological relationship between molecules while the latter can be hard to interpret. We propose in this study an integrative approach that combines both single nucleotide variants (SNVs) and copy number variations (CNVs) in the same genomic unit to co-localize the concurrent effect and to deal with the sparsity due to rare variants. This approach is illustrated with simulation studies to evaluate its performance and is applied to low-density lipoprotein cholesterol and triglyceride measurements from Taiwan Biobank. The results show that the proposed method can more effectively detect the collective effect from both SNVs and CNVs compared to traditional methods. For the biobank analysis, the identified genetic regions including the gene VNN2 could be novel and deserve further investigation.

10.
Bioinformatics ; 37(16): 2259-2265, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33674827

ABSTRACT

MOTIVATION: Facilitated by technological advances and the decrease in costs, it is feasible to gather subject data from several omics platforms. Each platform assesses different molecular events, and the challenge lies in efficiently analyzing these data to discover novel disease genes or mechanisms. A common strategy is to regress the outcomes on all omics variables in a gene set. However, this approach suffers from problems associated with high-dimensional inference. RESULTS: We introduce a tensor-based framework for variable-wise inference in multi-omics analysis. By accounting for the matrix structure of an individual's multi-omics data, the proposed tensor methods incorporate the relationship among omics effects, reduce the number of parameters, and boost the modeling efficiency. We derive the variable-specific tensor test and enhance computational efficiency of tensor modeling. Using simulations and data applications on the Cancer Cell Line Encyclopedia (CCLE), we demonstrate our method performs favorably over baseline methods and will be useful for gaining biological insights in multi-omics analysis. AVAILABILITY AND IMPLEMENTATION: R function and instruction are available from the authors' website: https://www4.stat.ncsu.edu/~jytzeng/Software/TR.omics/TRinstruction.pdf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
PLoS Comput Biol ; 16(5): e1007797, 2020 05.
Article in English | MEDLINE | ID: mdl-32365089

ABSTRACT

Copy number variants (CNVs) are the gain or loss of DNA segments in the genome that can vary in dosage and length. CNVs comprise a large proportion of variation in human genomes and impact health conditions. To detect rare CNV associations, kernel-based methods have been shown to be a powerful tool due to their flexibility in modeling the aggregate CNV effects, their ability to capture effects from different CNV features, and their accommodation of effect heterogeneity. To perform a kernel association test, a CNV locus needs to be defined so that locus-specific effects can be retained during aggregation. However, CNV loci are arbitrarily defined and different locus definitions can lead to different performance depending on the underlying effect patterns. In this work, we develop a new kernel-based test called CONCUR (i.e., copy number profile curve-based association test) that is free from a definition of locus and evaluates CNV-phenotype associations by comparing individuals' copy number profiles across the genomic regions. CONCUR is built on the proposed concepts of "copy number profile curves" to describe the CNV profile of an individual, and the "common area under the curve (cAUC) kernel" to model the multi-feature CNV effects. The proposed method captures the effects of CNV dosage and length, accounts for the numerical nature of copy numbers, and accommodates between- and within-locus etiological heterogeneity without the need to define artificial CNV loci as required in current kernel methods. In a variety of simulation settings, CONCUR shows comparable or improved power over existing approaches. Real data analyses suggest that CONCUR is well powered to detect CNV effects in the Swedish Schizophrenia Study and the Taiwan Biobank.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations/genetics , Algorithms , Area Under Curve , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics/methods , Humans , Polymorphism, Single Nucleotide/genetics , Spatial Analysis
12.
Genet Epidemiol ; 44(6): 611-619, 2020 09.
Article in English | MEDLINE | ID: mdl-32216117

ABSTRACT

Genome-wide expression quantitative trait loci (eQTLs) mapping explores the relationship between gene expression and DNA variants, such as single-nucleotide polymorphisms (SNPs), to understand the genetic basis of human diseases. Due to the large number of genes and SNPs that need to be assessed, current methods for eQTL mapping often suffer from low detection power, especially for identifying trans-eQTLs. In this paper, we propose the idea of performing SNP ranking based on the higher criticism statistic, a summary statistic developed in large-scale signal detection. We illustrate how the HC-based SNP ranking can effectively prioritize eQTL signals over noise, greatly reduce the burden of joint modeling, and improve the power for eQTL mapping. Numerical results in simulation studies demonstrate the superior performance of our method compared to existing methods. The proposed method is also evaluated in HapMap eQTL data analysis and the results are compared to a database of known eQTLs.


Subject(s)
Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Data Analysis , Gene Expression Regulation , Genome-Wide Association Study , Humans , Models, Genetic
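
For readers unfamiliar with the higher criticism (HC) statistic named in entry 12, here is a minimal sketch of its generic form computed from a set of p-values: the maximum, over the smallest p-values, of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))). This is not the paper's SNP-ranking procedure, whose details the abstract does not give; the function name, the `alpha0` search fraction, and the example p-values are assumptions for illustration.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Generic HC statistic over the smallest alpha0 fraction of sorted p-values."""
    n = len(p_values)
    sorted_p = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i, p in enumerate(sorted_p[:k_max], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardized term is undefined at the boundaries
        hc_i = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc_i)
    return best


# Hypothetical p-values: evenly spread (null-like) versus two very small ones (signal).
null_like = [(i + 0.5) / 10 for i in range(10)]
with_signal = [1e-4, 1e-3] + null_like[2:]

hc_null = higher_criticism(null_like)
hc_signal = higher_criticism(with_signal)
print(hc_signal > hc_null)
```

A large HC value indicates that the collection of p-values contains more small values than expected under the global null, which is what makes it usable as a score for ranking SNPs by the signals they carry across many genes.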
13.
Curr Med Res Opin ; 36(6): 1025-1032, 2020 06.
Article in English | MEDLINE | ID: mdl-32212939

ABSTRACT

Objectives: Childhood asthma is a common chronic illness that has been associated with mode of delivery. However, the effect of cesarean delivery alone does not fully account for the increased prevalence of childhood asthma. We tested the hypothesis that neuraxial anesthesia used for labor analgesia and cesarean delivery alters the risk of developing childhood asthma. Methods: Within the Newborn Epigenetics Study birth cohort, 196 mother and child pairs with entries in the electronic anesthesia records were included. From these records, data on maternal anesthesia type, duration of exposure, and drugs administered peripartum were abstracted and combined with questionnaire-derived prenatal risk factors and medical records and questionnaire-derived asthma diagnosis data in children. Logistic regression models were used to evaluate associations between type of anesthesia, duration of anesthesia, and the development of asthma in males and females. Results: We found that longer duration of epidural anesthesia was associated with a lower risk of asthma in male children (OR = 0.80; 95% CI = 0.66-0.95) for each hour of epidural exposure. Additionally, a unit increase in the composite dose of local anesthetics and opioid analgesics administered via the spinal route was associated with a lower risk of asthma in both male (OR = 0.59, 95% CI = 0.36-0.96) and female children (OR = 0.26, 95% CI = 0.09-0.82). Conclusion: Our data suggest that peripartum exposure to neuraxial anesthesia may reduce the risk of childhood asthma primarily in males. Larger human studies and model systems with longer follow-up are required to elucidate these findings.


Subject(s)
Anesthesia, Epidural/adverse effects , Anesthesia, Obstetrical/adverse effects , Asthma/etiology , Epigenesis, Genetic , Adult , Cesarean Section/adverse effects , Child , Child, Preschool , Female , Humans , Infant, Newborn , Logistic Models , Male , Pregnancy , Retrospective Studies
14.
Genet Epidemiol ; 44(3): 272-282, 2020 04.
Article in English | MEDLINE | ID: mdl-31943371

ABSTRACT

Testing the association between single-nucleotide polymorphism (SNP) effects and a response is often carried out through kernel machine methods based on least squares, such as the sequence kernel association test (SKAT). However, these least-squares procedures are designed for a normally distributed conditional response, which may not apply. Other robust procedures such as the quantile regression kernel machine (QRKM) restrict the choice of the loss function and only allow inference on conditional quantiles. We propose a general and robust kernel association test that allows a flexible choice of the loss function, makes no distributional assumptions, and includes SKAT and QRKM as special cases. We evaluate our proposed robust association test (RobKAT) across various data distributions through a simulation study. When errors are normally distributed, RobKAT controls type I error and shows comparable power with SKAT. In all other distributional settings investigated, our robust test has similar or greater power than SKAT. Finally, we apply our robust testing method to data from the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) clinical trial to detect associations between selected genes including the major histocompatibility complex (MHC) region on chromosome six and neurotropic herpesvirus antibody levels in schizophrenia patients. RobKAT detected significant association with four SNP sets (HST1H2BJ, MHC, POM12L2, and SLC17A1), three of which were undetected by SKAT.


Subject(s)
Algorithms , Genetic Association Studies , Computer Simulation , Humans , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Selection, Genetic
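
The least-squares kernel machine tests mentioned in entry 14 are built on a quadratic form in the null-model residuals. The sketch below computes that form, Q = r' K r, for the simplest case: an intercept-only null model and an unweighted linear kernel K = G G'. It illustrates the SKAT-type baseline only, up to scaling conventions, and is not RobKAT itself; the p-value step is omitted, and the function name and toy data are hypothetical.

```python
def linear_kernel_score_statistic(y, genotypes):
    """Q = r' (G G') r with r = y - mean(y); equals the sum over SNPs of (r . g_j)^2."""
    n = len(y)
    mean_y = sum(y) / n
    residuals = [yi - mean_y for yi in y]  # residuals under an intercept-only null
    n_snps = len(genotypes[0])
    q = 0.0
    for j in range(n_snps):
        # Score contribution of SNP j: inner product of residuals and genotype column j
        s_j = sum(residuals[i] * genotypes[i][j] for i in range(n))
        q += s_j ** 2
    return q


# Hypothetical toy data: 4 subjects, 2 SNPs coded as minor-allele counts.
y = [1.0, 2.0, 3.0, 4.0]
G = [[0, 1], [1, 0], [1, 2], [2, 1]]

q = linear_kernel_score_statistic(y, G)
print(q)
```

Replacing the squared-error residuals with scores from another loss function is the kind of generalization the abstract describes, though the exact construction would have to come from the paper.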
15.
World Allergy Organ J ; 12(11): 100074, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31709028

ABSTRACT

BACKGROUND AND OBJECTIVE: This study aimed to establish reference equations for spirometry in healthy Taiwanese children and assess the applicability of the Global Lung Function Initiative (GLI)-2012 equations to Taiwanese children. METHODS: Spirometric data were collected from 757 healthy Taiwanese children aged 5 to 18 years in a population-based cohort study. Prediction equations were derived using linear regression and the generalized additive models for location, scale and shape (GAMLSS) method, respectively. RESULTS: The GLI-2012 South East Asian equations did not provide a close fit with mean ± standard error z-scores of -0.679 ± 0.030 (FVC), -0.186 ± 0.044 (FEV1), -0.875 ± 0.049 (FEV1/FVC ratio) and -2.189 ± 0.063 (FEF25-75) for girls; and 0.238 ± 0.059, -0.061 ± 0.053, -0.513 ± 0.059 and -1.896 ± 0.077 for boys. The proposed GAMLSS models took age, height, and weight into account. GAMLSS models for boys and girls captured the characteristics of spirometric data in the study population closely in contrast to the linear regression models and the GLI-2012 equations. CONCLUSION: This study provides up-to-date reference values for spirometry using GAMLSS modeling in healthy Taiwanese children aged 5 to 18 years. Our study provides evidence that the GLI-2012 reference equations are not properly matched to spirometric data in a contemporary Taiwanese child population, indicating the urgent need for an update of GLI reference values by inclusion of more data of non-Caucasian descent.

16.
Environ Epigenet ; 5(3): dvz014, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31528362

ABSTRACT

Cadmium (Cd) is a ubiquitous environmental pollutant associated with a wide range of health outcomes including cancer. However, obscure exposure sources often hinder prevention efforts. Further, although epigenetic mechanisms are suspected to link these associations, gene sequence regions targeted by Cd are unclear. Aberrant methylation of a differentially methylated region (DMR) on the MEG3 gene that regulates the expression of a cluster of genes including MEG3, DLK1, MEG8, MEG9 and DIO3 has been associated with multiple cancers. In 287 infant-mother pairs, we used a combination of linear regression and the Getis-Ord Gi* statistic to determine if maternal blood Cd concentrations were associated with offspring CpG methylation of the sequence region regulating a cluster of imprinted genes including MEG3. Correlations were used to examine potential sources and routes. We observed a significant geographic co-clustering of elevated prenatal Cd levels and MEG3 DMR hypermethylation in cord blood (P = 0.01), and these findings were substantiated in our statistical models (β = 1.70, se = 0.80, P = 0.03). These associations were strongest in those born to African American women (β = 3.52, se = 1.32, P = 0.01) compared with those born to White women (β = 1.24, se = 2.11, P = 0.56) or Hispanic women (β = 1.18, se = 1.24, P = 0.34). Consistent with Cd bioaccumulation during the life course, blood Cd levels increased with age (β = 0.015 µg/dl/year, P = 0.003), and Cd concentrations were significantly correlated between blood and urine (ρ > 0.47, P < 0.01), but not hand wipe, soil or house dust concentrations (P > 0.05). Together, these data support that prenatal Cd exposure is associated with aberrant methylation of the imprint regulatory element for the MEG3 gene cluster at birth. However, neither house-dust nor water are likely exposure sources, and ingestion via contaminated hands is also unlikely to be a significant exposure route in this population. Larger studies are required to identify routes and sources of exposure.

17.
PLoS Comput Biol ; 15(2): e1006722, 2019 02.
Article in English | MEDLINE | ID: mdl-30779729

ABSTRACT

Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.


Subject(s)
Computational Biology/methods , Genetic Association Studies/methods , Sequence Analysis, DNA/methods , Angiopoietin-Like Protein 4/genetics , Cholesterol Ester Transfer Proteins/genetics , Computer Simulation , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Humans , Models, Genetic , Proprotein Convertase 9/genetics , Protein Structure, Tertiary , Risk Factors
18.
J Am Stat Assoc ; 114(528): 1787-1799, 2019.
Article in English | MEDLINE | ID: mdl-31929665

ABSTRACT

This paper addresses the challenge of efficiently capturing a high proportion of true signals for subsequent data analyses when sample sizes are relatively limited with respect to data dimension. We propose the signal missing rate as a new measure for false negative control to account for the variability of false negative proportion. Novel data-adaptive procedures are developed to control signal missing rate without incurring many unnecessary false positives under dependence. We justify the efficiency and adaptivity of the proposed methods via theory and simulation. The proposed methods are applied to GWAS on human height to effectively remove irrelevant SNPs while retaining a high proportion of relevant SNPs for subsequent polygenic analysis.

20.
Stat Biosci ; 10(1): 117-138, 2018 Apr.
Article in English | MEDLINE | ID: mdl-30420901

ABSTRACT

Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and they may share common marker effects. In this paper, we propose a procedure to test for the effect of a SNP set on multiple, possibly correlated, binary responses. We develop a score-based test using a nonparametric modeling framework that jointly models the global effect of the marker set. We account for the nonlinear effects and potentially complicated interaction between markers using reproducing kernels. Our testing procedure only requires estimation under the null hypothesis, and we use multivariate generalized estimating equations (GEEs) to estimate the model components to account for the correlation among the outcomes. We evaluate the finite-sample performance of our test via a simulation study and demonstrate our methods using the CATIE antibody study data and the CoLaus Study data.
