Results 1 - 15 of 15
1.
Genetics ; 225(1)2023 08 31.
Article in English | MEDLINE | ID: mdl-37369448

ABSTRACT

When quantitative longitudinal traits are risk factors for disease progression and subject to random biological variation, joint model analysis of time-to-event and longitudinal traits can effectively identify direct and/or indirect genetic association of single nucleotide polymorphisms (SNPs) with time-to-event. We present a joint model that integrates: (1) a multivariate linear mixed model describing trajectories of multiple longitudinal traits as a function of time, SNP effects, and subject-specific random effects and (2) a frailty Cox survival model that depends on SNPs, longitudinal trajectory effects, and subject-specific frailty accounting for dependence among multiple time-to-event traits. Motivated by the complex genetic architecture of type 1 diabetes complications (T1DC) observed in the Diabetes Control and Complications Trial (DCCT), we implement a 2-stage approach to inference with bootstrap joint covariance estimation and develop a hypothesis testing procedure to classify direct and/or indirect SNP association with each time-to-event trait. In a realistic simulation study, we show that joint modeling of 2 time-to-T1DC traits (retinopathy and nephropathy) and 2 longitudinal risk factors (HbA1c and systolic blood pressure) reduces estimation bias in genetic effects and improves classification accuracy of direct and/or indirect SNP associations, compared to methods that ignore within-subject risk factor variability and dependence among longitudinal and time-to-event traits. Through DCCT data analysis, we demonstrate the feasibility of candidate SNP modeling and quantify the effects of sample size and Winner's curse bias on classification for 2 SNPs identified as having indirect associations with time-to-T1DC traits. Joint analysis of multiple longitudinal and multiple time-to-event traits provides insight into complex trait architecture.
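The abstract does not spell out the classification test. The following is a minimal sketch built on several assumptions. The indirect effect of a SNP is taken to be the product of its effect on the longitudinal trait (β) and the trajectory's log-hazard effect (α). The direct effect is the SNP's own log-hazard coefficient (γ). Each effect is judged by a two-sided Wald test at a fixed level. The variance of αβ uses the first-order delta method, Var(αβ) ≈ β²Var(α) + α²Var(β) + 2αβCov(α, β), with the covariance taken, for example, from a bootstrap over both stages. The function `classify_snp` and its arguments are hypothetical.

```python
from statistics import NormalDist

def classify_snp(gamma, se_gamma, alpha, beta, var_alpha, var_beta,
                 cov_ab=0.0, level=0.05):
    """Hypothetical direct/indirect classification of one SNP.

    gamma: SNP log-hazard effect in the survival submodel (direct path)
    beta:  SNP effect on the longitudinal trait (SNP -> trait)
    alpha: log-hazard effect of the trait trajectory (trait -> event)
    The indirect effect is assumed to be alpha * beta, with a delta-method
    variance built from the joint covariance of (alpha, beta).
    """
    z_crit = NormalDist().inv_cdf(1 - level / 2)
    indirect = alpha * beta
    var_ind = beta**2 * var_alpha + alpha**2 * var_beta + 2 * alpha * beta * cov_ab
    z_direct = gamma / se_gamma
    z_indirect = indirect / var_ind**0.5
    direct_sig = abs(z_direct) > z_crit
    indirect_sig = abs(z_indirect) > z_crit
    if direct_sig and indirect_sig:
        label = "direct and indirect"
    elif direct_sig:
        label = "direct only"
    elif indirect_sig:
        label = "indirect only"
    else:
        label = "no association"
    return label, indirect, z_direct, z_indirect
```

With illustrative inputs γ = 0.05 (SE 0.1), α = 0.5, β = 0.4, Var(α) = 0.01 and Var(β) = 0.0025, the indirect z-statistic is about 4.24 and the direct one is 0.5, so the SNP is labelled "indirect only".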


Subject(s)
Frailty , Humans , Genome-Wide Association Study/methods , Phenotype , Risk Factors , Disease Progression , Polymorphism, Single Nucleotide
2.
Stat Med ; 40(30): 6792-6817, 2021 12 30.
Article in English | MEDLINE | ID: mdl-34596256

ABSTRACT

Post-GWAS analysis often focuses on fine-mapping target regions discovered at the GWAS stage; that is, the aim is to pinpoint potential causal variants and susceptibility genes for complex traits and disease outcomes using next-generation sequencing (NGS) technologies. Large-scale GWAS cohorts are necessary to identify target regions given the typically modest genetic effect sizes. In this context, two-phase sampling design and analysis is a cost-reduction technique that utilizes data collected during phase 1 GWAS to select an informative subsample for phase 2 sequencing. The main goal is to make inference for genetic variants measured via NGS by efficiently combining data from phases 1 and 2. We propose two approaches for selecting a phase 2 design under a budget constraint. The first method identifies sampling fractions that select a phase 2 design yielding an asymptotic variance-covariance matrix with certain optimal characteristics, for example, smallest trace, via Lagrange multipliers (LM). The second relies on a genetic algorithm (GA) with a defined fitness function to identify an exact phase 2 subsample. We perform comprehensive simulation studies to evaluate the empirical properties of the proposed designs for a genetic association study of a quantitative trait. We compare our methods against two ranked designs: residual-dependent sampling and a recently identified optimal design. Our findings demonstrate that the proposed designs, GA in particular, can achieve competitive power in combined phase 1 and 2 analysis compared with alternative designs while preserving type 1 error control. These results are especially evident under the more practical scenario where design values need to be defined a priori and are subject to misspecification. We illustrate the proposed methods in a study of triglyceride levels in the North Finland Birth Cohort of 1966. R code to reproduce our results is available at github.com/egosv/TwoPhase_postGWAS.
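The Lagrange-multiplier step can be illustrated with a simpler criterion than the paper's. The sketch below assumes the goal is to minimize the variance of a stratified mean, Σ_h N_h² S_h² / n_h, subject to a phase 2 budget Σ_h c_h n_h = B. Setting the derivative of the Lagrangian to zero gives n_h = B · (N_h S_h / √c_h) / Σ_k N_k S_k √c_k, which is the cost-weighted Neyman allocation. This is a textbook stand-in, not the trace-of-covariance criterion used by the authors. The design values S_h are assumptions, and as the abstract notes, such values may be misspecified. Both function names are hypothetical.

```python
def lagrange_allocation(N, S, cost, budget):
    """Cost-constrained Neyman allocation derived with a Lagrange multiplier.

    N: phase 1 stratum sizes, S: assumed within-stratum SDs (design values),
    cost: per-subject phase 2 cost, budget: total phase 2 budget B.
    Returns real-valued n_h; sampling fractions are n_h / N_h. A fraction
    above 1 would mean "take the whole stratum" and would need re-allocation,
    which this sketch does not do.
    """
    denom = sum(n * s * c**0.5 for n, s, c in zip(N, S, cost))
    return [budget * n * s / c**0.5 / denom for n, s, c in zip(N, S, cost)]

def strat_var(N, S, n):
    """Objective being minimised: sum_h N_h^2 S_h^2 / n_h (no finite-population correction)."""
    return sum(Nh**2 * Sh**2 / nh for Nh, Sh, nh in zip(N, S, n))
```

With N = [600, 300, 100], S = [1.0, 1.5, 3.0], unit costs and B = 200, the allocation is about [88.9, 66.7, 44.4]. This over-samples the small, variable third stratum relative to proportional allocation [120, 60, 20], and it lowers the objective from 10875 to about 9112.5.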


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genetic Association Studies , Genotype , Humans , Phenotype
3.
Genet Epidemiol ; 45(7): 694-709, 2021 10.
Article in English | MEDLINE | ID: mdl-34224641

ABSTRACT

The X-chromosome is often excluded from genome-wide association studies because of analytical challenges. Some of the problems, such as uncertainty about whether X-inactivation is random, skewed, or absent, have been investigated. Other considerations have received little to no attention, such as the value in considering nonadditive and gene-sex interaction effects, and the inferential consequence of choosing different baseline alleles (i.e., the reference vs. the alternative allele). Here we propose a unified and flexible regression-based association test for X-chromosomal variants. We provide theoretical justifications for its robustness in the presence of various model uncertainties, as well as for its improved power when compared with the existing approaches under certain scenarios. For completeness, we also revisit the autosomes and show that the proposed framework leads to a more robust approach than the standard method. Finally, we provide supporting evidence by revisiting several published association studies. Supporting Information for this article is available online.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Alleles , Chromosomes , Humans , X Chromosome Inactivation
4.
Biostatistics ; 21(2): 319-335, 2020 04 01.
Article in English | MEDLINE | ID: mdl-30247537

ABSTRACT

The X-chromosome is often excluded from so-called "whole-genome" association studies because of the differences it exhibits between males and females. One particular analytical challenge is the unknown status of X-inactivation, where one of the two X-chromosome variants in females may be randomly selected to be silenced. In the absence of biological evidence in favor of one specific model, we consider a Bayesian model averaging framework that offers a principled way to account for the inherent model uncertainty, providing model averaging-based posterior density intervals and Bayes factors. We examine the inferential properties of the proposed methods via extensive simulation studies, and we apply the methods to a genetic association study of an intestinal disease occurring in about 20% of cystic fibrosis patients. Compared with the results previously reported assuming the presence of inactivation, we show that the proposed Bayesian methods provide more feature-rich quantities that are useful in practice.
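A minimal sketch of model averaging over X-inactivation codings, under stated assumptions. Females are coded 0/1/2. Males are coded 0/2 under X-inactivation ("XCI") or 0/1 without it ("noXCI"). Each model is a linear regression on coded genotype and sex. Posterior model probabilities are approximated from BIC with equal prior weights. This BIC approximation is a swapped-in shortcut, not the paper's fully Bayesian computation, and the skewed-XCI model is omitted. The helpers `ols_rss`, `code_genotype` and `bma_weights` are hypothetical.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares from regressing y on the rows of X (normal equations)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))

def code_genotype(g, male, model):
    """Females carry 0-2 copies, males 0-1; under XCI a hemizygous male is
    coded like a homozygous female (0/2), otherwise as 0/1."""
    if male and model == "XCI":
        return 2 * g
    return g

def bma_weights(genos, males, y, models=("XCI", "noXCI")):
    """BIC-approximated posterior probabilities of each coding, equal priors."""
    n = len(y)
    bic = {}
    for m in models:
        X = [[1.0, code_genotype(g, s, m), float(s)] for g, s in zip(genos, males)]
        bic[m] = n * math.log(ols_rss(X, y) / n) + 3 * math.log(n)
    best = min(bic.values())
    w = {m: math.exp(-(b - best) / 2) for m, b in bic.items()}
    total = sum(w.values())
    return {m: v / total for m, v in w.items()}
```

When data are simulated under the XCI coding with a moderate effect, most of the weight should go to "XCI". A model-averaged effect estimate would then weight each model's estimate by these probabilities.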


Subject(s)
Genetic Association Studies , Models, Genetic , Models, Statistical , X Chromosome Inactivation , Bayes Theorem , Computer Simulation , Cystic Fibrosis/complications , Cystic Fibrosis/genetics , Female , Humans , Intestinal Diseases/etiology , Intestinal Diseases/genetics
5.
Genet Epidemiol ; 42(1): 104-116, 2018 02.
Article in English | MEDLINE | ID: mdl-29239496

ABSTRACT

We evaluate two-phase designs to follow up findings from a genome-wide association study (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation-maximization-based inference under a semiparametric maximum likelihood formulation tailored for post-GWAS inference. A GWAS-SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT-SNP-dependent sampling and analysis under alternative sample allocations by simulation. Joint allocation balanced on SNP genotype and extreme-QT strata yields significant power improvements compared to marginal QT- or SNP-based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure.
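The joint allocation idea can be sketched as follows. Strata are formed by crossing GWAS-SNP genotype with low and high QT tails. Phase 2 slots are then spread evenly over those strata, and each stratum receives an equal share until it runs out. The 20% tail cutoff, the round-robin scheme and the function names are illustrative assumptions. The EM-based semiparametric analysis of the resulting sample is not shown.

```python
import random
from collections import defaultdict

def joint_strata(snp, qt, tail=0.2):
    """Label subjects by (GWAS-SNP genotype, QT category); the lowest and highest
    `tail` fractions of the trait form the 'low' and 'high' extreme strata."""
    s = sorted(qt)
    k = int(tail * len(s))
    lo, hi = s[k - 1], s[len(s) - k]
    cats = ["low" if q <= lo else "high" if q >= hi else "mid" for q in qt]
    return list(zip(snp, cats))

def balanced_allocation(labels, n2, rng):
    """Fill n2 phase 2 slots round-robin across the (genotype, extreme-QT) strata;
    once a small stratum is exhausted, remaining slots go to the others."""
    pools = defaultdict(list)
    for i, (g, cat) in enumerate(labels):
        if cat != "mid":
            pools[(g, cat)].append(i)
    for ids in pools.values():
        rng.shuffle(ids)
    keys, chosen = sorted(pools), []
    while len(chosen) < n2 and any(pools[key] for key in keys):
        for key in keys:
            if pools[key] and len(chosen) < n2:
                chosen.append(pools[key].pop())
    return chosen
```

A marginal QT-based allocation would ignore the genotype part of the label. A marginal SNP-based allocation would ignore the QT tails.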


Subject(s)
Genome-Wide Association Study , Genotype , Likelihood Functions , Quantitative Trait, Heritable , Sequence Analysis, DNA , Algorithms , Blood Pressure/genetics , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics
6.
J Comput Graph Stat ; 25(2): 405-425, 2016.
Article in English | MEDLINE | ID: mdl-27752219

ABSTRACT

Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes. The models studied here can incorporate both continuous and binary responses, and can account for serial and cluster correlations. We consider Bayesian estimation for the model parameters, and we develop a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample from the posterior distribution. We evaluate the proposed method via extensive simulations and demonstrate its utility with an application to an association study of various complication outcomes related to type 1 diabetes. This article has supplementary material online.
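Binary responses usually enter latent-variable models through data augmentation. A minimal sketch of that building block, assuming an intercept-only probit model with a flat prior, is Albert and Chib's Gibbs sampler. It alternates between drawing latent liabilities from truncated normals and drawing the mean from its normal full conditional. This is not the authors' sampler. Hierarchical centering and parameter expansion, which address the slow mixing such samplers can show, are not implemented. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()

def truncated_normal(mu, positive, rng):
    """Draw z ~ N(mu, 1) restricted to z > 0 (positive=True) or z <= 0, by inverse CDF."""
    p0 = STD.cdf(-mu)                       # P(z <= 0)
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)       # keep inv_cdf inside (0, 1)
    return mu + STD.inv_cdf(u)

def probit_gibbs(y, n_iter=1000, burn=200, seed=0):
    """Gibbs sampler for an intercept-only probit model with a flat prior on mu."""
    rng = random.Random(seed)
    n, mu, draws = len(y), 0.0, []
    for it in range(n_iter):
        # Latent liabilities given mu and the observed binary outcomes
        z = [truncated_normal(mu, yi == 1, rng) for yi in y]
        # mu given z is N(mean(z), 1/n) under the flat prior
        mu = rng.gauss(sum(z) / n, (1 / n) ** 0.5)
        if it >= burn:
            draws.append(mu)
    return draws
```

In a latent-variable model for pleiotropy, the same truncated-normal step would supply the continuous liability for each binary outcome. Continuous outcomes enter directly.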

7.
BMC Proc ; 8(Suppl 1 Genetic Analysis Workshop 18 Vanessa Olmo): S77, 2014.
Article in English | MEDLINE | ID: mdl-25519405

ABSTRACT

Pleiotropy, which occurs when a single genetic factor influences multiple phenotypes, is present in many genetic studies of complex human traits. Longitudinal family data, such as the Genetic Analysis Workshop 18 data, combine the features of longitudinal studies in individuals and cross-sectional studies in families, thus providing richer information about the genetic and environmental factors associated with the trait of interest. We recently proposed a Bayesian latent variable methodology for the study of pleiotropy, in the presence of longitudinal and family correlation. The purpose of this work is to evaluate the Bayesian latent variable method in a real data setting using the Genetic Analysis Workshop 18 blood pressure phenotypes and sequenced genotype data. To detect single-nucleotide polymorphisms with pleiotropic effect on both diastolic and systolic blood pressure, we focused on a set of 6 single-nucleotide polymorphisms from chromosome 3 that was reported in the literature to be significantly associated with either diastolic blood pressure or the binary hypertension trait. Our analysis suggests that both diastolic blood pressure and systolic blood pressure are associated with the latent hypertension severity variable, but the analysis did not find any of the 6 single-nucleotide polymorphisms to have statistically significant pleiotropic effect on both diastolic blood pressure and systolic blood pressure.

8.
Genet Epidemiol ; 38(7): 599-609, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25132153

ABSTRACT

In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies.
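A minimal sketch of genetic model selection via posterior model probabilities. It assumes equal prior probabilities for additive, dominant and recessive codings of a single variant, and it approximates each model's marginal likelihood with BIC. This is a swapped-in shortcut, not the paper's fully Bayesian computation within the stratified two-stage design. The names `CODINGS`, `simple_rss` and `model_posteriors` are hypothetical.

```python
import math
import random

# Genotype g = number of minor alleles (0, 1, 2)
CODINGS = {
    "additive": lambda g: g,
    "dominant": lambda g: 1 if g >= 1 else 0,
    "recessive": lambda g: 1 if g == 2 else 0,
}

def simple_rss(x, y):
    """Residual sum of squares of the least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return sum((yi - my - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))

def model_posteriors(genos, y, prior=None):
    """BIC-approximated posterior probability of each genetic model."""
    n = len(y)
    prior = prior or {m: 1 / len(CODINGS) for m in CODINGS}
    log_post = {}
    for m, code in CODINGS.items():
        rss = simple_rss([code(g) for g in genos], y)
        bic = n * math.log(rss / n) + 2 * math.log(n)
        log_post[m] = math.log(prior[m]) - bic / 2
    top = max(log_post.values())
    w = {m: math.exp(v - top) for m, v in log_post.items()}
    s = sum(w.values())
    return {m: v / s for m, v in w.items()}
```

Averaging over these probabilities, rather than fixing the additive coding, is what protects against genetic model misspecification.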


Subject(s)
Genome-Wide Association Study , Bayes Theorem , Chromosome Mapping , Computer Simulation , Genome, Human , Genotype , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Probability
9.
Genet Epidemiol ; 36(4): 320-32, 2012 May.
Article in English | MEDLINE | ID: mdl-22460746

ABSTRACT

By systematic examination of common tag single-nucleotide polymorphisms (SNPs) across the genome, the genome-wide association study (GWAS) has proven to be a successful approach to identify genetic variants that are associated with complex diseases and traits. Although the per base pair cost of sequencing has dropped dramatically with the advent of the next-generation technologies, it may still only be feasible to obtain DNA sequence data for a portion of available study subjects due to financial constraints. Two-phase sampling designs have been used frequently in large-scale surveys and epidemiological studies where certain variables are too costly to be measured on all subjects. We consider two-phase stratified sampling designs for genetic association, in which tag SNPs for candidate genes or regions are genotyped on all subjects in phase 1, and a proportion of subjects are selected into phase 2 based on genotypes at one or more tag SNPs. Deep sequencing in the region is then applied to genotype phase 2 subjects at sequence SNPs. We investigate alternative sampling designs for selection of phase 2 subjects within strata defined by tag SNP genotypes and develop methods of inference for sequence SNP variant associations using data from both phases. In comparison to methods that use data from phase 2 alone, the combined analysis improves efficiency.
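The phase 2 selection step can be sketched as stratified sampling within tag-SNP genotype strata. Each selected subject carries the design weight N_h / n_h. A simple Horvitz-Thompson-type estimate of a sequence-variant allele frequency in the full cohort then uses phase 2 genotypes only. This sketch makes several assumptions: equal phase 2 size per stratum, and hypothetical function names. The paper instead develops inference that combines both phases, which is what yields the efficiency gain.

```python
import random
from collections import defaultdict

def two_phase_sample(tag_genotypes, n_per_stratum, rng):
    """Select phase 2 subjects within strata defined by tag-SNP genotype.

    Returns the selected indices and each selected subject's design weight
    N_h / n_h (inverse of its stratum sampling fraction).
    """
    strata = defaultdict(list)
    for i, g in enumerate(tag_genotypes):
        strata[g].append(i)
    chosen, weights = [], {}
    for ids in strata.values():
        k = min(n_per_stratum, len(ids))
        for i in rng.sample(ids, k):
            chosen.append(i)
            weights[i] = len(ids) / k
    return chosen, weights

def weighted_allele_freq(seq_genotypes, chosen, weights):
    """Horvitz-Thompson-type estimate of the sequence-variant allele frequency
    in the phase 1 cohort, using only the phase 2 (sequenced) subjects."""
    num = sum(weights[i] * seq_genotypes[i] for i in chosen)
    den = sum(weights[i] for i in chosen)
    return num / (2 * den)
```

Sampling equal numbers per genotype stratum over-represents carriers of the rarer tag-SNP allele. The weights undo that over-representation when estimating cohort-level quantities.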


Subject(s)
Sequence Analysis, DNA/methods , Algorithms , Chromosome Mapping/methods , Computational Biology/methods , Computer Simulation , Genome, Human , Genome-Wide Association Study , Genotype , Humans , Models, Statistical , Molecular Epidemiology , Polymorphism, Single Nucleotide , Probability , Regression Analysis
10.
Biometrics ; 67(2): 445-53, 2011 Jun.
Article in English | MEDLINE | ID: mdl-20731648

ABSTRACT

The study of dependence between random variables is a mainstay in statistics. In many cases, the strength of dependence between two or more random variables varies according to the values of a measured covariate. We propose inference for this type of variation using a conditional copula model where the copula function belongs to a parametric copula family and the copula parameter varies with the covariate. In order to estimate the functional relationship between the copula parameter and the covariate, we propose a nonparametric approach based on local likelihood. Of importance is also the choice of the copula family that best represents a given set of data. The proposed framework naturally leads to a novel copula selection method based on cross-validated prediction errors. We derive the asymptotic bias and variance of the resulting local polynomial estimator, and outline how to construct pointwise confidence intervals. The finite-sample performance of our method is investigated using simulation studies and is illustrated using a subset of the Matched Multiple Birth data.
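A minimal sketch of covariate-dependent copula estimation by local likelihood. It rests on four assumptions. The copula is Clayton, with density c(u, v; θ) = (1 + θ)(uv)^(-1-θ)(u^(-θ) + v^(-θ) - 1)^(-2-1/θ). The pairs (u, v) are already on the uniform scale. The fit is local-constant (degree 0) with an Epanechnikov kernel. θ is maximised over a grid. The paper derives a local polynomial estimator and proposes copula selection by cross-validated prediction error, and neither is reproduced here. All function names are hypothetical.

```python
import math
import random

def clayton_logpdf(u, v, theta):
    """Log-density of the Clayton copula, theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (1 + theta) * (math.log(u) + math.log(v))
            - (2 + 1 / theta) * math.log(s))

def clayton_sample(theta, rng):
    """Draw (u, v) from the Clayton copula by conditional inversion."""
    u, w = 1.0 - rng.random(), 1.0 - rng.random()     # values in (0, 1]
    v = (u ** -theta * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    return u, v

def local_theta(x0, xs, us, vs, h, grid):
    """Local-constant likelihood estimate of theta at covariate value x0:
    maximise sum_i K_h(x_i - x0) * log c(u_i, v_i; theta) over the grid."""
    pts = []
    for x, u, v in zip(xs, us, vs):
        t = (x - x0) / h
        if abs(t) < 1:
            pts.append((0.75 * (1 - t * t), u, v))    # Epanechnikov weight
    def objective(theta):
        return sum(k * clayton_logpdf(u, v, theta) for k, u, v in pts)
    return max(grid, key=objective)
```

Evaluating `local_theta` along a range of covariate values traces out an estimate of the calibration function θ(x). The bandwidth h controls the usual bias-variance trade-off.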


Subject(s)
Analysis of Variance , Statistics as Topic , Calibration , Computer Simulation , Multiple Birth Offspring , Sample Size
12.
Biom J ; 50(1): 97-109, 2008 Feb.
Article in English | MEDLINE | ID: mdl-17849385

ABSTRACT

This paper considers inference methods for case-control logistic regression in longitudinal setups. The motivation is provided by an analysis of plains bison spatial location as a function of habitat heterogeneity. The sampling is done according to a longitudinal matched case-control design in which, at certain time points, exactly one case, the actual location of an animal, is matched to a number of controls, the alternative locations that could have been reached. We develop inference methods for the conditional logistic regression model in this setup, which can be formulated within a generalized estimating equation (GEE) framework. This permits the use of statistical techniques developed for GEE-based inference, such as robust variance estimators and model selection criteria adapted for non-independent data. The performance of the methods is investigated in a simulation study and illustrated with the bison data analysis.
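A minimal sketch of the conditional logistic likelihood for 1:k matched sets, under three assumptions. There is a single scalar covariate. The case (the observed location) is listed first in each set. Matched sets are grouped into clusters, for example animals, for a sandwich-type robust variance. Each set contributes exp(βx_case) / Σ_j exp(βx_j). The fit uses Newton-Raphson. This is an illustration, not the authors' GEE implementation, and the function names are hypothetical.

```python
import math
import random

def _set_terms(xs, beta):
    """Score and information contribution of one matched set (case first)."""
    w = [math.exp(beta * x) for x in xs]
    tot = sum(w)
    m1 = sum(wi * x for wi, x in zip(w, xs)) / tot        # weighted mean of x
    m2 = sum(wi * x * x for wi, x in zip(w, xs)) / tot
    return xs[0] - m1, m2 - m1 * m1

def clogit_fit(sets, clusters, tol=1e-10, max_iter=50):
    """Newton-Raphson for the 1:k conditional logistic likelihood.
    Returns beta, the model-based SE and a cluster-robust (sandwich) SE."""
    beta = 0.0
    for _ in range(max_iter):
        terms = [_set_terms(xs, beta) for xs in sets]
        score = sum(u for u, _ in terms)
        info = sum(i for _, i in terms)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    terms = [_set_terms(xs, beta) for xs in sets]
    info = sum(i for _, i in terms)
    by_cluster = {}
    for (u, _), c in zip(terms, clusters):
        by_cluster[c] = by_cluster.get(c, 0.0) + u        # summed score per cluster
    meat = sum(v * v for v in by_cluster.values())
    return beta, 1 / math.sqrt(info), math.sqrt(meat) / info
```

When successive matched sets from the same animal are correlated, the robust SE can differ noticeably from the model-based one. Under independence the two should be close.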


Subject(s)
Data Interpretation, Statistical , Logistic Models , Longitudinal Studies , Animal Migration , Animals , Case-Control Studies , Computer Simulation , Ecosystem , Female , Saskatchewan
13.
IEEE Trans Image Process ; 15(7): 1718-27, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16830896

ABSTRACT

Cellular automata are discrete dynamical systems which evolve on a discrete grid. Recent studies have shown that cellular automata with relatively simple rules can produce highly complex patterns. We develop likelihood-based methods for estimating rules of cellular automata aimed at the re-generation of observed regular patterns. Under noisy data, our approach is equivalent to estimating the local map of a stochastic cellular automaton. Direct computations of the maximum likelihood estimates are possible for regular binary patterns. The likelihood formulation of the problem is congenial with the use of the minimum description length principle as a model selection tool. We illustrate our method with a series of examples using binary images.
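A minimal sketch of the direct maximum likelihood computation for binary patterns. It assumes a one-dimensional elementary cellular automaton (radius 1, periodic boundary) rather than the two-dimensional images used in the paper. For a stochastic automaton, the MLE of P(next cell = 1) for each of the 8 neighbourhoods is the observed fraction of 1s. Thresholding at 1/2 recovers a deterministic rule number. Model selection by minimum description length is not shown, and the function names are hypothetical.

```python
import random

def step(row, rule):
    """One update of an elementary CA (Wolfram rule number) with periodic boundary."""
    n = len(row)
    return [(rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % n])) & 1
            for i in range(n)]

def estimate_rule(rows):
    """MLE of the local map from consecutive rows; returns (rule, probs).

    probs[k] is the fraction of times neighbourhood k (left, centre, right read
    as a 3-bit number) was followed by a 1, or None if k never occurred.
    """
    ones, counts = [0] * 8, [0] * 8
    for prev, nxt in zip(rows, rows[1:]):
        n = len(prev)
        for i in range(n):
            k = 4 * prev[i - 1] + 2 * prev[i] + prev[(i + 1) % n]
            counts[k] += 1
            ones[k] += nxt[i]
    probs = [o / c if c else None for o, c in zip(ones, counts)]
    rule = sum(1 << k for k, p in enumerate(probs) if p is not None and p > 0.5)
    return rule, probs
```

With noise-free data every observed probability is 0 or 1. With noisy data the probabilities estimate the stochastic local map.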


Subject(s)
Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Biomimetics/methods , Cell Physiological Phenomena , Computer Simulation , Likelihood Functions , Models, Statistical , Signal Processing, Computer-Assisted
14.
Genet Epidemiol ; 30(6): 519-30, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16800000

ABSTRACT

The multiplicity problem has become increasingly important in genetic studies as the capacity for high-throughput genotyping has increased. The control of False Discovery Rate (FDR) (Benjamini and Hochberg. [1995] J. R. Stat. Soc. Ser. B 57:289-300) has been adopted to address the problems of false positive control and low power inherent in high-volume genome-wide linkage and association studies. In many genetic studies, there is often a natural stratification of the m hypotheses to be tested. Given the FDR framework and the presence of such stratification, we investigate the performance of a stratified false discovery control approach (i.e. control or estimate FDR separately for each stratum) and compare it to the aggregated method (i.e. consider all hypotheses in a single stratum). Under the fixed rejection region framework (i.e. reject all hypotheses with unadjusted p-values less than a pre-specified level and then estimate FDR), we demonstrate that the aggregated FDR is a weighted average of the stratum-specific FDRs. Under the fixed FDR framework (i.e. reject as many hypotheses as possible and meanwhile control FDR at a pre-specified level), we specify a condition necessary for the expected total number of true positives under the stratified FDR method to be equal to or greater than that obtained from the aggregated FDR method. Application to a recent Genome-Wide Association (GWA) study by Maraganore et al. ([2005] Am. J. Hum. Genet. 77:685-693) illustrates the potential advantages of control or estimation of FDR by stratum. Our analyses also show that controlling FDR at a low rate, e.g. 5% or 10%, may not be feasible for some GWA studies.
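The two FDR frameworks can be sketched with the Benjamini-Hochberg step-up procedure and a conservative fixed-rejection-region estimate. The estimate assumes π0 = 1, so it is FDR_s(t) = m_s t / R_s for stratum s. That assumption is a simplification, and the function names are hypothetical. For a fixed threshold t, the aggregated estimate m t / R equals Σ_s (R_s / R) · FDR_s(t). It is therefore a weighted average of the stratum-specific estimates, with weights equal to each stratum's share of rejections.

```python
def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up: indices of hypotheses rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank                      # largest rank meeting the bound
    return set(order[:k])

def stratified_bh(strata, alpha):
    """Apply BH separately within each stratum (dict name -> list of p-values)."""
    return {s: bh_reject(p, alpha) for s, p in strata.items()}

def fixed_region_fdr(pvals, t):
    """Conservative FDR estimate (pi0 = 1) for 'reject if p <= t'; returns (FDR, R)."""
    r = sum(p <= t for p in pvals)
    return (len(pvals) * t / r if r else 0.0), r
```

For example, with strata s1 = [0.001, 0.002, 0.003, 0.2, 0.5] and s2 = [0.004, 0.03, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95] and t = 0.01, the stratum estimates are 0.05/3 and 0.08 with 3 and 1 rejections. The aggregated estimate is (3/4)(0.05/3) + (1/4)(0.08) = 0.0325.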


Subject(s)
Diabetes Mellitus, Type 1/genetics , Genetic Predisposition to Disease/genetics , Genome, Human , Models, Genetic , Polymorphism, Single Nucleotide , Chromosome Mapping , Data Interpretation, Statistical , Diabetes Mellitus, Type 1/diagnosis , False Positive Reactions , Humans , Phenotype , Quantitative Trait, Heritable
15.
Lifetime Data Anal ; 12(1): 21-33, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16583297

ABSTRACT

The competing risks model is useful in settings in which individuals/units may die/fail for different reasons. The cause specific hazard rates are taken to be piecewise constant functions. A complication arises when some of the failures are masked within a group of possible causes. Traditionally, statistical inference is performed under the assumption that the failure causes act independently on each item. In this paper we propose an EM-based approach which allows for dependent competing risks and produces estimators for the sub-distribution functions. We also discuss identifiability of parameters if none of the masked items have their cause of failure clarified in a second stage analysis (e.g. autopsy). The procedures proposed are illustrated with two datasets.
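A minimal sketch of the EM idea for masked failure causes, under simplifying assumptions. The sketch covers a single piece of a piecewise-constant hazard with total exposure T. It assumes independent risks, and it assumes masking probabilities do not depend on the true cause. The E-step splits each masked group's failures among its causes in proportion to the current hazards. The M-step sets each hazard to its expected failure count divided by exposure. The paper's extension to dependent risks and sub-distribution functions is not reproduced, and `masked_em` is a hypothetical name.

```python
def masked_em(exposure, known, masked, tol=1e-12, max_iter=1000):
    """EM for constant cause-specific hazards with masked failure causes.

    exposure: total time at risk in this piece of the piecewise-constant model
    known:    dict cause -> failures with the cause identified (include 0s)
    masked:   dict frozenset(causes) -> failures masked within that group
    """
    causes = sorted(known)
    lam = {j: 1.0 / exposure for j in causes}            # starting values
    for _ in range(max_iter):
        # E-step: expected cause-specific counts given current hazards
        expected = dict(known)
        for group, count in masked.items():
            tot = sum(lam[j] for j in group)
            for j in group:
                expected[j] += count * lam[j] / tot
        # M-step: hazard = expected failures / exposure
        new = {j: expected[j] / exposure for j in causes}
        if max(abs(new[j] - lam[j]) for j in causes) < tol:
            return new
        lam = new
    return lam
```

With T = 100, 10 and 20 identified failures from causes 1 and 2, and 30 failures masked between them, the fixed point is λ1 = 0.2 and λ2 = 0.4. The masked failures are split in the same 1:2 ratio as the identified ones. For the full piecewise model, the same update is applied interval by interval.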


Subject(s)
Algorithms , Models, Statistical , Risk Assessment/methods , Animals , Carcinogens/toxicity , Computers , Nitrosamines/toxicity , Rodentia