Results 1 - 12 of 12
1.
Int J Biostat ; 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37743670

ABSTRACT

In genome-wide association studies (GWAS), logistic regression is one of the most popular analytic methods for binary traits. Multinomial regression is an extension of binary logistic regression that allows for multiple categories. However, many GWAS methods have been limited to binary traits. These methods have often been improperly applied to ordinal traits, which causes inappropriate type I error rates and poor statistical power. Owing to the lack of analysis methods, GWAS of ordinal traits is known to be problematic and is gaining attention. In this paper, we develop a general framework for identifying ordinal traits associated with genetic variants in pedigree-structured samples by collapsing and kernel methods. We use the local odds ratios GEE technique to account for complicated correlation structures between family members and ordered categorical traits. We use the retrospective idea to treat the genetic markers as random variables for calculating genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allows for covariate adjustment. We conduct simulation studies to compare the proposed tests with the existing models for analyzing ordered categorical data under various configurations. We illustrate the application of the proposed tests by simultaneously analyzing a family study and a cross-sectional study from the Genetic Analysis Workshop 19 (GAW19) data.

2.
Pharm Stat ; 22(1): 79-95, 2023 01.
Article in English | MEDLINE | ID: mdl-36054538

ABSTRACT

We propose a model selection criterion for correlated survival data when the cluster size is informative to the outcome. This approach, called Resampling Cluster Survival Information Criterion (RCSIC), uses the Cox proportional hazards model that is weighted with the inverse of the cluster size. The RCSIC, based on the within-cluster resampling idea, takes into account the possible variability of the within-cluster subsampling and the possible informativeness of cluster sizes. The RCSIC allows for easy execution of the within-cluster resampling idea without a large number of resamples of the data. In contrast with the traditional model selection method in survival analysis, the RCSIC has an additional penalization for the within-cluster subsampling variability. Our simulations show satisfactory results: the RCSIC provides more robust power for variable selection in clustered survival analysis, regardless of whether informative cluster size exists or not. Applying the RCSIC method to a periodontal disease study, we find that tooth loss in patients is associated with the risk factors Age, Filled Tooth, Molar, Crown, Decayed Tooth, and Smoking Status.
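
The link between inverse cluster size weighting and within-cluster resampling mentioned in this abstract can be illustrated on a much simpler estimator than a Cox model. The sketch below is not the RCSIC; it only shows, for a plain mean, that weighting each observation by the inverse of its cluster size targets the same quantity as repeatedly drawing one member per cluster. All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def inverse_cluster_size_weights(cluster_ids):
    """Return one weight per observation: 1 / (size of its cluster)."""
    sizes = defaultdict(int)
    for cid in cluster_ids:
        sizes[cid] += 1
    return [1.0 / sizes[cid] for cid in cluster_ids]


def weighted_mean(values, weights):
    """Weighted average of the values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def within_cluster_resampling_mean(values, cluster_ids, n_resamples=2000, seed=0):
    """Average, over resamples, of the mean of one random member per cluster."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for v, cid in zip(values, cluster_ids):
        members[cid].append(v)
    total = 0.0
    for _ in range(n_resamples):
        draw = [rng.choice(vals) for vals in members.values()]
        total += sum(draw) / len(draw)
    return total / n_resamples


# Hypothetical toy data: three clusters of sizes 4, 1, and 2.
cluster_ids = ["a", "a", "a", "a", "b", "c", "c"]
values = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]

weights = inverse_cluster_size_weights(cluster_ids)
icw_mean = weighted_mean(values, weights)
wcr_mean = within_cluster_resampling_mean(values, cluster_ids)
print(icw_mean, wcr_mean)
```

The weighted mean is computed in one pass, whereas the resampling version needs many draws to approach the same value, which is the practical motivation for replacing explicit resampling with weights.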


Subject(s)
Cluster Analysis , Humans , Proportional Hazards Models , Survival Analysis , Risk Factors , Computer Simulation
3.
Comput Math Methods Med ; 2021: 8812282, 2021.
Article in English | MEDLINE | ID: mdl-33628328

ABSTRACT

In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most of the existing methods focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous or binary phenotypes and can also adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to the empirical type I error rates and power in comparison with the existing methods.


Subject(s)
Genetic Variation , Models, Genetic , Algorithms , Computational Biology , Computer Simulation , Genetic Association Studies , Genetic Predisposition to Disease , Genetics, Population , Genome-Wide Association Study , Humans , Multivariate Analysis , Pedigree , Phenotype
4.
Article in English | MEDLINE | ID: mdl-33156000

ABSTRACT

Combining correlated p-values from multiple hypothesis testing is one of the most frequently used methods for integrating information in genetic and genomic data analysis. However, most existing methods for combining independent p-values from individual component problems into a single unified p-value are unsuitable for the correlational structure among p-values from multiple hypothesis testing. Although some existing p-value combination methods have been modified to overcome the potential limitations, there is no uniformly most powerful method for combining correlated p-values in genetic data analysis. Therefore, a p-value combination method that can robustly control type I errors and maintain good power is necessary. In this paper, we propose an empirical method based on the gamma distribution (EMGD) for combining dependent p-values from multiple hypothesis testing. The proposed test, EMGD, flexibly accommodates highly correlated p-values from multiple hypothesis testing and combines them into a unified p-value for examining the combined hypothesis of interest. The EMGD retains the robustness of the empirical Brown's method (EBM) for pooling dependent p-values from multiple hypothesis testing. Moreover, the EMGD keeps the character of the method based on the gamma distribution, which simultaneously retains the advantages of the z-transform test and the gamma-transform test for combining dependent p-values from multiple statistical tests. These two properties allow the EMGD to maintain robust power for combining dependent p-values from multiple hypothesis testing. The performance of the proposed EMGD method is illustrated with simulations and real data applications by comparison with existing methods, such as Kost and McDermott's method, the EBM, and the harmonic mean p-value method.
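
For orientation, two simple baselines for p-value combination can be sketched in a few lines. This is not the EMGD, the EBM, or Kost and McDermott's method: the first function is Fisher's classical method, which assumes independent p-values (the assumption the abstract says fails for correlated tests), and the second is the raw, uncalibrated harmonic mean of the p-values. Function names are hypothetical, and inputs are assumed to lie in (0, 1].

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method; valid only when the p-values are independent.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def harmonic_mean_p(pvalues):
    """Unweighted harmonic mean of the p-values (raw value, no calibration)."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)


fisher_two = fisher_combined_pvalue([0.05, 0.05])
hmp_two = harmonic_mean_p([0.01, 0.04])
print(fisher_two, hmp_two)
```

When the component tests are positively correlated, the chi-square reference distribution used by Fisher's method is too narrow, which is the kind of miscalibration that dependence-aware methods are designed to correct.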

5.
PLoS One ; 15(6): e0233847, 2020.
Article in English | MEDLINE | ID: mdl-32559184

ABSTRACT

In the area of genetic epidemiology, studies of genotype-phenotype associations have made significant contributions to the genetics of complex human traits. These studies depend on specialized statistical methods to uncover the association between traits and genetic variants, both common and rare. Often, in analyzing such studies, potentially confounding factors, such as social and environmental conditions, must be accounted for. Multiple linear regression is the most widely used type of regression analysis when the outcome of interest is a quantitative trait. Many statistical tests for identifying genotype-phenotype associations using linear regression rely on the assumption that the traits (or the residuals) of the regression follow a normal distribution. In genomic research, the rank-based inverse normal transformation (INT) is one of the most popular approaches to reach normally distributed traits (or normally distributed residuals). Many researchers believe that applying the INT to non-normal traits (or non-normal residuals) is required for valid inference, because phenotypic (or residual) outliers and non-normality have a significant influence on both type I error rate control and statistical power, especially in rare-variant association testing. Here we propose a test for exploring the association of a rare variant with a quantitative trait by using a fully adjusted full-stage INT. Using simulations, we show that the fully adjusted full-stage INT is more appropriate than the existing INT methods, such as the fully adjusted two-stage INT and the INT-based omnibus test, in testing genotype-phenotype associations with rare variants, especially when genotypes are uncorrelated with covariates. The fully adjusted full-stage INT retains the advantages of the fully adjusted two-stage INT and ameliorates its problems in the analysis of rare variants under non-normality of the trait. We also present theoretical results on these desirable properties. In addition, two available methods for non-normal traits, the quantile/median regression method and the Yeo-Johnson power transformation, are also included in the simulations for comparison with respect to these desirable properties.
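
As a point of reference, the basic rank-based INT that these methods build on can be written compactly. The sketch below applies the transformation directly to a trait vector; it does not implement the covariate adjustment steps of the fully adjusted two-stage or full-stage procedures, which the abstract does not spell out. The offset c = 3/8 (the Blom offset) is an assumed, commonly used default, and the function name is hypothetical.

```python
from statistics import NormalDist


def rank_based_int(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation with offset c.

    Each value is replaced by Phi^{-1}((rank - c) / (n - 2c + 1)),
    where tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend the block [start, end] over values tied with values[order[start]].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0  # 1-based average rank of the block
        for j in range(start, end + 1):
            ranks[order[j]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Hypothetical skewed trait with one extreme value.
trait = [10.0, 0.1, 250.0]
transformed = rank_based_int(trait)
print(transformed)
```

Because only the ranks enter the transformation, the extreme value 250.0 ends up no farther from the center than the smallest value, which is how the INT limits the influence of outliers.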


Subject(s)
Gene-Environment Interaction , Genome-Wide Association Study/methods , Models, Genetic , Humans , Normal Distribution , Polymorphism, Genetic
6.
Genet Epidemiol ; 42(7): 621-635, 2018 10.
Article in English | MEDLINE | ID: mdl-30188589

ABSTRACT

Here, we describe a retrospective mega-analysis framework for gene- or region-based multimarker rare variant association tests. Our proposed mega-analysis association tests allow investigators to combine longitudinal and cross-sectional family- and/or population-based studies. This framework can be applied to a continuous, categorical, or survival trait. In addition to autosomal variants, the tests can be applied to conduct mega-analyses on X-chromosome variants. Tests were built on study-specific region- or gene-level quasiscore statistics and, therefore, do not require estimates of effects of individual rare variants. We used the generalized estimating equation approach to account for complex multiple correlation structures between family members, repeated measurements, and genetic markers. While accounting for multilevel correlations and heterogeneity across studies, the test statistics were computationally efficient and feasible for large-scale sequencing studies. The retrospective aspect of association tests helps alleviate bias due to phenotype-related sampling and type I errors due to misspecification of phenotypic distribution. We evaluated our developed mega-analysis methods through comprehensive simulations with varying sample sizes, covariates, population stratification structures, and study designs across multiple studies. To illustrate application of the proposed framework, we conducted a mega-association analysis combining a longitudinal family study and a cross-sectional case-control study from Genetic Analysis Workshop 19.


Subject(s)
Genetic Association Studies , Genetic Variation , Algorithms , Case-Control Studies , Computer Simulation , Cross-Sectional Studies , Humans , Hypertension/genetics , Models, Genetic , Numerical Analysis, Computer-Assisted , Phenotype , Retrospective Studies , Risk Factors
7.
Genet Epidemiol ; 41(6): 511-522, 2017 09.
Article in English | MEDLINE | ID: mdl-28580640

ABSTRACT

Family-based designs enriched with affected subjects and disease-associated variants can increase statistical power for identifying functional rare variants. However, few rare variant analysis approaches are available for time-to-event traits in family designs, and none of them is applicable to the X chromosome. We developed novel pedigree-based burden and kernel association tests for time-to-event outcomes with right censoring for pedigree data, referred to as FamRATS (family-based rare variant association tests for survival traits). Cox proportional hazards models were employed to relate a time-to-event trait with rare variants with flexibility to encompass all ranges and collapsing of multiple variants. In addition, robustness to violations of the proportional hazards assumption was investigated for the proposed tests and four existing tests, including the conventional population-based Cox proportional hazards model and the burden, kernel, and sum of squares statistic (SSQ) tests for family data. The proposed tests can be applied to large-scale whole-genome sequencing data. They are appropriate for practical use under a wide range of misspecified Cox models, as well as for population-based, pedigree-based, or hybrid designs. In our extensive simulation study and data example, we showed that the proposed kernel test is the most powerful and robust choice compared with the proposed burden test and the four existing rare variant survival association tests. When applied to the Diabetes Heart Study, the proposed tests found that exome variants of the JAK1 gene on chromosome 1 showed the most significant association with age at onset of type 2 diabetes in the exome-wide analysis.


Subject(s)
Genetic Association Studies , Quantitative Trait, Heritable , Sequence Analysis, DNA , Algorithms , Computer Simulation , Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Family , Humans , Models, Genetic , Pedigree , Proportional Hazards Models , Survival Analysis
8.
Genet Epidemiol ; 40(2): 101-12, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26783077

ABSTRACT

Given the functional relevance of many rare variants, their identification is frequently critical for dissecting disease etiology. Functional variants are likely to be aggregated in family studies enriched with affected members, and this aggregation increases the statistical power to detect rare variants associated with a trait of interest. Longitudinal family studies provide additional information for identifying genetic and environmental factors associated with disease over time. However, methods to analyze rare variants in longitudinal family data remain fairly limited. These methods should be capable of accounting for different sources of correlations and handling large amounts of sequencing data efficiently. To identify rare variants associated with a phenotype in longitudinal family studies, we extended pedigree-based burden (BT) and kernel (KS) association tests to genetic longitudinal studies. Generalized estimating equation (GEE) approaches were used to generalize the pedigree-based BT and KS to multiple correlated phenotypes under the generalized linear model framework, adjusting for fixed effects of confounding factors. These tests accounted for complex correlations between repeated measures of the same phenotype (serial correlations) and between individuals in the same family (familial correlations). We conducted comprehensive simulation studies to compare the proposed tests with mixed-effects models and marginal models, using GEEs under various configurations. When the proposed tests were applied to data from the Diabetes Heart Study, we found that exome variants of the POMGNT1 and JAK1 genes were associated with type 2 diabetes.
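
The difference between burden and kernel statistics can be illustrated with a deliberately simplified sketch. It assumes unrelated individuals, a single quantitative phenotype, and no covariates, so it ignores the familial and serial correlations that the pedigree-based BT and KS handle through GEEs. The function names, weights, and toy data are hypothetical, and the code returns raw statistics only, without p-values.

```python
def variant_scores(genotypes, phenotype):
    """Per-variant score S_j = sum_i (y_i - mean(y)) * g_ij."""
    n = len(phenotype)
    y_bar = sum(phenotype) / n
    resid = [y - y_bar for y in phenotype]
    n_variants = len(genotypes[0])
    return [
        sum(resid[i] * genotypes[i][j] for i in range(n))
        for j in range(n_variants)
    ]


def burden_statistic(scores, weights):
    """Collapse first, then square: (sum_j w_j * S_j) ** 2."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def kernel_statistic(scores, weights):
    """Square first, then sum: sum_j (w_j * S_j) ** 2 (weighted linear kernel)."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Hypothetical toy data: 6 individuals, 2 rare variants with opposite effects.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
phenotype = [2.0, 2.0, -2.0, -2.0, 0.0, 0.0]
weights = [1.0, 1.0]

scores = variant_scores(genotypes, phenotype)
burden = burden_statistic(scores, weights)
kernel = kernel_statistic(scores, weights)
print(scores, burden, kernel)
```

In this toy example the two variants act in opposite directions, so their scores cancel in the collapsed burden statistic while the kernel statistic stays large. This is the usual reason for reporting both kinds of test.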


Subject(s)
Black or African American/genetics , Cardiovascular Diseases/genetics , Diabetes Mellitus, Type 2/genetics , Janus Kinase 1/genetics , N-Acetylglucosaminyltransferases/genetics , White People/genetics , Cardiovascular Diseases/epidemiology , Computer Simulation , Diabetes Mellitus, Type 2/epidemiology , Exome/genetics , Family , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Linear Models , Longitudinal Studies , Models, Genetic , Pedigree , Phenotype
9.
Biostatistics ; 16(2): 222-39, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25481194

ABSTRACT

The genetic basis of complex diseases often involves multiple causative loci. Under such a disease etiology, assuming one disease locus in linkage disequilibrium mapping is likely to induce bias and lead to efficiency loss in disease locus estimation. An approach is needed for simultaneously localizing multiple functional loci within the same region. However, due to the increasing number of parameters accompanying disease loci, these estimates can be computationally infeasible. To circumvent this problem, we propose to estimate the main and two-adjacent-locus joint effects and a nuisance parameter at the disease loci separately through a linear approximation. Estimates of the genetic effects are entered into a generalized estimating equation to estimate disease loci, and the procedure is conducted iteratively until convergence. The proposed method provides estimates and confidence intervals (CIs) for the disease loci, the genetic main effects, and the joint effects of two adjacent disease loci, with the CIs for the disease loci providing useful regions for further fine-mapping. We apply the proposed approach to a data example of case-control studies. Results of the simulations and data example suggest that the developed method performs well in terms of bias, variance, and coverage probability under scenarios with up to three disease loci.


Subject(s)
Data Interpretation, Statistical , Genetic Loci/genetics , Linkage Disequilibrium/genetics , Models, Genetic , Case-Control Studies , Humans , Peripheral Arterial Disease/genetics
10.
J Appl Stat ; 34(5): 563-575, 2007.
Article in English | MEDLINE | ID: mdl-38817915

ABSTRACT

We propose two novel diagnostic measures for the detection of influential observations for regression parameters in linear regression. Traditional diagnostic statistics focus on the effect of deletion of data points either on parameter estimates, or on predicted values. A data point is regarded as influential by the new methods if its inclusion determines a significantly different likelihood function for the parameter of interest. The concerned likelihood function is asymptotically valid for practically all underlying distributions whose second moments exist.

11.
J Virol ; 80(18): 8989-99, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16940511

ABSTRACT

Baculoviruses, a family of large, rod-shaped viruses that mainly infect lepidopteran insects, have been widely used to transduce various cells for exogenous gene expression. Nonetheless, how a virus controls its transcription program in cells is poorly understood. With a custom-made baculovirus DNA microarray, we investigated the recombinant Autographa californica multiple nucleopolyhedrosis virus (AcMNPV) gene expression program in lepidopteran Sf21 cells over the time course of infection. Our analysis of transcription kinetics in the cells uncovered sequential viral gene expression patterns possibly regulated by different mechanisms during different phases of infection. To gain further insight into the regulatory network, we investigated the transcription program of a mutant virus deficient in an early transactivator (pe38) and uncovered several pe38-dependent and pe38-independent genes. This study of baculovirus dynamic transcription programs in different virus genetic backgrounds provides new molecular insights into how gene expression in viruses is regulated.


Subject(s)
Gene Expression Regulation, Viral , Nucleopolyhedroviruses/genetics , Transcription, Genetic , Animals , Baculoviridae/genetics , Cell Line , Cluster Analysis , Insecta , Kinetics , Moths , Oligonucleotide Array Sequence Analysis , Recombinant Proteins/chemistry , Time Factors , Transcriptional Activation , Virus Replication
12.
Am J Kidney Dis ; 39(6): 1245-54, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12046038

ABSTRACT

Relative hypoparathyroidism (parathyroid hormone [PTH] ≤ 200 pg/mL) is prevalent in hemodialysis (HD) patients, with unknown pathogenesis and prognosis. Thus, to clarify risk factors and prognosis of time-dependent relative hypoparathyroidism in HD patients, a retrospective cohort study was performed for 126 HD patients with four or more PTH determinations and no previous total or subtotal parathyroidectomy. Values for intact PTH, ionized calcium, phosphate, magnesium, albumin, creatinine, urea reduction ratio (URR), glucose, hemoglobin A1c (HbA1c), aluminum, and 1,25(OH)2D were obtained at enrollment and at some time during follow-up. The prevalence of relative hypoparathyroidism at entry was 76 of 126 patients (60.3%). Univariate analysis showed that patients with hypoparathyroidism were older, more likely to have diabetes, and had greater ionized calcium levels and lower phosphate, albumin, blood urea nitrogen (BUN), and creatinine levels. Patients with diabetes were older and had a shorter duration of dialysis therapy and lower PTH, phosphate, albumin, BUN, and creatinine levels and URRs. Conversely, multivariate analysis showed that PTH levels at entry were associated directly with creatinine levels and inversely with age and ionized calcium levels (but not diabetes). During follow-up, PTH levels fluctuated concomitantly with ionized calcium and phosphate levels over time in all patients. Time-dependent PTH levels were associated directly with duration of dialysis therapy and use of vitamin D and phosphate and albumin levels, but inversely with age and ionized calcium and magnesium levels (but not glucose or HbA1c levels). Interestingly, time-dependent PTH levels were independently associated with survival after adjusting for traditional risk factors (diabetes, age, albumin and creatinine levels, and URR) and duration of dialysis therapy. We conclude that in HD patients, relative hypoparathyroidism was not associated with diabetes per se. Time-dependent PTH levels were associated with age, duration of dialysis, and levels of ionized calcium, phosphate, albumin, and magnesium. Moreover, relative hypoparathyroidism at entry and lower time-dependent PTH levels predict mortality.


Subject(s)
Hypoparathyroidism/mortality , Kidney Failure, Chronic/mortality , Kidney Failure, Chronic/therapy , Renal Dialysis/mortality , Age Factors , Calcium/blood , Cohort Studies , Diabetes Complications , Female , Humans , Hypoparathyroidism/blood , Hypoparathyroidism/etiology , Kidney Failure, Chronic/blood , Kidney Failure, Chronic/complications , Longitudinal Studies , Magnesium/blood , Male , Middle Aged , Parathyroid Hormone/blood , Phosphates/blood , Regression Analysis , Renal Dialysis/adverse effects , Retrospective Studies , Risk Factors , Serum Albumin/metabolism , Survival Analysis , Time Factors