Results 1 - 20 of 123
1.
Sci Rep ; 14(1): 1628, 2024 01 18.
Article in English | MEDLINE | ID: mdl-38238368

ABSTRACT

This study aims to develop an advanced mathematical model and investigate when and how COVID-19 in the US will evolve to endemicity. We employed a nonlinear ordinary differential equations-based model to simulate COVID-19 transmission dynamics, factoring in vaccination efforts. Multi-stability analysis was performed on daily new infection data from January 12, 2021 to December 12, 2022 across 50 states in the US. Key indices such as eigenvalues and the basic reproduction number were utilized to evaluate stability and investigate how the COVID-19 pandemic will evolve to endemicity in the US. The transmission, recovery, and vaccination rates, vaccination effectiveness, eigenvalues and reproduction numbers ([Formula: see text] and [Formula: see text]) at the endemic equilibrium point were estimated. The stability attractor regions for these parameters were identified and ranked. Our multi-stability analysis revealed that while the endemic equilibrium points in the 50 states remain unstable, there is a significant trend towards stable endemicity in the US. The study's stability analysis, coupled with observed epidemiological waves in the US, suggested that the COVID-19 pandemic may not conclude with the virus's eradication. Nevertheless, the virus is gradually becoming endemic. Effectively strategizing vaccine distribution is pivotal for this transition.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Pandemics/prevention & control , Models, Theoretical , Nonlinear Dynamics
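
The abstract above evaluates stability through eigenvalues and a reproduction number but does not state the model equations. As a rough illustration only, the sketch below assumes a toy two-compartment model with vaccination of susceptibles (dS/dt = mu - beta*S*I - (mu + v)*S, dI/dt = beta*S*I - (gamma + mu)*I). The model, the function names, and all parameter values are hypothetical and are not taken from the paper. It shows how an equilibrium can be classified as locally stable when every Jacobian eigenvalue has a negative real part.

```python
import numpy as np

# Toy model (an assumption, not the paper's model):
#   dS/dt = mu - beta*S*I - (mu + v)*S
#   dI/dt = beta*S*I - (gamma + mu)*I
# beta: transmission rate, gamma: recovery rate,
# mu: birth/death rate, v: vaccination rate of susceptibles.

def control_reproduction_number(beta, gamma, mu, v):
    """Reproduction number under vaccination: beta*S0/(gamma+mu), S0 = mu/(mu+v)."""
    return beta * mu / ((mu + v) * (gamma + mu))

def endemic_equilibrium(beta, gamma, mu, v):
    """Equilibrium with I != 0; I* is positive only when the reproduction number exceeds 1."""
    s_star = (gamma + mu) / beta
    i_star = mu / (gamma + mu) - (mu + v) / beta
    return s_star, i_star

def jacobian(s, i, beta, gamma, mu, v):
    """Jacobian of the (S, I) system evaluated at the point (s, i)."""
    return np.array([
        [-beta * i - (mu + v), -beta * s],
        [beta * i, beta * s - (gamma + mu)],
    ])

def is_locally_stable(s, i, beta, gamma, mu, v):
    """An equilibrium is locally stable if all eigenvalues have negative real part."""
    eigenvalues = np.linalg.eigvals(jacobian(s, i, beta, gamma, mu, v))
    return bool(np.all(eigenvalues.real < 0))

if __name__ == "__main__":
    params = dict(beta=0.5, gamma=0.1, mu=0.01, v=0.02)  # hypothetical values
    s_star, i_star = endemic_equilibrium(**params)
    print("R_v:", control_reproduction_number(**params))
    print("endemic equilibrium stable:", is_locally_stable(s_star, i_star, **params))
```

In this toy model a higher vaccination rate v lowers the reproduction number, and once it drops below 1 the endemic equilibrium no longer exists with positive prevalence.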
2.
medRxiv ; 2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37577650

ABSTRACT

Gene expression profiles that connect drug perturbations, disease gene expression signatures, and clinical data are important for discovering potential drug repurposing indications. However, the current approach to gene expression reversal has several limitations. First, most methods focus on validating the reversal expression of individual genes. Second, there is a lack of causal approaches for identifying drug repurposing candidates. Third, few methods for passing and summarizing information on a graph have been used for drug repurposing analysis, with classical network propagation and gene set enrichment analysis being the most common. Fourth, there is a lack of graph-valued association analysis, with current approaches using real-valued association analysis one gene at a time to reverse abnormal gene expressions to normal gene expressions. To overcome these limitations, we propose a novel causal inference and graph neural network (GNN)-based framework for identifying drug repurposing candidates. We formulated a causal network as a continuous constrained optimization problem and developed a new algorithm for reconstructing large-scale causal networks of up to 1,000 nodes. We conducted large-scale simulations that demonstrated good false positive and false negative rates. To aggregate and summarize information on both nodes and structure from the spatial domain of the causal network, we used directed acyclic graph neural networks (DAGNN). We also developed a new method for graph regression in which both dependent and independent variables are graphs. We used graph regression to measure the degree to which drugs reverse altered gene expressions of disease to normal levels and to select potential drug repurposing candidates. 
To illustrate the application of our proposed methods for drug repurposing, we applied them to phase I and II L1000 connectivity map perturbational profiles from the Broad Institute LINCS, which consist of gene-expression profiles for thousands of perturbagens at a variety of time points, doses, and cell lines, as well as disease gene expression data under-expressed and over-expressed in response to SARS-CoV-2.

3.
Genet Epidemiol ; 47(6): 409-431, 2023 09.
Article in English | MEDLINE | ID: mdl-37101379

ABSTRACT

In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while the false positives can be controlled well. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis which can be revised to analyze the bivariate ordinal traits and high-dimension genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.


Subject(s)
Macular Degeneration , Models, Genetic , Humans , Phenotype , Macular Degeneration/genetics , Computer Simulation , Linear Models
4.
Genet Epidemiol ; 46(5-6): 234-255, 2022 07.
Article in English | MEDLINE | ID: mdl-35438198

ABSTRACT

In this paper, we develop functional ordinal logistic regression (FOLR) models to perform gene-based analysis of ordinal traits. In the proposed FOLR models, genetic variant data are viewed as stochastic functions of physical positions and the genetic effects are treated as a function of physical positions. The FOLR models are built upon functional data analysis which can be revised to analyze the ordinal traits and high dimension genetic data. The proposed methods are capable of dealing with dense genotype data which is usually encountered in analyzing the next-generation sequencing data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Simulation studies show that the likelihood ratio test statistics of the FOLR models control type I errors well and have good power performance. The proposed methods achieve the goals of analyzing ordinal traits directly, reducing high dimensionality of dense genetic variants, being computationally manageable, facilitating model convergence, properly controlling type I errors, and maintaining high power levels. The FOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes are found to strongly associate with four ordinal traits.


Subject(s)
Genetic Testing , Models, Genetic , Computer Simulation , Genetic Variation , Genotype , Humans , Logistic Models , Phenotype
5.
J Comput Biol ; 29(8): 908-931, 2022 08.
Article in English | MEDLINE | ID: mdl-35451855

ABSTRACT

Despite significant progress in dissecting the genetic architecture of complex diseases by genome-wide association studies (GWAS), the signals identified by association analysis may not have specific pathological relevance to diseases so that a large fraction of disease-causing genetic variants is still hidden. Association is used to measure dependence between two variables or two sets of variables. GWAS test association between a disease and single-nucleotide polymorphisms (SNPs) (or other genetic variants) across the genome. Association analysis may detect superficial patterns between disease and genetic variants. Association signals provide limited information on the causal mechanism of diseases. The use of association analysis as a major analytical platform for genetic studies of complex diseases is a key issue that may hamper discovery of disease mechanisms, calling into question the ability of GWAS to identify loci underlying diseases. It is time to move beyond association analysis toward techniques that enable the discovery of the underlying causal genetic structures of complex diseases. To achieve this, we propose the concept of genome-wide causation studies (GWCS) as an alternative to GWAS and develop additive noise models (ANMs) for genetic causation analysis. Type 1 error rates and power of the ANMs in testing causation are presented. We conducted GWCS of schizophrenia. Both simulation and real data analysis show that the proportion of the overlapped association and causation signals is small. Thus, we anticipate that our analysis will stimulate serious discussion of the applicability of GWAS and GWCS.


Subject(s)
Genome-Wide Association Study , Schizophrenia , Computer Simulation , Genome , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Schizophrenia/genetics
6.
Front Immunol ; 13: 838884, 2022.
Article in English | MEDLINE | ID: mdl-35401568

ABSTRACT

MicroRNAs (miRNAs) play crucial roles in regulating the transcriptome and development of rheumatoid arthritis (RA). Currently, a comprehensive map illustrating how miRNAs regulate transcripts, pathways, immune system differentiation, and their interactions with terminal cells such as fibroblast-like synoviocytes (FLS), immune cells, osteoblasts, and osteoclasts is still lacking. In this review, we summarize the roles of miRNAs in the susceptibility, pathogenesis, diagnosis, therapeutic intervention, and prognosis of RA. Numerous miRNAs are abnormally expressed in cells involved in RA and regulate target genes and pathways, including NF-κB, Fas-FasL, JAK-STAT, and mTOR pathways. We outline how functional genetic variants of miR-499 and miR-146a partly explain susceptibility to RA. By regulating gene expression, miRNAs affect T cell differentiation into diverse cell types, including Th17 and Treg cells, thus constituting promising gene therapy targets to modulate the immune system in RA. We summarize the diagnostic and prognostic potential of blood-circulating and cell-free miRNAs, highlighting the opportunity to combine these miRNAs with antibodies to cyclic citrullinated peptide (ACCP) to allow accurate diagnosis and prognosis, particularly for seronegative patients. Furthermore, we review the evidence implicating miRNAs as promising biomarkers of efficiency and response of, and resistance to, disease-modifying anti-rheumatic drugs and immunotherapy. Finally, we discuss the autotherapeutic effect of miRNA intervention as a step toward the development of miRNA-based anti-RA drugs. Collectively, the current evidence supports miRNAs as interesting targets to better understand the pathogenetic mechanisms of RA and design more efficient therapeutic interventions.


Subject(s)
Arthritis, Rheumatoid , MicroRNAs , Synoviocytes , Arthritis, Rheumatoid/etiology , Arthritis, Rheumatoid/genetics , Biomarkers/metabolism , Epigenesis, Genetic , Humans , MicroRNAs/metabolism , Synoviocytes/metabolism
7.
Front Med (Lausanne) ; 8: 591372, 2021.
Article in English | MEDLINE | ID: mdl-34249953

ABSTRACT

Background: Novel coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is now sweeping across the world. A substantial proportion of infections only lead to mild symptoms or are asymptomatic, but the proportion and infectivity of asymptomatic infections remain unknown. In this paper, we proposed a model to estimate the proportion and infectivity of asymptomatic cases, using COVID-19 in Henan Province, China, as an example. Methods: We extended the conventional susceptible-exposed-infectious-recovered model by including asymptomatic, unconfirmed symptomatic, and quarantined cases. Based on this model, we used daily reported COVID-19 cases from January 21 to February 26, 2020, in Henan Province to estimate the proportion and infectivity of asymptomatic cases, as well as the change of the effective reproductive number, Rt. Results: The proportion of asymptomatic cases among COVID-19 infected individuals was 42% and the infectivity was 10% that of symptomatic ones. The basic reproductive number R0 = 2.73, and Rt dropped below 1 on January 31 under a series of measures. Conclusion: The spread of the COVID-19 epidemic was rapid in the early stage, with a large number of asymptomatic infected individuals having relatively low infectivity. However, it was quickly brought under control with national measures.
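
To make the role of the asymptomatic proportion and relative infectivity concrete, here is a minimal sketch of a susceptible-exposed-asymptomatic-infectious-recovered model. It is a simplification and not the paper's model: it omits the unconfirmed and quarantined compartments, and the function names and the rates beta, sigma, and gamma are hypothetical. Only the 42% asymptomatic proportion and the 10% relative infectivity come from the abstract.

```python
def toy_r0(beta, gamma, p_asym, theta):
    """Basic reproduction number of this toy model (same recovery rate for A and I).

    Symptomatic cases transmit at rate beta, asymptomatic ones at theta*beta.
    """
    return beta * ((1.0 - p_asym) + theta * p_asym) / gamma

def simulate_seair(beta, sigma, gamma, p_asym, theta, n, e0, days, dt=0.1):
    """Forward-Euler integration of S, E, A (asymptomatic), I (symptomatic), R."""
    s, e, a, i, r = float(n - e0), float(e0), 0.0, 0.0, 0.0
    for _ in range(int(days / dt)):
        force = beta * (i + theta * a) / n   # force of infection
        new_exposed = force * s * dt
        new_infectious = sigma * e * dt      # end of the latent period
        rec_a = gamma * a * dt
        rec_i = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        a += p_asym * new_infectious - rec_a
        i += (1.0 - p_asym) * new_infectious - rec_i
        r += rec_a + rec_i
    return s, e, a, i, r

if __name__ == "__main__":
    # p_asym and theta follow the abstract; the other values are hypothetical.
    print("toy R0:", toy_r0(beta=0.8, gamma=0.2, p_asym=0.42, theta=0.10))
    print(simulate_seair(beta=0.8, sigma=0.2, gamma=0.2, p_asym=0.42,
                         theta=0.10, n=1_000_000, e0=10, days=200))
```

Because asymptomatic cases are weighted by theta in the force of infection, a large asymptomatic share with low infectivity reduces the reproduction number relative to a model in which all cases transmit equally.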

8.
J Am Stat Assoc ; 116(534): 531-545, 2021.
Article in English | MEDLINE | ID: mdl-34321704

ABSTRACT

Genetics plays a role in age-related macular degeneration (AMD), a common cause of blindness in the elderly. There is a need for powerful methods for carrying out region-based association tests between a dichotomous trait like AMD and genetic variants on family data. Here, we apply our new generalized functional linear mixed models (GFLMM) developed to test for gene-based association in a set of AMD families. Using common and rare variants, we observe significant association with two known AMD genes: CFH and ARMS2. Using rare variants, we find suggestive signals in four genes: ASAH1, CLEC6A, TMEM63C, and SGSM1. Intriguingly, ASAH1 is down-regulated in AMD aqueous humor, and ASAH1 deficiency leads to retinal inflammation and increased vulnerability to oxidative stress. These findings were made possible by our GFLMM, which model the effect of a major gene as a fixed mean, the polygenic contributions as a random variation, and the correlation of pedigree members by kinship coefficients. Simulations indicate that the GFLMM likelihood ratio tests (LRTs) accurately control the Type I error rates. The LRTs have similar or higher power than existing retrospective kernel and burden statistics. Our GFLMM-based statistics provide a new tool for conducting family-based genetic studies of complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

9.
Genet Epidemiol ; 45(5): 455-470, 2021 07.
Article in English | MEDLINE | ID: mdl-33645812

ABSTRACT

Genetic studies of two related survival outcomes of a pleiotropic gene are commonly encountered but statistical models to analyze them are rarely developed. To analyze sequencing data, we propose mixed effect Cox proportional hazard models by functional regressions to perform gene-based joint association analysis of two survival traits motivated by our ongoing real studies. These models extend fixed effect Cox models of univariate survival traits by incorporating variations and correlation of multivariate survival traits into the models. The associations between genetic variants and two survival traits are tested by likelihood ratio test statistics. Extensive simulation studies suggest that type I error rates are well controlled and power performances are stable. The proposed models are applied to analyze bivariate survival traits of left and right eyes in the age-related macular degeneration progression.


Subject(s)
Eye Diseases , Genetic Variation , Eye Diseases/genetics , Genetic Association Studies , Humans , Models, Genetic , Phenotype
10.
Front Genet ; 11: 585804, 2020.
Article in English | MEDLINE | ID: mdl-33362849

ABSTRACT

Treatment response is heterogeneous. However, the classical methods treat the treatment response as homogeneous and estimate the average treatment effects. The traditional methods are difficult to apply to precision oncology. Artificial intelligence (AI) is a powerful tool for precision oncology. It can accurately estimate the individualized treatment effects and learn optimal treatment choices. Therefore, the AI approach can substantially improve progress and treatment outcomes of patients. One AI approach, conditional generative adversarial nets for inference of individualized treatment effects (GANITE), has been developed. However, GANITE can only deal with binary treatment and does not provide a tool for optimal treatment selection. To overcome these limitations, we modify conditional generative adversarial networks (MCGANs) to allow estimation of individualized effects of any type of treatment, including binary, categorical and continuous treatments. We propose to use sparse techniques for selection of biomarkers that predict the best treatment for each patient. Simulations show that MCGANs outperform seven other state-of-the-art methods: linear regression (LR), Bayesian linear ridge regression (BLR), k-Nearest Neighbor (KNN), random forest classification [RF (C)], random forest regression [RF (R)], logistic regression (LogR), and support vector machine (SVM). To illustrate their applications, the proposed MCGANs were applied to 256 patients with newly diagnosed acute myeloid leukemia (AML) who were treated with high dose ara-C (HDAC), Idarubicin (IDA), or both treatments (HDAC+IDA) at M. D. Anderson Cancer Center. Our results showed that MCGAN can more accurately and robustly estimate the individualized treatment effects than other state-of-the-art methods. Several biomarkers such as GSK3, BILIRUBIN, and SMAC are identified, and a total of 30 biomarkers can explain 36.8% of treatment effect variation.

11.
Sci Rep ; 10(1): 4107, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32139775

ABSTRACT

Although Alzheimer's disease (AD) is a central nervous system disease and type 2 diabetes mellitus (T2DM) is a metabolic disorder, an increasing number of genetic epidemiological studies show a clear link between AD and T2DM. The current approach to uncovering the shared pathways between AD and T2DM involves association analysis; however, such analyses lack power to discover the mechanisms of the diseases. As an alternative, we developed novel causal inference methods for genetic studies of AD and T2DM and pipelines for systematic multi-omic causal analysis to infer multilevel omics causal networks for the discovery of common paths from genetic variants to AD and T2DM. The proposed pipelines were applied to 448 individuals from the ROSMAP Project. We identified 13 shared causal genes, 16 shared causal pathways between AD and T2DM, and 754 gene expression and 101 gene methylation nodes that were connected to both AD and T2DM in multi-omics causal networks.


Subject(s)
Alzheimer Disease/etiology , Diabetes Mellitus, Type 2/etiology , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Bile Acids and Salts/biosynthesis , CREB-Binding Protein/metabolism , Causality , Computer Simulation , DNA Methylation , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Dopaminergic Neurons/metabolism , Fatty Acids/biosynthesis , Genetic Association Studies , Homeodomain Proteins/metabolism , Humans , Kinesins/metabolism , Mitogen-Activated Protein Kinase Kinases/metabolism , POU Domain Factors/metabolism , Signal Transduction
12.
Front Artif Intell ; 3: 41, 2020.
Article in English | MEDLINE | ID: mdl-33733158

ABSTRACT

As the COVID-19 pandemic surges around the world, questions arise about the number of global cases at the pandemic's peak, the length of the pandemic before receding, and the timing of intervention strategies to significantly stop the spread of COVID-19. We have developed artificial intelligence (AI)-inspired methods for modeling the transmission dynamics of the epidemics and evaluating interventions to curb the spread and impact of COVID-19. The developed methods were applied to the surveillance data of cumulative and new COVID-19 cases and deaths reported by WHO as of March 16th, 2020. Both the timing and the degree of intervention were evaluated. The average error of five-step ahead forecasting was 2.5%. The total peak number of cumulative cases, new cases, and the maximum number of cumulative cases in the world with complete intervention implemented 4 weeks later than the beginning date (March 16th, 2020) reached 75,249,909, 10,086,085, and 255,392,154, respectively. However, the total peak number of cumulative cases, new cases, and the maximum number of cumulative cases in the world with complete intervention after 1 week were reduced to 951,799, 108,853 and 1,530,276, respectively. Duration time of the COVID-19 spread was reduced from 356 days to 232 days between later and earlier interventions. We observed that delaying intervention for 1 month caused the maximum number of cumulative cases to rise to 166.89 times that of earlier complete intervention (255,392,154 versus 1,530,276), and the number of deaths increased from 53,560 to 8,938,725. Earlier and complete intervention is necessary to stem the tide of COVID-19 infection.

13.
Front Neurosci ; 13: 1198, 2019.
Article in English | MEDLINE | ID: mdl-31802999

ABSTRACT

Deep convolutional neural networks (DCNNs) have achieved great success for image classification in medical research. Deep learning with brain imaging is the imaging method of choice for the diagnosis and prediction of Alzheimer's disease (AD). However, it is also well known that DCNNs are "black boxes" owing to their low interpretability to humans. The lack of transparency of deep learning compromises its application to the prediction and mechanism investigation in AD. To overcome this limitation, we develop a novel general framework that integrates deep learning, feature selection, causal inference, and genetic-imaging data analysis for predicting and understanding AD. The proposed algorithm not only improves the prediction accuracy but also identifies the brain regions underlying the development of AD and causal paths from genetic variants to AD via image mediation. The proposed algorithm is applied to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset with diffusion tensor imaging (DTI) in 151 subjects (51 AD and 100 non-AD) who were measured at four time points of baseline, 6 months, 12 months, and 24 months. The algorithm identified brain regions underlying AD consisting of the temporal lobes (including the hippocampus) and the ventricular system.

14.
Genet Epidemiol ; 43(8): 952-965, 2019 12.
Article in English | MEDLINE | ID: mdl-31502722

ABSTRACT

The importance of integrating survival analysis into genetics and genomics is widely recognized, but only a small number of statisticians have produced relevant work in this direction. For unrelated population data, functional regression (FR) models have been developed to test for association between a quantitative/dichotomous/survival trait and genetic variants in a gene region. In major gene association analysis, these models have higher power than sequence kernel association tests. In this paper, we extend this approach to analyze censored traits for family data or related samples using FR based mixed effect Cox models (FamCoxME). The FamCoxME models the effect of a major gene as a fixed mean via functional data analysis techniques, the local gene or polygene variations or both as random effects, and the correlation of pedigree members by kinship coefficients or a genetic relationship matrix or both. The association between the censored trait and the major gene is tested by likelihood ratio tests (FamCoxME FR LRT). Simulation results indicate that the LRTs control the type I error rates accurately/conservatively and have good power levels when both local gene and polygene variations are modeled. The proposed methods were applied to analyze a breast cancer data set from the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). The FamCoxME provides a new tool for gene-based analysis of family-based studies or related samples.


Subject(s)
Genetic Association Studies , Models, Genetic , Survival Analysis , Computer Simulation , Genetic Variation , Humans , Pedigree , Phenotype , Proportional Hazards Models , Regression Analysis
15.
Sci Rep ; 9(1): 12717, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31481703

ABSTRACT

Recent studies imply that rare variants contribute to the risk of schizophrenia; however, the exact variants or genes responsible for this condition are largely unknown. In this study, we conducted whole genome sequencing (WGS) of 20 Chinese families. Each family consisted of at least two affected siblings diagnosed with schizophrenia and at least one unaffected sibling. We examined functional variants that were found in affected sibling(s) but not in unaffected sibling(s) within a family. Matching this criterion, a frameshift heterozygous deletion of CA (-/CA) at chromosome 18:24722722, also referred to as rs752084147, in the Carbohydrate Sulfotransferase 9 (CHST9) gene, was detected in two families. This deletion was confirmed by PCR-based Sanger sequencing. With the observed frequency of 0.00076 in the Han Chinese population, we performed both case-control and family-based analyses to evaluate its association with schizophrenia. In the case-control analyses, the Chi-square test P-value was 6.80e-12 and the P-value was 0.0008 after one million simulations. In family-based segregation analyses, the segregation P-value was 7.72e-7 and the simulated P-value was 5.70e-6. For both the case-control and family-based analyses, the CA deletion was significantly associated with schizophrenia in the Chinese population. Further investigation of the role of this gene in the development of schizophrenia is warranted, utilizing larger and more ethnically diverse samples.


Subject(s)
Asian/genetics , Chromosomes, Human, Pair 18/genetics , Family , Frameshift Mutation , Polymorphism, Single Nucleotide , Schizophrenia , Sulfotransferases/genetics , Female , Humans , Male , Schizophrenia/ethnology , Schizophrenia/genetics , Whole Genome Sequencing
16.
J Invest Dermatol ; 139(11): 2352-2358.e3, 2019 11.
Article in English | MEDLINE | ID: mdl-31176707

ABSTRACT

To investigate the role of tumor cytokines/chemokines in melanoma immune response, we estimated the proportions of immune cell subsets in melanoma tumors from The Cancer Genome Atlas, followed by evaluation of the association between cytokine/chemokine expression and these subsets. We then investigated the association of immune cell subsets, chemokines, and cytokines with patient survival. Finally, we evaluated the immune cell tumor-infiltrating lymphocyte (TIL) score for correlation with melanoma patient outcome in a separate cohort. There was good agreement between RNA sequencing estimation of T-cell subset and pathologist-determined TIL score. Expression levels of cytokines IL-12A, IFNG, and IL-10, and chemokines CXCL9 and CXCL10 were positively correlated with PDCD1, CTLA-4, and CD8+ T-cell subset, but negatively correlated with tumor purity (Bonferroni-corrected P < 0.05). In multivariable analysis, higher expression levels of cytokines IFN-γ and TGFB1, but not chemokines, were associated with improved overall survival. A higher expression level of CD8+ T-cell subset was also associated with improved overall survival (hazard ratio [HR] = 0.06, 95% confidence interval [CI] = 0.01-0.35, P = 0.002). Finally, multivariable analysis showed that patients with a brisk TIL score had better melanoma-specific survival than those with a nonbrisk score (HR = 0.51, 95% CI = 0.27-0.98, P = 0.0423). These results suggest that the expression of specific tumor cytokines represents important biomarkers of melanoma immune response.


Subject(s)
CD8-Positive T-Lymphocytes/immunology , Chemokines/metabolism , Cytokines/metabolism , Inflammation/immunology , Lymphocytes, Tumor-Infiltrating/immunology , Melanoma/immunology , Skin Neoplasms/immunology , Biomarkers, Tumor/metabolism , Case-Control Studies , Chemokines/genetics , Cohort Studies , Cytokines/genetics , Female , Humans , Immunity, Cellular , Male , Melanoma/mortality , Neoplasm Staging , Prognosis , Skin Neoplasms/mortality , Survival Analysis
17.
Front Genet ; 10: 319, 2019.
Article in English | MEDLINE | ID: mdl-31024629

ABSTRACT

Although genome-wide association studies (GWASs) have identified abundant genetic susceptibility loci, GWASs of small sample size fall far short of previous expectations because of low statistical power and false positive results. Effective statistical methods are required to further improve the analyses of massive GWAS data. Here we present a new statistic (Robust Reference Powered Association Test) that uses a large public database (gnomAD) as a reference to reduce concerns about potential population stratification. To evaluate the performance of this statistic in various situations, we simulated multiple sets of sample sizes and frequencies to compute statistical power. Furthermore, we applied our method to several real datasets (psoriasis genome-wide association datasets and a schizophrenia genome-wide association dataset) to evaluate the performance. Careful analyses indicated that our newly developed statistic outperformed several previously developed GWAS applications. Importantly, this statistic is more robust than the naive merging method in the presence of small control-reference differentiation and is therefore likely to detect more association signals.

18.
Genet Epidemiol ; 43(2): 189-206, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30537345

ABSTRACT

We develop linear mixed models (LMMs) and functional linear mixed models (FLMMs) for gene-based tests of association between a quantitative trait and genetic variants on pedigrees. The effects of a major gene are modeled as a fixed effect, the contributions of polygenes are modeled as a random effect, and the correlations of pedigree members are modeled via inbreeding/kinship coefficients. F-statistics and χ² likelihood ratio test (LRT) statistics based on the LMMs and FLMMs are constructed to test for association. We show empirically that the F-distributed statistics provide a good control of the type I error rate. The F-test statistics of the LMMs have similar or higher power than the FLMMs, kernel-based famSKAT (family-based sequence kernel association test), and burden test famBT (family-based burden test). The F-statistics of the FLMMs perform well when analyzing a combination of rare and common variants. For small samples, the LRT statistics of the FLMMs control the type I error rate well at the nominal levels α = 0.01 and 0.05. For moderate/large samples, the LRT statistics of the FLMMs control the type I error rates well. The LRT statistics of the LMMs can lead to inflated type I error rates. The proposed models are useful in whole genome and whole exome association studies of complex traits.


Subject(s)
Genetic Association Studies , High-Throughput Nucleotide Sequencing/methods , Models, Genetic , Quantitative Trait, Heritable , Computer Simulation , Family , Humans , Linear Models , Myopia/genetics
19.
BMC Bioinformatics ; 19(1): 448, 2018 Nov 22.
Article in English | MEDLINE | ID: mdl-30466390

ABSTRACT

BACKGROUND: Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variable Y (0 or 1). RESULTS: We addressed this problem by using knnAUC (k-nearest neighbors AUC test, the R package is available at https://sourceforge.net/projects/knnauc/ ). In the knnAUC software framework, we first resampled a dataset to get the training and testing dataset according to the sample ratio (from 0 to 1), and then constructed a k-nearest neighbors algorithm classifier to get the yhat estimator (the probability of y = 1) of testy (the true label of testing dataset). Finally, we calculated the AUC (area under the curve of receiver operating characteristic) estimator and tested whether the AUC estimator is greater than 0.5. To evaluate the advantages of knnAUC compared to seven other popular methods, we performed extensive simulations to explore the relationships between eight different methods and compared the false positive rates and statistical power using both simulated and real datasets (Chronic hepatitis B datasets and kidney cancer RNA-seq datasets). CONCLUSIONS: We concluded that knnAUC is an efficient R package to test non-linear dependence between one continuous variable and one binary dependent variable, especially in the area of computational biology.


Subject(s)
Sequence Analysis, RNA/methods , Cluster Analysis , Computational Biology/methods , Humans
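
The steps described in the knnAUC abstract (split by a sample ratio, fit a k-nearest neighbors classifier, compute the AUC on the held-out part, test whether it exceeds 0.5) can be sketched as follows. This is a Python illustration and not the R package's implementation: the function names are hypothetical, and the permutation test used for the final step is an assumption, since the abstract does not say how the AUC is tested.

```python
import numpy as np

def knn_prob(x_train, y_train, x_test, k):
    """yhat: fraction of the k nearest training points (in x) with label 1."""
    dist = np.abs(x_test[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def auc(y_true, score):
    """AUC as the Mann-Whitney probability, counting ties as one half.

    Requires at least one positive and one negative label in y_true.
    """
    pos, neg = score[y_true == 1], score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def knn_auc_test(x, y, ratio=0.5, k=15, n_perm=200, seed=0):
    """Return (AUC, one-sided permutation p-value for AUC > chance level)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_train = int(ratio * len(x))
    train, test = order[:n_train], order[n_train:]
    observed = auc(y[test], knn_prob(x[train], y[train], x[test], k))
    # Assumed null distribution: shuffle labels to break any X-Y dependence.
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        null_auc = auc(y_perm[test], knn_prob(x[train], y_perm[train], x[test], k))
        exceed += null_auc >= observed
    return observed, (1 + exceed) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(-2, 2, 400)
    y = (np.abs(x) > 1).astype(int)       # nonlinear, non-monotone dependence
    flip = rng.random(400) < 0.10         # 10% label noise
    y = np.where(flip, 1 - y, y)
    print(knn_auc_test(x, y))
```

The simulated example uses a U-shaped relationship, where the linear correlation between X and Y is near zero but a neighborhood-based classifier can still predict Y from X.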
20.
Front Genet ; 9: 347, 2018.
Article in English | MEDLINE | ID: mdl-30233639

ABSTRACT

The mainstream of research in genetics, epigenetics, and imaging data analysis focuses on statistical association or exploring statistical dependence between variables. Despite significant progress in genetic research, understanding the etiology and mechanism of complex phenotypes remains elusive. Using association analysis as a major analytical platform for the complex data analysis is a key issue that hampers the theoretic development of genomic science and its application in practice. Causal inference is an essential component for the discovery of mechanical relationships among complex phenotypes. Many researchers suggest making the transition from association to causation. Despite its fundamental role in science, engineering, and biomedicine, the traditional methods for causal inference require at least three variables. However, quantitative genetic analysis such as QTL, eQTL, mQTL, and genomic-imaging data analysis requires exploring the causal relationships between two variables. This paper will focus on bivariate causal discovery with continuous variables. We will introduce independence of cause and mechanism (ICM) as a basic principle for causal inference, and algorithmic information theory and the additive noise model (ANM) as major tools for bivariate causal discovery. Large-scale simulations will be performed to evaluate the feasibility of the ANM for bivariate causal discovery. To further evaluate their performance for causal inference, the ANM will be applied to the construction of gene regulatory networks. Also, the ANM will be applied to trait-imaging data analysis to illustrate three scenarios: presence of both causation and association, presence of association in the absence of causation, and presence of causation with a lack of association between two variables. Telling cause from effect between two continuous variables from observational data is one of the fundamental and challenging problems in omics and imaging data analysis.
Our preliminary simulations and real data analysis will show that ANMs will be one of the methods of choice for bivariate causal discovery in genomic and imaging data analysis.
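
For readers unfamiliar with additive noise models, the idea behind bivariate causal discovery can be sketched in a few lines: fit a nonlinear regression in each direction and prefer the direction whose residuals look independent of the putative cause. The sketch below is only one common way to do this and is not the authors' procedure; the polynomial regression, the HSIC dependence score with a median-heuristic Gaussian kernel, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (larger = more dependent)."""
    n = len(a)

    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        bandwidth2 = np.median(d2[d2 > 0])   # median heuristic
        return np.exp(-d2 / bandwidth2)

    centering = np.eye(n) - 1.0 / n          # I - (1/n) * ones
    return np.trace(gram(a) @ centering @ gram(b) @ centering) / n ** 2

def residual_dependence(cause, effect, degree=5):
    """Regress effect on cause with a polynomial, then score cause-residual dependence."""
    coef = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coef, cause)
    return hsic(cause, residual)

def anm_direction(x, y, degree=5):
    """Pick the direction whose regression residuals depend less on the input."""
    score_xy = residual_dependence(x, y, degree)
    score_yx = residual_dependence(y, x, degree)
    direction = "x->y" if score_xy < score_yx else "y->x"
    return direction, score_xy, score_yx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 300)
    y = x ** 2 + 0.1 * rng.normal(size=300)  # simulated: x causes y
    print(anm_direction(x, y))
```

A full analysis would replace the raw score comparison with a formal independence test and calibrate it by simulation, as the abstract indicates the authors do.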
