Search | VHL Regional Portal

Cancer subtyping with heterogeneous multi-omics data via hierarchical multi-kernel learning.

Wei, Yifang; Li, Lingmei; Zhao, Xin; Yang, Haitao; Sa, Jian; Cao, Hongyan; Cui, Yuehua.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36433785

ABSTRACT

Differentiating cancer subtypes is crucial to guide personalized treatment and improve the prognosis for patients. Integrating multi-omics data can offer a comprehensive landscape of cancer biological processes and provide promising ways for cancer diagnosis and treatment. Taking the heterogeneity of different omics data types into account, we propose a hierarchical multi-kernel learning (hMKL) approach, a novel cancer molecular subtyping method to identify cancer subtypes by adopting a two-stage kernel learning strategy. In stage 1, we obtain a composite kernel borrowing the cancer integration via multi-kernel learning (CIMLR) idea by optimizing the kernel parameters for each individual omics data type. In stage 2, we obtain a final fused kernel through a weighted linear combination of individual kernels learned from stage 1 using an unsupervised multiple kernel learning method. Based on the final fusion kernel, k-means clustering is applied to identify cancer subtypes. Simulation studies show that hMKL outperforms the one-stage CIMLR method when there is data heterogeneity. hMKL can estimate the number of clusters correctly, which is the key challenge in subtyping. Application to two real data sets shows that hMKL identified meaningful subtypes and key cancer-associated biomarkers. The proposed method provides a novel toolkit for heterogeneous multi-omics data integration and cancer subtype identification.

Subject(s)

Deep Learning , Neoplasms , Humans , Multiomics , Neoplasms/genetics , Cluster Analysis , Computer Simulation , Biomarkers, Tumor/genetics
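
Illustrative note: the second stage described in the abstract above (a weighted linear combination of per-omics kernels, followed by k-means on the fused kernel) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation: the function names, the fixed Gaussian bandwidths, the equal weights, the eigenvector embedding, and the simulated data are all assumptions made for illustration. In hMKL the kernel parameters and the weights are learned, which this sketch does not attempt.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fuse_kernels(kernels, weights):
    """Weighted linear combination of kernels; weights are rescaled to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def kmeans_on_kernel(K, n_clusters, n_iter=50):
    """Embed samples with the leading eigenvectors of K, then run Lloyd's k-means."""
    vals, vecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    Z = vecs[:, -n_clusters:] * np.sqrt(np.maximum(vals[-n_clusters:], 0.0))
    # Deterministic farthest-point initialisation of the centres.
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Simulated example: two "omics" views of 20 samples in two well-separated groups.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 10)
omics_a = rng.normal(size=(20, 5)) + 6.0 * group[:, None]
omics_b = rng.normal(size=(20, 8)) + 6.0 * group[:, None]

kernels = [gaussian_kernel(omics_a, sigma=3.0), gaussian_kernel(omics_b, sigma=3.0)]
fused = fuse_kernels(kernels, weights=[0.5, 0.5])  # assumed equal weights
labels = kmeans_on_kernel(fused, n_clusters=2)
```

On this toy data the two simulated groups are far enough apart that the fused kernel is close to block-diagonal, so the clustering step recovers them; real multi-omics data would need the learned parameters and weights that the abstract describes.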

Identifying complex gene-gene interactions: a mixed kernel omnibus testing approach.

Liu, Yan; Gao, Yuzhao; Fang, Ruiling; Cao, Hongyan; Sa, Jian; Wang, Jianrong; Liu, Hongqi; Wang, Tong; Cui, Yuehua.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34373892

ABSTRACT

Genes do not function independently; rather, they interact with each other to fulfill their joint tasks. Identification of gene-gene interactions has been critically important in elucidating the molecular mechanisms responsible for the variation of a phenotype. Regression models are commonly used to model the interaction between two genes with a linear product term. The interaction effect of two genes can be linear or nonlinear, depending on the true nature of the data. When nonlinear interactions exist, the linear interaction model may not be able to detect such interactions; hence, it suffers from substantial power loss. While the true interaction mechanism (linear or nonlinear) is generally unknown in practice, it is critical to develop statistical methods that are flexible enough to capture the underlying interaction mechanism without assuming a specific model. In this study, we develop a mixed kernel function which combines both linear and Gaussian kernels with different weights to capture the linear or nonlinear interaction of two genes. Instead of optimizing the weight function, we propose a grid search strategy and use a Cauchy transformation of the P-values obtained under different weights to aggregate the P-values. We further extend the two-gene interaction model to a high-dimensional setup using a de-biased LASSO algorithm. Extensive simulation studies are conducted to verify the performance of the proposed method. Application to two case studies further demonstrates the utility of the model. Our method provides a flexible and computationally efficient tool for disentangling complex gene-gene interactions associated with complex traits.

Subject(s)

Computer Simulation , Epistasis, Genetic , Algorithms , Humans , Phenotype
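
Illustrative note: the aggregation step mentioned in the abstract above (a Cauchy transformation of P-values obtained under a grid of kernel weights) can be sketched with the standard Cauchy combination formula. This is a sketch under stated assumptions and not the authors' code: the function name `cauchy_combine`, the equal combination weights, and the example P-values are hypothetical.

```python
import math

def cauchy_combine(p_values, weights=None):
    """Combine P-values via the Cauchy transformation.

    Each P-value p is mapped to tan((0.5 - p) * pi); the weighted mean of
    these values is converted back to a P-value with the Cauchy tail
    probability. P-values must lie strictly between 0 and 1.
    """
    n = len(p_values)
    if weights is None:
        weights = [1.0 / n] * n  # assumed equal weights
    total = sum(weights)
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values)) / total
    return 0.5 - math.atan(stat) / math.pi

# Hypothetical P-values from five grid points of the mixing weight.
grid_p_values = [0.20, 0.04, 0.008, 0.03, 0.15]
combined = cauchy_combine(grid_p_values)
```

A useful sanity check on the formula is that a single P-value is returned unchanged, and that identical P-values combine to that same value.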

Time-Varying Gene Network Analysis of Human Prefrontal Cortex Development.

Wang, Huihui; Wu, Yongqing; Fang, Ruiling; Sa, Jian; Li, Zhi; Cao, Hongyan; Cui, Yuehua.

Front Genet ; 11: 574543, 2020.

Article in English | MEDLINE | ID: mdl-33304381

ABSTRACT

The prefrontal cortex (PFC) constitutes a large part of the human central nervous system and is essential for the normal social affection and executive function of humans and other primates. Despite ongoing research in this region, the development of interactions between PFC genes over the lifespan is still unknown. To investigate the conversion of PFC gene interaction networks and further identify hub genes, we obtained time-series gene expression data of human PFC tissues from the Gene Expression Omnibus (GEO) database. A statistical model, loggle, was used to construct time-varying networks and several common network attributes were used to explore the development of PFC gene networks with age. Network similarity analysis showed that the development of human PFC is divided into three stages, namely, fast development period, deceleration to stationary period, and recession period. We identified some genes related to PFC development at these different stages, including genes involved in neuronal differentiation or synapse formation, genes involved in nerve impulse transmission, and genes involved in the development of myelin around neurons. Some of these genes are consistent with findings in previous reports. At the same time, we explored the development of several known KEGG pathways in PFC and corresponding hub genes. This study clarified the development trajectory of the interaction between PFC genes, and proposed a set of candidate genes related to PFC development, which helps further study of human brain development at the genomic level supplemental to regular anatomical analyses. The analytical process used in this study, involving the loggle model, similarity analysis, and central analysis, provides a comprehensive strategy to gain novel insights into the evolution and development of brain networks in other organisms.

Association between N-terminal proB-type Natriuretic Peptide and Depressive Symptoms in Patients with Acute Myocardial Infarction.

Ren, Yan; Jia, Jiao; Sa, Jian; Qiu, Li-Xia; Cui, Yue-Hua; Zhang, Yue-An; Yang, Hong; Liu, Gui-Fen.

Chin Med J (Engl) ; 130(5): 542-548, 2017 03 05.

Article in English | MEDLINE | ID: mdl-28229985

ABSTRACT

BACKGROUND: While depression and certain cardiac biomarkers are associated with acute myocardial infarction (AMI), the relationship between them remains largely unexplored. We examined the association between depressive symptoms and biomarkers in patients with AMI. METHODS: We performed a cross-sectional study using data from 103 patients with AMI between March 2013 and September 2014. The levels of depression, N-terminal proB-type natriuretic peptide (NT-proBNP), and troponin I (TnI) were measured at baseline. The patients were divided into two groups: those with depressive symptoms and those without depressive symptoms according to Zung Self-rating Depression Scale (SDS) score. Baseline comparisons between two groups were made using Student's t-test for continuous variables, Chi-square or Fisher's exact test for categorical variables, and Wilcoxon test for variables in skewed distribution. Binomial logistic regression and multivariate linear regression were performed to assess the association between depressive symptoms and biomarkers while adjusting for demographic and clinical variables. RESULTS: Patients with depressive symptoms had significantly higher NT-proBNP levels as compared to patients without depressive symptoms (1135.0 [131.5, 2474.0] vs. 384.0 [133.0, 990.0], Z = -2.470, P = 0.013). Depressive symptoms were associated with higher NT-proBNP levels (odds ratio [OR] = 2.348, 95% CI: 1.344 to 4.103, P = 0.003) and higher body mass index (OR = 1.169, 95% confidence interval [CI]: 1.016 to 1.345, P = 0.029). The total SDS score was associated with the NT-proBNP level (β = 0.327, 95% CI: 1.674 to 6.119, P = 0.001) after multivariable adjustment. In particular, NT-proBNP was associated with three of the depressive dimensions, including core depression (β = 0.299, 95% CI: 0.551 to 2.428, P = 0.002), cognitive depression (β = 0.320, 95% CI: 0.476 to 1.811, P = 0.001), and somatic depression (β = 0.333, 95% CI: 0.240 to 0.847, P = 0.001). Neither the overall depressive symptomatology nor the individual depressive dimensions were associated with TnI levels. CONCLUSIONS: Depressive symptoms, especially core depression, cognitive depression, and somatic depression, were related to high NT-proBNP levels in patients with AMI.

Subject(s)

Depressive Disorder/diagnosis , Myocardial Infarction/metabolism , Myocardial Infarction/psychology , Natriuretic Peptide, Brain/metabolism , Peptide Fragments/metabolism , Aged , Biomarkers/metabolism , Cross-Sectional Studies , Depressive Disorder/etiology , Depressive Disorder/metabolism , Female , Humans , Male , Middle Aged , Troponin I/metabolism

A Nonlinear Model for Gene-Based Gene-Environment Interaction.

Sa, Jian; Liu, Xu; He, Tao; Liu, Guifen; Cui, Yuehua.

Int J Mol Sci ; 17(6)2016 Jun 04.

Article in English | MEDLINE | ID: mdl-27271617

ABSTRACT

A vast amount of literature has confirmed the role of gene-environment (G×E) interaction in the etiology of complex human diseases. Traditional methods are predominantly focused on the analysis of interaction between a single nucleotide polymorphism (SNP) and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single SNP effects) are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single-variant-based approaches, in this work, we proposed a sparse principal component regression (sPCR) model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for SNPs in a gene, then the effect of each principal component was modeled by a varying-coefficient (VC) model. The model can jointly model variants in a gene whose effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR) model is readily interpretable, since the sparsity of the principal component loadings indicates the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset in a Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction method and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a system approach to evaluate gene-based G×E interaction.

Subject(s)

Gene-Environment Interaction , Models, Biological , Nonlinear Dynamics , Algorithms , Animals , Computational Biology/methods , Computer Simulation , Databases, Genetic , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide
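
Illustrative note: the Bonferroni correction mentioned in the abstract above divides the family-wise significance level by the number of tests. The short sketch below works through that arithmetic for the 12,005 genes reported. The significance level of 0.05 is an assumption, since the abstract does not state the level used, and the helper `is_significant` is hypothetical.

```python
n_genes = 12005   # number of genes tested, as reported in the abstract
alpha = 0.05      # assumed family-wise error rate (not stated in the abstract)

# Bonferroni: each gene-level test is judged against alpha / number of tests.
threshold = alpha / n_genes

def is_significant(p_value, cutoff=threshold):
    """Return True if a gene-level P-value passes the Bonferroni cutoff."""
    return p_value < cutoff

print(f"per-gene threshold: {threshold:.3g}")
```

Under this assumed level the per-gene cutoff is on the order of 4e-6, which shows why only a very small number of genes would be expected to pass it.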

Statistical dissection of cyto-nuclear epistasis subject to genomic imprinting in line crosses.

He, Tao; Sa, Jian; Zhong, Ping-Shou; Cui, Yuehua.

PLoS One ; 9(3): e91702, 2014.

Article in English | MEDLINE | ID: mdl-24643065

ABSTRACT

Cytoplasm contains important metabolic organelles such as mitochondria and chloroplasts (in plants). In particular, mitochondria contain their own DNA, which can be passed to offspring through maternal gametes and has been confirmed to play a pivotal role in nuclear activities. Experimental evidence has documented the importance of cyto-nuclear interactions in affecting important biological traits. While studies have also pointed out the role of interaction between imprinted nuclear DNA and cytoplasm, no statistical method has been developed to efficiently model such an effect and further quantify its effect size. In this work, we developed an efficient statistical model for genome-wide estimation and testing of the cytoplasmic effect, the nuclear DNA imprinting effect, and the interaction between them under reciprocal backcross and F2 designs derived from inbred lines. Parameters are estimated under a maximum likelihood framework implemented with the EM algorithm. Extensive simulations show good performance in a variety of scenarios. The utility of the method is demonstrated by analyzing a published data set in an F2 family derived from C3H/HeJBir and C57BL/6J mouse strains. Important cyto-nuclear interactions were identified. Our approach provides a quantitative framework for identifying and estimating cyto-nuclear interactions subject to genomic imprinting involved in the genetic control of complex traits.

Subject(s)

Cell Nucleus/genetics , Cytoplasm/genetics , Epistasis, Genetic , Genomic Imprinting , Models, Genetic , Algorithms , Animals , Computer Simulation , Crosses, Genetic , Female , Male , Mice , Mice, Inbred C3H , Mice, Inbred C57BL , Quantitative Trait Loci
