Results 1 - 11 of 11
1.
Br J Cancer ; 130(6): 1001-1012, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38278975

ABSTRACT

BACKGROUND: Cancer is a heterogeneous disease driven by complex molecular alterations. Cancer subtypes determined from multi-omics data can provide novel insight into personalised precision treatment. It is recognised that incorporating prior weight knowledge into multi-omics data integration can improve disease subtyping. METHODS: We develop a weighted method, termed weight-boosted Multi-Kernel Learning (wMKL) which incorporates heterogeneous data types as well as flexible weight functions, to boost subtype identification. Given a series of weight functions, we propose an omnibus combination strategy to integrate different weight-related P-values to improve subtyping precision. RESULTS: wMKL models each data type with multiple kernel choices, thus alleviating the sensitivity and robustness issue due to selecting kernel parameters. Furthermore, wMKL integrates different data types by learning weights of different kernels derived from each data type, recognising the heterogeneous contribution of different data types to the final subtyping performance. The proposed wMKL outperforms existing weighted and non-weighted methods. The utility and advantage of wMKL are illustrated through extensive simulations and applications to two TCGA datasets. Novel subtypes are identified followed by extensive downstream bioinformatics analysis to understand the molecular mechanisms differentiating different subtypes. CONCLUSIONS: The proposed wMKL method provides a novel strategy for disease subtyping. The wMKL is freely available at https://github.com/biostatcao/wMKL .


Subject(s)
Multiomics , Neoplasms , Humans , Computational Biology/methods , Neoplasms/genetics
2.
Clin Chim Acta ; 544: 117362, 2023 Apr 01.
Article in English | MEDLINE | ID: mdl-37088117

ABSTRACT

BACKGROUND: Gestational diabetes mellitus (GDM) is always treated as a homogeneous disease, ignoring the different metabolic characteristics in the oral glucose tolerance test (OGTT). We assessed the effect of GDM on macrosomia based on the different characteristics of OGTT. METHODS: We retrospectively divided 998 pregnant women with GDM into 7 groups, Group A1: abnormal OGTT0h; Group A2: abnormal OGTT1h; Group A3: abnormal OGTT2h; Group B1: abnormal OGTT0h+1h; Group B2: abnormal OGTT0h+2h; Group B3: abnormal OGTT1h+2h; Group C: abnormal OGTT0h+1h+2h. RESULTS: The incidence of macrosomia in group C (21.92%) was higher than in the other groups. Compared with OGTT0h, the OR of OGTT0h+1h+2h was significant (OGTT1h: OR = 1.577, 95% CI: 0.791, 3.145; OGTT2h: OR = 1.151, 95% CI: 0.572, 2.313; OGTT0h+1h: OR = 1.346, 95% CI: 0.584, 3.101; OGTT0h+2h: OR = 1.327, 95% CI: 0.517, 3.409; OGTT1h+2h: OR = 0.771, 95% CI: 0.256, 2.322; OGTT0h+1h+2h: OR = 4.164, 95% CI: 2.095, 8.278). Subgroup analysis showed abnormal OGTT0h+1h+2h might contribute more to macrosomia in women with pre-pregnancy BMI ≥ 24 kg/m² than in those with BMI < 24 kg/m². CONCLUSION: The effect of abnormal OGTT0h+1h+2h on macrosomia was significantly greater than that of other OGTT characteristics, especially for those with pre-pregnancy BMI ≥ 24 kg/m². Individualized management of GDM based on OGTT characteristics and pre-pregnancy BMI might be needed.


Subject(s)
Diabetes, Gestational , Fetal Macrosomia , Fetal Macrosomia/diagnosis , Fetal Macrosomia/etiology , Glucose Tolerance Test , Diabetes, Gestational/metabolism , Humans , Female , Pregnancy , Adolescent , Young Adult , Adult , Blood Glucose/analysis , Blood Glucose/metabolism , Retrospective Studies
3.
Comput Struct Biotechnol J ; 20: 3482-3492, 2022.
Article in English | MEDLINE | ID: mdl-35860412

ABSTRACT

Lower-grade gliomas (LGG), characterized by heterogeneity and invasiveness, originate from the central nervous system. Although studies focusing on molecular subtyping and molecular characteristics have provided novel insights into improving the diagnosis and therapy of LGG, there is an urgent need to identify new molecular subtypes and biomarkers that are promising to improve patient survival outcomes. Here, we proposed a joint similarity network fusion (Joint-SNF) method to integrate different omics data types to construct a fused network using the Joint and Individual Variation Explained (JIVE) technique under the SNF framework. Focusing on the joint network structure, a spectral clustering method was employed to obtain subtypes of patients. Simulation studies show that the proposed Joint-SNF method outperforms the original SNF approach under various simulation scenarios. We further applied the method to a Chinese LGG data set including mRNA expression, DNA methylation and microRNA (miRNA). Three molecular subtypes were identified and showed statistically significant differences in patient survival outcomes. The five-year mortality rates of the three subtypes are 80.8%, 32.1%, and 34.4%, respectively. After adjusting for clinically relevant covariates, the death risk of patients in Cluster 1 was 5.06 times higher than that of patients in other clusters. The fused network attained by the proposed Joint-SNF method enhances strong similarities, thus greatly improving subtyping performance compared to the original SNF method. The findings in the real application may provide important clues for improving patient survival outcomes and for precision treatment for Chinese LGG patients. An R package to implement the method can be accessed on GitHub at https://github.com/Sameerer/Joint-SNF.

4.
Front Cell Infect Microbiol ; 11: 708088, 2021.
Article in English | MEDLINE | ID: mdl-34692558

ABSTRACT

Comprehensive analyses of multi-omics data may provide insights into interactions between different biological layers concerning distinct clinical features. We integrated data on the gut microbiota, blood parameters and urine metabolites of treatment-naive individuals presenting a wide range of metabolic disease phenotypes to delineate clinically meaningful associations. Trans-omics correlation networks revealed that candidate gut microbial biomarkers and urine metabolite features covaried with distinct clinical phenotypes. Integration of the gut microbiome, the urine metabolome and the phenome revealed that variations in one of these three systems correlated with changes in the other two. Specifically, regarding clinical parameters of liver function, we identified Eubacterium eligens, Faecalibacterium prausnitzii and Ruminococcus lactaris to be associated with healthy liver function, whereas Clostridium bolteae, Tyzzerella nexilis, Ruminococcus gnavus, Blautia hansenii, and Atopobium parvulum were associated with blood biomarkers for liver diseases. Variations in these microbiota features paralleled changes in specific urine metabolites. Network modeling yielded two core clusters including one large gut microbe-urine metabolite close-knit cluster and one triangular cluster composed of a gut microbe-blood-urine network, demonstrating close inter-system crosstalk especially between the gut microbiome and the urine metabolome. Distinct clinical phenotypes are manifested in both the gut microbiome and the urine metabolome, and inter-domain connectivity takes the form of high-dimensional networks. Such networks may further our understanding of complex biological systems, and may provide a basis for identifying biomarkers for diseases. Deciphering the complexity of human physiology and disease requires a holistic and trans-omics approach integrating multi-layer data sets, including the gut microbiome and profiles of biological fluids.
By studying the gut microbiome in carotid atherosclerosis, we identified microbial features associated with clinical parameters, and we observed that groups of urine metabolites correlated with groups of clinical parameters. Combining the three data sets, we revealed correlations of entities across the three systems, suggesting that physiological changes are reflected in each of the omics layers. Our findings provided insights into the interactive network between the gut microbiome, blood clinical parameters and the urine metabolome concerning physiological variations, and showed the promise of trans-omics studies for biomarker discovery.


Subject(s)
Carotid Artery Diseases , Gastrointestinal Microbiome , Biomarkers , Clostridiales , Humans , Metabolome , Metabolomics
5.
Brief Bioinform ; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34373892

ABSTRACT

Genes do not function independently; rather, they interact with each other to fulfill their joint tasks. Identification of gene-gene interactions has been critically important in elucidating the molecular mechanisms responsible for the variation of a phenotype. Regression models are commonly used to model the interaction between two genes with a linear product term. The interaction effect of two genes can be linear or nonlinear, depending on the true nature of the data. When nonlinear interactions exist, the linear interaction model may not be able to detect such interactions; hence, it suffers from substantial power loss. While the true interaction mechanism (linear or nonlinear) is generally unknown in practice, it is critical to develop statistical methods that can be flexible to capture the underlying interaction mechanism without assuming a specific model assumption. In this study, we develop a mixed kernel function which combines both linear and Gaussian kernels with different weights to capture the linear or nonlinear interaction of two genes. Instead of optimizing the weight function, we propose a grid search strategy and use a Cauchy transformation of the P-values obtained under different weights to aggregate the P-values. We further extend the two-gene interaction model to a high-dimensional setup using a de-biased LASSO algorithm. Extensive simulation studies are conducted to verify the performance of the proposed method. Application to two case studies further demonstrates the utility of the model. Our method provides a flexible and computationally efficient tool for disentangling complex gene-gene interactions associated with complex traits.
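
The two ingredients named in this abstract, a weighted mix of linear and Gaussian kernels and a Cauchy transformation that aggregates the P-values obtained over a grid of weights, can be illustrated with a minimal Python sketch. The function names, the Gaussian bandwidth parameterisation and the equal combination weights are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation. The aggregation step uses the standard Cauchy combination formula.

```python
import math


def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel; the bandwidth parameterisation is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mixed_kernel(x, y, weight, bandwidth=1.0):
    """Convex mix: weight=1 is purely linear, weight=0 is purely Gaussian."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * linear_kernel(x, y)
            + (1.0 - weight) * gaussian_kernel(x, y, bandwidth))


def cauchy_combination(p_values, weights=None):
    """Aggregate P-values with the standard Cauchy combination formula.

    Each P-value is mapped to tan((0.5 - p) * pi); the weighted mean of
    these values is converted back to a P-value through the Cauchy
    distribution function. Equal weights are assumed by default.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values)) / total
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical P-values, one per kernel weight on a grid such as 0, 0.5, 1.
example_p = cauchy_combination([0.01, 0.5, 0.2])
```

Because the combined P-value has a closed form, the grid search needs no resampling step to aggregate results across weights, which is consistent with the computational efficiency the abstract claims.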


Subject(s)
Computer Simulation , Epistasis, Genetic , Algorithms , Humans , Phenotype
6.
Front Genet ; 12: 652315, 2021.
Article in English | MEDLINE | ID: mdl-33828587

ABSTRACT

Heart failure with preserved ejection fraction (HFpEF) has become a major health issue because of its high mortality, high heterogeneity, and poor prognosis. Using genomic data to classify patients into different risk groups is a promising method to facilitate the identification of high-risk groups for further precision treatment. Here, we applied six machine learning models, namely kernel partial least squares with the genetic algorithm (GA-KPLS), the least absolute shrinkage and selection operator (LASSO), random forest, ridge regression, support vector machine, and the conventional logistic regression model, to predict HFpEF risk and to identify subgroups at high risk of death based on gene expression data. The model performance was evaluated using various criteria. Our analysis was focused on 149 HFpEF patients from the Framingham Heart Study cohort who were classified into good-outcome and poor-outcome groups based on their 3-year survival outcome. The results showed that the GA-KPLS model exhibited the best performance in predicting patient risk. We further identified 116 differentially expressed genes (DEGs) between the two groups, thus providing novel therapeutic targets for HFpEF. Additionally, the DEGs were enriched in Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways related to HFpEF. The GA-KPLS-based HFpEF model is a powerful method for risk stratification of 3-year mortality in HFpEF patients.

7.
Comput Struct Biotechnol J ; 19: 1567-1578, 2021.
Article in English | MEDLINE | ID: mdl-33868594

ABSTRACT

Heart failure with preserved ejection fraction (HFpEF) is associated with multiple etiologic and pathophysiologic factors. HFpEF leads to significant cardiovascular morbidity and mortality. Efforts to identify effective therapeutic interventions for HFpEF have failed for various reasons, primarily because its clinical heterogeneity makes it difficult to determine the physiologic and prognostic implications of this syndrome. Thus, identifying clinical subtypes using multi-omics data has great implications for efficient treatment and prognosis of HFpEF patients. Here we proposed to integrate mRNA, DNA methylation and microRNA (miRNA) expression data of HFpEF with a similarity network fusion (SNF) method following a network enhancement (ne-SNF) denoising technique to form a fused network. A spectral clustering method was then used to obtain clusters of patient subtypes. Experiments on HFpEF datasets demonstrated that ne-SNF significantly outperforms single data subtype analysis and other integrated methods. The identified subgroups were shown to have statistically significant differences in survival. Two HFpEF subtypes were defined: a high-risk group (16.8%) and a low-risk group (83.2%). The 5-year mortality rates were 63.3% and 33.0% for the high- and low-risk groups, respectively. After adjusting for the effects of clinical covariates, HFpEF patients in the high-risk group were 2.43 times more likely to die than those in the low-risk group. A total of 157 differentially expressed (DE) mRNAs, 2199 abnormal methylations and 121 DE miRNAs were identified between the two subtypes. They were also enriched in many HFpEF-related biological processes or pathways. The ne-SNF method provides a novel pipeline for subtype identification in integrated analysis of multi-omics data.

8.
Brief Bioinform ; 22(3), 2021 May 20.
Article in English | MEDLINE | ID: mdl-32608480

ABSTRACT

Mediation analysis has been a useful tool for investigating the effect of mediators that lie in the path from the independent variable to the outcome. With the increasing dimensionality of mediators, such as in (epi)genomics studies, high-dimensional mediation models are needed. In this work, we focus on epigenetic studies with the goal of identifying important DNA methylations that act as mediators between an exposure and a disease outcome. Specifically, we focus on gene-based high-dimensional mediation analysis implemented with kernel principal component analysis to capture potential nonlinear mediation effects. We first review the current high-dimensional mediation models and then propose two gene-based analytical approaches: gene-based high-dimensional mediation analysis based on a linearity assumption between mediators and outcome (gHMA-L) and gene-based high-dimensional mediation analysis based on a nonlinearity assumption (gHMA-NL). Since the underlying true mediation relationship is unknown in practice, we further propose an omnibus test of gene-based high-dimensional mediation analysis (gHMA-O) by combining gHMA-L and gHMA-NL. Extensive simulation studies show that gHMA-L performs better under the linear model assumption and gHMA-NL does better under the nonlinear model assumption, while gHMA-O is a more powerful and robust method by combining the two. We apply the proposed methods to two datasets to investigate genes whose methylation levels act as important mediators in the relationship: (1) between alcohol consumption and epithelial ovarian cancer risk using data from the Mayo Clinic Ovarian Cancer Case-Control Study and (2) between childhood maltreatment and comorbid post-traumatic stress disorder and depression in adulthood using data from the Gray Trauma Project.


Subject(s)
Computer Simulation , DNA Methylation , Epigenesis, Genetic , Models, Genetic , Adult , Alcohol Drinking/genetics , Child, Preschool , Depression/genetics , Female , Humans , Male , Mediation Analysis , Ovarian Neoplasms/genetics , Stress Disorders, Post-Traumatic/genetics
9.
Front Genet ; 11: 574543, 2020.
Article in English | MEDLINE | ID: mdl-33304381

ABSTRACT

The prefrontal cortex (PFC) constitutes a large part of the human central nervous system and is essential for the normal social affection and executive function of humans and other primates. Despite ongoing research in this region, the development of interactions between PFC genes over the lifespan is still unknown. To investigate the conversion of PFC gene interaction networks and further identify hub genes, we obtained time-series gene expression data of human PFC tissues from the Gene Expression Omnibus (GEO) database. A statistical model, loggle, was used to construct time-varying networks and several common network attributes were used to explore the development of PFC gene networks with age. Network similarity analysis showed that the development of human PFC is divided into three stages, namely, fast development period, deceleration to stationary period, and recession period. We identified some genes related to PFC development at these different stages, including genes involved in neuronal differentiation or synapse formation, genes involved in nerve impulse transmission, and genes involved in the development of myelin around neurons. Some of these genes are consistent with findings in previous reports. At the same time, we explored the development of several known KEGG pathways in PFC and corresponding hub genes. This study clarified the development trajectory of the interaction between PFC genes, and proposed a set of candidate genes related to PFC development, which helps further study of human brain development at the genomic level supplemental to regular anatomical analyses. The analytical process used in this study, involving the loggle model, similarity analysis, and central analysis, provides a comprehensive strategy to gain novel insights into the evolution and development of brain networks in other organisms.

10.
Front Genet ; 11: 437, 2020.
Article in English | MEDLINE | ID: mdl-32508874

ABSTRACT

Genome-wide association studies focusing on a single phenotype have been broadly conducted to identify genetic variants associated with a complex disease. The commonly applied single variant analysis is limited by failing to consider the complex interactions between variants, which motivated the development of association analyses focusing on genes or gene sets. Moreover, when multiple correlated phenotypes are available, methods based on a multi-trait analysis can improve the association power. However, most currently available multi-trait analyses are single variant-based analyses; thus, they have limited power when disease variants function as a group in a gene or a gene set. In this work, we propose a genome-wide gene-based multi-trait analysis method by considering genes as testing units. For a given phenotype, we adopt a rapid and powerful kernel-based testing method which can evaluate the joint effect of multiple variants within a gene. The joint effect, either linear or nonlinear, is captured through kernel functions. Given a series of candidate kernel functions, we propose an omnibus test strategy to integrate the test results based on different candidate kernels. A p-value combination method is then applied to integrate dependent p-values to assess the association between a gene and multiple correlated phenotypes. Simulation studies show reasonable type I error control and excellent power of the proposed method compared to its counterparts. We further show the utility of the method by applying it to two data sets: the Human Liver Cohort and the Alzheimer's Disease Neuroimaging Initiative data set, and novel genes are identified. Our method has broad applications in other fields in which the interest is to evaluate the joint effect (linear or nonlinear) of a set of variants.

11.
Front Genet ; 10: 1195, 2019.
Article in English | MEDLINE | ID: mdl-31824577

ABSTRACT

Mediation analysis has been a powerful tool to identify factors mediating the association between exposure variables and outcomes. It has been applied to various genomic applications with the hope of gaining novel insights into the underlying mechanisms of various diseases. Given the high-dimensional nature of epigenetic data, recent efforts in epigenetic mediation analysis first reduce the data dimension by applying high-dimensional variable selection techniques, then conduct testing in a low-dimensional setup. In this paper, we propose to assess the mediation effect by adopting a high-dimensional testing procedure which can produce unbiased estimates of the regression coefficients and can properly handle correlations between variables. When the data dimension is ultra-high, we first reduce the data dimension from ultra-high to high by adopting a sure independence screening (SIS) method. We apply the method to two high-dimensional epigenetic studies: one is to assess how DNA methylations mediate the association between alcohol consumption and epithelial ovarian cancer (EOC) status; the other is to assess how methylation signatures mediate the association between childhood maltreatment and post-traumatic stress disorder (PTSD) in adulthood. We compare the performance of the method with its counterpart via simulation studies. Our method can be applied to other high-dimensional mediation studies where high-dimensional mediation variables are collected.
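
The screening step mentioned in this abstract can be sketched as follows: sure independence screening ranks candidate variables by the strength of their marginal association with a target variable and keeps only the top few. The sketch below uses the absolute Pearson correlation as the marginal measure. The function names, the choice of target variable and the number of variables retained are assumptions for illustration; the abstract does not state how the authors configured this step.

```python
import math


def pearson_correlation(x, y):
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sure_independence_screening(columns, target, keep):
    """Return the indices of the `keep` columns most correlated with `target`.

    `columns` is a list of candidate variables (for example, methylation
    sites), each a list of per-sample values; `target` holds the per-sample
    values of the variable screened against (an assumption here).
    """
    scores = [abs(pearson_correlation(column, target)) for column in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical toy data: four candidate variables measured on five samples.
target = [1, 2, 3, 4, 5]
candidates = [
    [1, 2, 3, 4, 5],      # perfectly correlated with the target
    [2, 1, 2, 1, 2],      # uncorrelated with the target
    [5, 4, 3, 2, 1.5],    # strongly negatively correlated
    [3, 3, 3, 3, 3],      # constant, carries no information
]
selected = sure_independence_screening(candidates, target, keep=2)
```

Screening on marginal associations is cheap because each variable is scored independently, which is what makes it usable as a first pass before a high-dimensional testing procedure.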
