1.
Ecol Evol ; 14(5): e11342, 2024 May.
Article in English | MEDLINE | ID: mdl-38799395

ABSTRACT

The morphological variation in Schizothorax oconnori, Schizothorax waltoni, and their natural hybrids was examined using conventional and image-based analysis approaches. In total, 38 specimens of S. oconnori, 35 of S. waltoni, and 37 natural hybrids were collected from the Shigatse to the Lhasa section of the Yarlung Zangbo River during June and July 2021. A total of 21 morphometric, 4 meristic, and 27 truss variables were employed for the classification of S. oconnori, S. waltoni, and natural hybrids. Principal component analysis (PCA) and factor analysis (FA), as well as discriminant function analysis (DFA) and cluster analysis (CA), were conducted to identify differences based on traditional and truss measurements. Four principal components explained 75.92% of the variation among the morphometric characters, while five principal components accounted for 79.69% of the variation among the truss distances. FA results showed that factor 1 was associated with head shape, and factor 2 was associated with fins based on morphometric characters. Among the truss characters, factor 1 was related to head shape, and factor 2 was related to chest shape. In DFA, morphometric measurements achieved higher accuracy (100%) compared to truss distances (94.55%). The head morphology of hybrids exhibited intermediate traits between S. oconnori and S. waltoni. Both morphometry-based and truss-based clustering indicated that the morphology of natural hybrids leaned toward S. oconnori. In conclusion, the combination of morphometric and truss analysis is beneficial for classifying S. oconnori, S. waltoni, and their natural hybrids. The presence of natural hybrids could be considered an evolutionary response to the differentiation of nutritional and spatial niches in the middle Yarlung Zangbo River.

2.
Epigenomics ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38477028

ABSTRACT

Aim: To predict base-resolution DNA methylation in cancerous and paracancerous tissues. Material & methods: We collected six cancer DNA methylation datasets from The Cancer Genome Atlas and five cancer datasets from Gene Expression Omnibus and established machine learning models using paired cancerous and paracancerous tissues. Tenfold cross-validation and independent validation were performed to demonstrate the effectiveness of the proposed method. Results: The developed cross-tissue prediction models can substantially increase the accuracy at more than 68% of CpG sites and contribute to enhancing the statistical power of differential methylation analyses. An XGBoost model leveraging multiple correlating CpGs may elevate the prediction accuracy. Conclusion: This study provides a powerful tool for DNA methylation analysis and has the potential to gain new insights into cancer research from epigenetics.

3.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37851379

ABSTRACT

MOTIVATION: Gene regulatory networks (GRNs) describe the interactions between genes and help reveal the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS: In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method achieves a sensible improvement over state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in Python and is available at: https://github.com/lab319/iLSGRN.


Subject(s)
Algorithms , Gene Regulatory Networks , Systems Biology , Random Forest , Time Factors , Computational Biology/methods
4.
Math Biosci Eng ; 20(7): 11676-11687, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37501415

ABSTRACT

Most kidney cancers are kidney renal clear cell carcinoma (KIRC), a main cause of cancer-related deaths. A polygenic risk score (PRS) is a weighted linear combination of phenotype-related alleles on the genome that can be used to assess KIRC risk. However, standalone SNP data as input to the PRS model may not provide satisfactory results. Therefore, transcriptional risk scores (TRS) based on multi-omics data and machine learning models were proposed to assess the risk of KIRC. First, we collected four types of multi-omics data (DNA methylation, miRNA, mRNA and lncRNA) of KIRC patients from the TCGA database. Subsequently, a novel TRS method utilizing multiple omics data and an XGBoost model was developed. Finally, we performed prevalence analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. Our TRS method exhibited better predictive performance than the linear models and other machine learning models. Furthermore, the prediction accuracy of the combined TRS model was higher than that of the single-omics TRS models. The Kaplan-Meier (KM) curves showed that TRS was a valid prognostic indicator for cancer staging. Our proposed method extended the current definition of TRS from standalone SNP data to multi-omics data and was superior to the linear models and other machine learning models, which may provide a useful tool for diagnostic and prognostic prediction of KIRC.
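
The definition of a PRS as a weighted linear combination of alleles can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the effect-size weights, and the genotype dosages below are all hypothetical values chosen only to show the arithmetic.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages.

    dosages: copies of the effect allele per variant (0, 1 or 2).
    weights: per-variant effect sizes (hypothetical here; in practice
             they would typically come from an association study).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants.
weights = [0.5, -0.25, 1.0]
# Hypothetical genotype of one individual at those variants.
dosages = [2, 1, 0]

score = polygenic_risk_score(dosages, weights)
print(score)
```

A linear score of this kind cannot capture interactions between features, which is the limitation the abstract gives as motivation for tree-based models such as XGBoost.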


Subject(s)
Carcinoma, Renal Cell , Kidney Neoplasms , MicroRNAs , Humans , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , MicroRNAs/genetics , Risk Factors , Kidney/pathology
5.
Mol Genet Genomic Med ; 10(11): e2047, 2022 11.
Article in English | MEDLINE | ID: mdl-36124564

ABSTRACT

BACKGROUND: Patients with impaired kidney function were found at a high risk of COVID-19 hospitalization and mortality in many observational, cross-sectional, and hospital-based studies, but evidence from large-scale prospective cohorts has been lacking. We aimed to examine the association of kidney function-related biomarkers and their genetic predisposition with the risk of developing severe COVID-19 in population-based data. METHODS: We analyzed data from UK Biobank to examine the prospective association of abnormal kidney function biomarkers with severe COVID-19, defined by laboratory-confirmed COVID-19 hospitalizations. Using genotype data, we constructed polygenic risk scores (PRS) to represent an individual's overall genetic risk for these biomarkers. We also identified tipping points where the risk of severe COVID-19 began to increase significantly for each biomarker. RESULTS: Of the 502,506 adults, 1650 (0.32%) were identified as having severe COVID-19 before August 12, 2020. High levels of cystatin C (OR: 1.3; 95% CI: 1.2-1.5; FDR = 1.5 × 10⁻⁵), serum creatinine (OR: 1.7; 95% CI: 1.3-2.1; p = 3.5 × 10⁻⁴; FDR = 3.5 × 10⁻⁴), microalbuminuria (OR: 1.4; 95% CI: 1.2-1.6; FDR = 4 × 10⁻⁴), and UACR (urinary albumin creatinine ratio; OR: 1.4; 95% CI: 1.2-1.6; p = 3.5 × 10⁻⁴; FDR = 3.5 × 10⁻⁴) were found to be significantly associated with severe COVID-19. Individuals in the top 10% of PRS for elevated cystatin C, urate, and microalbuminuria had 28% to 43% higher risks of severe COVID-19 than individuals in the bottom 30% of PRS (p < 0.05). Tipping-point analyses further supported that severe COVID-19 could occur even when the values of cystatin C, urate (male), and microalbuminuria were within their normal value ranges (OR >1.1, p < 0.05). CONCLUSIONS: Findings from this study might point to new directions for clinicians and policymakers in optimizing risk-stratification among patients based on polygenic risk estimation and tipping points of kidney function markers.
Our results call for further investigation to develop a better strategy to prevent severe COVID-19 outcomes among patients with genetic predisposition to impaired kidney function. These findings could provide a new tool for clinicians and policymakers in the future especially if we need to live with COVID-19 for a long time.


Subject(s)
COVID-19 , Renal Insufficiency , Adult , Humans , Male , Cystatin C/urine , COVID-19/genetics , Genetic Predisposition to Disease , Cross-Sectional Studies , Uric Acid , Albuminuria/genetics , Biomarkers , Kidney
6.
Sci Rep ; 12(1): 10646, 2022 06 23.
Article in English | MEDLINE | ID: mdl-35739223

ABSTRACT

The potential role of DNA methylation from paracancerous tissues in cancer diagnosis has not been explored until now. In this study, we built classification models using well-known machine learning models based on DNA methylation profiles of paracancerous tissues. We evaluated our methods on nine cancer datasets collected from The Cancer Genome Atlas (TCGA) and utilized fivefold cross-validation to assess the performance of models. Additionally, we performed gene ontology (GO) enrichment analysis on the basis of the significant CpG sites selected by feature importance scores of XGBoost model, aiming to identify biological pathways involved in cancer progression. We also exploited the XGBoost algorithm to classify cancer types using DNA methylation profiles of paracancerous tissues in external validation datasets. Comparative experiments suggested that XGBoost achieved better predictive performance than the other four machine learning methods in predicting cancer stage. GO enrichment analysis revealed key pathways involved, highlighting the importance of paracancerous tissues in cancer progression. Furthermore, XGBoost model can accurately classify nine different cancers from TCGA, and the feature sets selected by XGBoost can also effectively predict seven cancer types on independent GEO datasets. This study provided new insights into cancer diagnosis from an epigenetic perspective and may facilitate the development of personalized diagnosis and treatment strategies.


Subject(s)
DNA Methylation , Neoplasms , Epigenomics , Humans , Machine Learning , Neoplasm Staging , Neoplasms/diagnosis , Neoplasms/genetics
7.
Digit Signal Process ; 127: 103577, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35529477

ABSTRACT

The outbreak of coronavirus disease (COVID-19) and its accompanying pandemic have created an unprecedented challenge worldwide. Parametric modeling and analysis of COVID-19 play a critical role in providing vital information about its characteristics and relevant guidance for controlling the pandemic. However, the epidemiological utility of the results obtained from a COVID-19 transmission model largely depends on accurately identifying its parameters. This paper extends the susceptible-exposed-infectious-recovered (SEIR) model and proposes an improved quantum-behaved particle swarm optimization (QPSO) algorithm to estimate its parameters. A new strategy is developed to update the weighting factor of the mean best position by the reciprocal of the product of the fitness of each best particle and the average fitness of all best particles, which can enhance the global search capacity. To increase particle diversity, a probability function is designed to generate new particles in the updating iteration. When compared to state-of-the-art estimation algorithms on the epidemic datasets of China, Italy and the US, the proposed method achieves good accuracy and convergence at a comparable computational complexity. The developed framework would be beneficial for experts to understand the characteristics of epidemic development and formulate epidemic prevention and control measures.
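
For orientation, the parameter-estimation problem described above can be sketched in Python using the standard four-compartment SEIR equations. This is a simplified illustration, not the extended model or the QPSO algorithm from the paper: the forward-Euler integrator, the function names, and all numeric values are assumptions made for the example. The function `sum_squared_error` plays the role of the fitness that an optimizer (QPSO or any other) would minimize over `(beta, sigma, gamma)`.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the standard SEIR equations with forward Euler.

    beta: transmission rate, sigma: incubation rate (E -> I),
    gamma: recovery rate (I -> R). Returns one (S, E, I, R) tuple
    per day, including day 0.
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    trajectory = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _day in range(days):
        for _step in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


def sum_squared_error(params, observed_infectious, initial_state):
    """Fitness of a candidate (beta, sigma, gamma) against daily counts."""
    beta, sigma, gamma = params
    days = len(observed_infectious) - 1
    trajectory = simulate_seir(beta, sigma, gamma, *initial_state, days=days)
    return sum((state[2] - obs) ** 2
               for state, obs in zip(trajectory, observed_infectious))


# Hypothetical population of 1000 with 10 initially infectious people.
initial_state = (990, 0, 10, 0)
trajectory = simulate_seir(0.5, 0.2, 0.1, *initial_state, days=30)
synthetic_observations = [state[2] for state in trajectory]

# The true parameters reproduce the synthetic data exactly; others do not.
error_true = sum_squared_error((0.5, 0.2, 0.1), synthetic_observations, initial_state)
error_wrong = sum_squared_error((0.3, 0.2, 0.1), synthetic_observations, initial_state)
```

Forward Euler is used only to keep the sketch dependency-free; a production fit would more likely use an adaptive ODE solver.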

8.
PLoS Med ; 19(4): e1003972, 2022 04.
Article in English | MEDLINE | ID: mdl-35472203

ABSTRACT

BACKGROUND: Both genetic and lifestyle factors contribute to the risk of type 2 diabetes, but the extent to which there is a synergistic effect of the 2 factors is unclear. The aim of this study was to examine the joint associations of genetic risk and diet quality with incident type 2 diabetes. METHODS AND FINDINGS: We analyzed data from 35,759 men and women in the United States participating in the Nurses' Health Study (NHS) I (1986 to 2016) and II (1991 to 2017) and the Health Professionals Follow-up Study (HPFS; 1986 to 2016) with available genetic data and who did not have diabetes, cardiovascular disease, or cancer at baseline. Genetic risk was characterized using both a global polygenic score capturing overall genetic risk and pathway-specific polygenic scores denoting distinct pathophysiological mechanisms. Diet quality was assessed using the Alternate Healthy Eating Index (AHEI). Cox models were used to calculate hazard ratios (HRs) for type 2 diabetes after adjusting for potential confounders. With over 902,386 person-years of follow-up, 4,433 participants were diagnosed with type 2 diabetes. The relative risk of type 2 diabetes was 1.29 (95% confidence interval [CI] 1.25, 1.32; P < 0.001) per standard deviation (SD) increase in global polygenic score and 1.13 (1.09, 1.17; P < 0.001) per 10-unit decrease in AHEI. Irrespective of genetic risk, low diet quality, as compared to high diet quality, was associated with approximately 30% increased risk of type 2 diabetes (Pinteraction = 0.69). The joint association of low diet quality and increased genetic risk was similar to the sum of the risk associated with each factor alone (Pinteraction = 0.30). Limitations of this study include the self-report of diet information and possible bias resulting from inclusion of highly educated participants with available genetic data. 
CONCLUSIONS: These data provide evidence for the independent associations of genetic risk and diet quality with incident type 2 diabetes and suggest that a healthy diet is associated with lower diabetes risk across all levels of genetic risk.


Subject(s)
Diabetes Mellitus, Type 2 , Adult , Diabetes Mellitus, Type 2/etiology , Diabetes Mellitus, Type 2/genetics , Diet/adverse effects , Female , Follow-Up Studies , Humans , Male , Prospective Studies , Risk Factors , United States/epidemiology
9.
Breast Cancer Res Treat ; 194(1): 103-111, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35467315

ABSTRACT

High levels of circulating estradiol (E2) are associated with increased risk of breast cancer, whereas its relationship with breast cancer prognosis is still unclear. We evaluated the effect of E2 concentration on survival endpoints among 8766 breast cancer cases diagnosed between 2005 and 2017 from the Tianjin Breast Cancer Cases Cohort. Levels of serum E2 were measured in pre-menopausal and post-menopausal women. Multivariable-adjusted Cox proportional hazards models were used to estimate hazard ratios (HR) and 95% confidence intervals (95% CI) for the associations of E2 quartiles with overall survival (OS) and progression-free survival (PFS) of breast cancer. A penalized spline was then used to test for non-linear relationships between E2 (continuous variable) and survival endpoints. In total, 612 deaths and 982 progressions occurred over follow-up through 2017. Compared to women in quartile 3, the highest quartile of E2 was associated with poorer PFS in pre-menopausal women (HR 1.79, 95% CI 1.17-2.75, P = 0.008) and poorer OS in post-menopausal women (HR 1.35, 95% CI 1.04-1.74, P = 0.023). OS and PFS in pre-menopausal women exhibited a nonlinear relation ("L-shaped" and "U-shaped", respectively) with E2 levels. However, there was a linear relationship in post-menopausal women. Moreover, patients with estrogen receptor-negative (ER-negative) breast cancer showed a "U-shaped" relationship with OS and PFS in pre-menopausal women. Pre-menopausal breast cancer patients have a plateau stage of prognosis at intermediate concentrations of E2, whereas post-menopausal patients have no apparent threshold, and ER status may have an impact on this relationship.


Subject(s)
Breast Neoplasms , Cohort Studies , Estradiol , Female , Humans , Menopause , Premenopause
10.
Bioinformatics ; 38(2): 410-418, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34586380

ABSTRACT

MOTIVATION: Survival analysis using gene expression profiles plays a crucial role in the interpretation of clinical research and assessment of disease therapy programs. Several prediction models have been developed to explore the relationship between patients' covariates and survival. However, the high-dimensional genomic features limit the prediction performance of the survival model. Thus, an accurate and reliable prediction model is necessary for survival analysis using high-dimensional genomic data. RESULTS: In this study, we proposed an improved survival prediction model based on the XGBoost framework, called XGBLC, which uses Lasso-Cox to enhance the ability to analyze high-dimensional genomic data. Novel first- and second-order gradient statistics of Lasso-Cox were defined to construct the loss function of XGBLC. We extensively tested our XGBLC algorithm on both simulated and real-world datasets, and estimated the performance of models with 5-fold cross-validation. Based on 20 cancer datasets from The Cancer Genome Atlas (TCGA), XGBLC outperforms five state-of-the-art survival methods in terms of C-index, Brier score and AUC. Comparisons on simulated datasets of different scales show that XGBLC maintains good accuracy and robustness. The developed prediction model would be beneficial for physicians to understand the effects of patients' genomic characteristics on survival and make personalized treatment decisions. AVAILABILITY AND IMPLEMENTATION: The implementation of the XGBLC algorithm in R is available at: https://github.com/lab319/XGBLC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
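
The C-index used as an evaluation metric above can be illustrated with a short Python sketch of Harrell's concordance index. This is a generic textbook formulation, not code from the XGBLC repository (which is in R): the function name and the toy survival data are hypothetical, and tied risk scores are counted as half-concordant by assumption.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times.
    events: 1 if the event was observed, 0 if censored.
    risk_scores: model output where a higher score means higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for a in range(n):
        for b in range(n):
            # A pair is comparable when subject a had an observed event
            # strictly before subject b's follow-up time.
            if events[a] and times[a] < times[b]:
                comparable += 1
                if risk_scores[a] > risk_scores[b]:
                    concordant += 1.0
                elif risk_scores[a] == risk_scores[b]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to an uninformative ranking, and 0.0 to a fully reversed one.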


Subject(s)
Algorithms , Neoplasms , Humans , Genomics , Neoplasms/genetics , Genome , Survival Analysis
11.
Math Biosci Eng ; 19(12): 12353-12370, 2022 08 24.
Article in English | MEDLINE | ID: mdl-36654001

ABSTRACT

BACKGROUND: A polygenic risk score (PRS) can evaluate the individual-level genetic risk of breast cancer. However, standalone single nucleotide polymorphism (SNP) data used for PRS may not provide satisfactory prediction accuracy. Additionally, current PRS models based on linear regression have insufficient power to leverage non-linear effects from thousands of associated SNPs. Here, we proposed a transcriptional risk score (TRS) based on multiple omics data to estimate the risk of breast cancer. METHODS: The multiple omics data and clinical data of breast invasive carcinoma (BRCA) were collected from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). First, we developed a novel TRS model for BRCA utilizing single-omic data and the LightGBM algorithm. Subsequently, we built a combination model of the TRS derived from each omic dataset to further improve the prediction accuracy. Finally, we performed association analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. RESULTS: The proposed TRS model achieved better predictive performance than the linear models and other ML methods on single-omic datasets. An independent validation dataset also verified the effectiveness of our model. Moreover, combining the TRS can efficiently strengthen prediction accuracy. The analysis of prevalence and the associations of the TRS with phenotypes including case-control status and cancer stage indicated that the risk of breast cancer increases as the TRS increases. The survival analysis also suggested that the TRS for cancer stage is an effective prognostic metric for breast cancer patients. CONCLUSIONS: Our proposed TRS model expanded the current definition of PRS from standalone SNP data to multiple omics data and outperformed the linear models, which may provide a powerful tool for diagnostic and prognostic prediction of breast cancer.


Subject(s)
Algorithms , Neoplasms , Risk Factors , Survival Analysis , Genome-Wide Association Study
12.
Eur Respir J ; 58(4)2021 10.
Article in English | MEDLINE | ID: mdl-33766948

ABSTRACT

BACKGROUND: Lung function is a heritable complex phenotype with obesity being one of its important risk factors. However, knowledge of their shared genetic basis is limited. Most genome-wide association studies (GWASs) for lung function have been based on European populations, limiting the generalisability across populations. Large-scale lung function GWASs in other populations are lacking. METHODS: We included 100 285 subjects from the China Kadoorie Biobank (CKB). To identify novel loci for lung function, single-trait GWAS analyses were performed on forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC in the CKB. We then performed genome-wide cross-trait analysis between lung function and obesity traits (body mass index (BMI), BMI-adjusted waist-to-hip ratio and BMI-adjusted waist circumference) to investigate the shared genetic effects in the CKB. Finally, polygenic risk scores (PRSs) of lung function were developed in the CKB, and their interaction with BMI in relation to lung function was examined. We also conducted cross-trait analysis in parallel with the CKB using up to 457 756 subjects from the UK Biobank (UKB) for replication and investigation of ancestry-specific effects. RESULTS: We identified nine genome-wide significant novel loci for FEV1, six for FVC and three for FEV1/FVC in the CKB. FEV1 and FVC showed significant negative genetic correlation with obesity traits in both the CKB and UKB. Genetic loci shared between lung function and obesity traits highlighted important biological pathways, including cell proliferation, embryo, skeletal and tissue development, and regulation of gene expression. Mendelian randomisation analysis suggested significant negative causal effects of BMI on FEV1 and on FVC in both the CKB and UKB. Lung function PRSs significantly modified the effect of change in BMI on change in lung function during an average follow-up of 8 years.
CONCLUSION: This large-scale GWAS of lung function identified novel loci and shared genetic aetiology between lung function and obesity. Change in BMI might affect change in lung function differently according to a subject's polygenic background. These findings may open new avenues for the development of molecular-targeted therapies for obesity and lung function improvement.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Body Mass Index , China , Forced Expiratory Volume , Humans , Lung , Obesity/genetics
13.
J Bone Miner Res ; 36(7): 1281-1287, 2021 07.
Article in English | MEDLINE | ID: mdl-33784428

ABSTRACT

Uncovering additional causal clinical traits and exposure variables is important when studying osteoporosis mechanisms and for the prevention of osteoporosis. Until recently, the causal relationship between anthropometric measurements and osteoporosis had not been fully revealed. In the present study, we utilized several state-of-the-art Mendelian randomization (MR) methods to investigate whether height, body mass index (BMI), waist-to-hip ratio (WHR), hip circumference (HC), and waist circumference (WC) are causally associated with two major characteristics of osteoporosis, bone mineral density (BMD) and fractures. Genomewide significant (p ≤ 5 × 10⁻⁸) single-nucleotide polymorphisms (SNPs) associated with the five anthropometric variables were obtained from previous large-scale genomewide association studies (GWAS) and were utilized as instrumental variables. Summary-level data of estimated bone mineral density (eBMD) and fractures were obtained from a large-scale UK Biobank GWAS. Of the MR methods utilized, the inverse-variance weighted method was the primary method used for analysis, and the weighted-median, MR-Egger, mode-based estimate, and MR pleiotropy residual sum and outlier methods were utilized for sensitivity analyses. The results of the present study indicated that each increase in height equal to a single standard deviation (SD) was associated with a 9.9% increase in risk of fracture (odds ratio [OR] = 1.099; 95% confidence interval [CI] 1.067-1.133; p = 8.793 × 10⁻¹⁰) and a 0.080 SD decrease of estimated bone mineral density (95% CI -0.106 to -0.054; p = 2.322 × 10⁻⁹). We also found that BMI was causally associated with eBMD (beta = 0.129, 95% CI 0.065-0.194; p = 8.113 × 10⁻⁵) but not associated with fracture. The WHR adjusted for BMI, HC adjusted for BMI, and WC adjusted for BMI were not found to be related to fracture occurrence or eBMD.
In conclusion, the present study provided genetic evidence for certain causal relationships between anthropometric measurements and bone mineral density or fracture risk. © 2021 American Society for Bone and Mineral Research (ASBMR).
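
The inverse-variance weighted (IVW) method named as the primary analysis can be sketched in Python as follows. This is the common fixed-effect formulation computed from per-SNP summary statistics, not the authors' code: the function name and the three hypothetical SNP effects are assumptions for illustration, and uncertainty in the SNP-exposure estimates is ignored.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate.

    beta_exposure: SNP-exposure associations (e.g. SD of height per allele).
    beta_outcome: SNP-outcome associations (e.g. log odds of fracture).
    se_outcome: standard errors of the SNP-outcome associations.
    Returns (estimate, standard_error).
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx
                      for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Hypothetical summary statistics for three instruments. The outcome
# effects are exactly half the exposure effects, so the estimate is 0.5.
beta_exposure = [0.1, 0.2, 0.4]
beta_outcome = [0.05, 0.1, 0.2]
se_outcome = [0.01, 0.02, 0.02]

estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
```

When the outcome is binary and the estimate is on the log-odds scale, exponentiating it gives an odds ratio per unit of exposure, which is the form reported in the abstract (for example, OR = 1.099 corresponds to a 9.9% increase).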


Subject(s)
Fractures, Bone , Osteoporosis , Bone Density/genetics , Fractures, Bone/genetics , Genome-Wide Association Study , Humans , Mendelian Randomization Analysis , Osteoporosis/genetics , Polymorphism, Single Nucleotide/genetics
14.
Metabolism ; 112: 154345, 2020 11.
Article in English | MEDLINE | ID: mdl-32835759

ABSTRACT

OBJECTIVE: We aimed to examine the associations of obesity-related traits (body mass index [BMI], central obesity) and their genetic predisposition with the risk of developing severe COVID-19 in population-based data. RESEARCH DESIGN AND METHODS: We analyzed data from 489,769 adults enrolled in the UK Biobank, a population-based cohort study. The exposures of interest were BMI categories and central obesity (e.g., larger waist circumference). Using genome-wide genotyping data, we also computed polygenic risk scores (PRSs) that represent an individual's overall genetic risk for each obesity trait. The outcome was severe COVID-19, defined by hospitalization for laboratory-confirmed COVID-19. RESULTS: Of 489,769 individuals, 33% were normal weight (BMI, 18.5-24.9 kg/m²), 43% overweight (25.0-29.9 kg/m²), and 24% obese (≥30.0 kg/m²). The UK Biobank identified 641 patients with severe COVID-19. Compared to adults with normal weight, those with a higher BMI had a dose-response increase in the risk of severe COVID-19, with the following adjusted ORs: for 25.0-29.9 kg/m², 1.40 (95% CI 1.14-1.73; P = 0.002); for 30.0-34.9 kg/m², 1.73 (95% CI 1.36-2.20; P < 0.001); for 35.0-39.9 kg/m², 2.82 (95% CI 2.08-3.83; P < 0.001); and for ≥40.0 kg/m², 3.30 (95% CI 2.17-5.03; P < 0.001). Likewise, central obesity was associated with significantly higher risk of severe COVID-19 (P < 0.001). Furthermore, a larger PRS for BMI was associated with higher risk of the outcome (adjusted OR per BMI PRS Z-score 1.14, 95% CI 1.05-1.24; P = 0.004). CONCLUSIONS: In this large population-based cohort, individuals with more severe obesity, central obesity, or genetic predisposition for obesity are at higher risk of developing severe COVID-19.
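
To make the reported odds ratios and confidence intervals easier to interpret, here is a minimal Python sketch of an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table. It is a simplification: the abstract reports adjusted ORs, which would come from a multivariable model rather than from raw counts, and the function name and the counts below are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_noncases,
                       unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval.

    All four cell counts must be positive. z = 1.96 gives an
    approximate 95% interval.
    """
    counts = (exposed_cases, exposed_noncases,
              unexposed_cases, unexposed_noncases)
    if any(c <= 0 for c in counts):
        raise ValueError("all cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed
# individuals develop the outcome.
odds_ratio, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
```

An interval that excludes 1, as with each BMI category in the abstract, indicates an association that is statistically significant at the corresponding level.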


Subject(s)
COVID-19/genetics , COVID-19/pathology , Genetic Predisposition to Disease/genetics , Obesity, Abdominal/complications , Obesity, Abdominal/genetics , Body Mass Index , Diabetes Mellitus, Type 2/genetics , Female , Humans , Male , Middle Aged , Overweight/genetics , Risk Factors , SARS-CoV-2/pathogenicity , Severity of Illness Index , Waist Circumference/genetics
16.
Comput Biol Med ; 121: 103761, 2020 06.
Article in English | MEDLINE | ID: mdl-32339094

ABSTRACT

Accurate diagnostic classification of cancers can greatly help physicians to choose surveillance and treatment strategies for patients. Following the explosive growth of biological data, the shift from traditional biostatistical methods to computer-aided means has made machine-learning methods an integral part of today's cancer prognosis prediction. In this work, we proposed a classification model that leverages the power of extreme gradient boosting (XGBoost) and uses increasingly complex multi-omics data with the aim of separating early stage and late stage cancers. We applied the XGBoost model to four kinds of cancer data downloaded from TCGA and compared its performance with other popular machine-learning methods. The experimental results showed that our method obtained statistically significantly better or comparable predictive performance. The results of this study also revealed that DNA methylation outperforms other molecular data (mRNA expression and miRNA expression) in terms of accuracy and stability for discriminating between early stage and late stage groups. Furthermore, integration of multi-omics data by an autoencoder can enhance the classification accuracy of cancer stage. Finally, we conducted bioinformatics analyses to assess the medical utility of the significant genes ranked by their importance using the XGBoost algorithm. Extensive comparative experiments demonstrated that the XGBoost method has a remarkable performance in predicting the stage of cancer patients with multi-omics data. Moreover, identification of novel candidate genes associated with cancer stages would help to further elucidate disease pathogenesis and develop novel therapeutics.


Subject(s)
MicroRNAs , Neoplasms , Algorithms , DNA Methylation , Humans , Machine Learning , MicroRNAs/genetics , Neoplasms/diagnosis , Neoplasms/genetics
17.
Breast Cancer ; 27(4): 621-630, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32040723

ABSTRACT

BACKGROUND: The burden of breast cancer has grown rapidly in China during recent decades. However, the association between tumor markers (CA15-3, CA125, and CEA) and breast cancer survival among certain molecular subtypes is unclear; we described this association in a large, population-based study. METHODS: We conducted a cohort study including 10,836 women from the Tianjin Breast Cancer Cases Cohort. Demographic and epidemiologic data were collected by a structured face-to-face questionnaire. Clinico-pathological parameters were abstracted from medical records, and follow-up information was obtained once a year by telephone. The primary endpoints were breast cancer-specific survival (BCSS) and disease-free survival (DFS). We utilized the Cox proportional hazard model to calculate hazard ratios (HRs) and 95% confidence intervals (CI). RESULTS: Among all patients, elevated CA15-3 and CEA were consistently associated with statistically significantly reduced BCSS compared with normal levels (CA15-3: HR 1.54, 95% CI 1.01-2.34; CEA: HR 2.45, 95% CI 1.40-4.30). Similar patterns of association were observed for DFS (CA15-3: HR 2.09, 95% CI 1.44-3.02; CEA: HR 2.71, 95% CI 1.71-4.27). Moreover, in the luminal A subtype, high CA15-3 and CEA levels were associated with decreased BCSS (CA15-3: HR 4.47, 95% CI 2.04-9.81; CEA: HR 3.79, 95% CI 1.68-8.55) and DFS (CA15-3: HR 4.06, 95% CI 2.29-7.18; CEA: HR 3.41, 95% CI 1.75-6.64). In the basal-like subtype, elevated CEA was associated with reduced BCSS (HR 5.13, 95% CI 1.65-15.9). However, no association was observed between CA125 and breast cancer outcome. CONCLUSIONS: Preoperative CA15-3 and CEA levels differ across breast cancer molecular subtypes and yield strong prognostic information in Chinese women with breast cancer. Measuring CA15-3 and CEA levels before surgery may have potential for predicting breast cancer survival and guiding personalized treatment strategies for patients with luminal A and basal-like subtypes.


Subject(s)
Biomarkers, Tumor/blood , Breast Neoplasms/mortality , Breast/pathology , Adolescent , Adult , Aged , Aged, 80 and over , Breast/surgery , Breast Neoplasms/blood , Breast Neoplasms/pathology , Breast Neoplasms/therapy , CA-125 Antigen/blood , Carcinoembryonic Antigen/blood , Chemotherapy, Adjuvant , China/epidemiology , Disease-Free Survival , Female , Follow-Up Studies , GPI-Linked Proteins/blood , Humans , Mastectomy , Membrane Proteins/blood , Middle Aged , Mucin-1/blood , Preoperative Period , Prognosis , Young Adult
18.
J Cancer ; 11(5): 1288-1298, 2020.
Article in English | MEDLINE | ID: mdl-31956375

ABSTRACT

Objectives: Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. Identifying prognostic biomarkers and predicting prognosis for LUAD patients are necessary. Materials and Methods: In this study, LUAD RNA-Seq data and clinical data from The Cancer Genome Atlas (TCGA) were divided into TCGA cohort I (n = 338) and cohort II (n = 168). Cohort I was used for model construction, and cohort II and data from the Gene Expression Omnibus (GSE72094 cohort, n = 393; GSE11969 cohort, n = 149) were used for validation. First, survival-related seed genes were selected from cohort I using a machine learning model (random survival forest, RSF); then, to improve prediction accuracy, a forward selection model was used to identify the prognosis-related key genes among the seed genes using the clinically-integrated RNA-Seq data. Second, a survival risk score system was constructed from these key genes in cohort II, the GSE72094 cohort, and the GSE11969 cohort, and evaluation metrics such as HR, p value, and C-index were calculated to validate the proposed method. Third, the developed approach was compared with five previous prediction models. Finally, bioinformatics analyses (pathway, heatmap, protein-gene interaction network) were applied to the identified seed genes and key genes. Results and Conclusion: Based on the RSF model and clinically-integrated RNA-Seq data, we identified sixteen key genes that formed the prognostic gene expression signature. These sixteen key genes achieved strong prognostic prediction for LUAD patients in cohort II (HR = 3.80, p = 1.63e-06, C-index = 0.656) and were further validated in the GSE72094 cohort (HR = 4.12, p = 1.34e-10, C-index = 0.672) and the GSE11969 cohort (HR = 3.87, p = 6.81e-07, C-index = 0.670).
The experimental results of three independent validation cohorts showed that compared with the traditional Cox model and the use of standalone RNA-Seq data, the machine-learning-based method effectively improved the prediction accuracy of LUAD prognosis, and the derived model was also superior to the other five existing prediction models. KEGG pathway analysis found eleven of the sixteen genes were associated with Nicotine addiction. Thirteen of the sixteen genes were reported for the first time as the LUAD prognosis-related key genes. In conclusion, we developed a sixteen-gene prognostic marker for LUAD, which may provide a powerful prognostic tool for precision oncology.
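
The C-index values quoted in this abstract measure how often a model ranks patients correctly by risk. The abstract does not say which estimator was used; the sketch below assumes Harrell's concordance index and, as a simplification, skips pairs with tied survival times. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (simplified sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter time than patient j. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(times, events, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.656 to 0.672 should be read.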

19.
Bioinformatics ; 36(19): 4885-4893, 2020 12 08.
Article in English | MEDLINE | ID: mdl-31950997

ABSTRACT

MOTIVATION: Gene regulatory networks (GRNs) capture the regulatory interactions between genes that result from the fundamental biological processes of transcription and translation. In some cases, the topology of a GRN is not known and has to be inferred from gene expression data. Most existing GRN reconstruction algorithms are applied to either time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks. RESULTS: In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We use a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets, from yeast and Escherichia coli. On public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. A comparison of performance on datasets of different scales shows that our method maintains good robustness and accuracy at low computational complexity. AVAILABILITY AND IMPLEMENTATION: The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
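
To make the idea of combining time-series and steady-state data concrete, here is a minimal sketch, not the authors' implementation. It assumes an ODE of the form dx_j/dt = -decay * x_j + f_j(x), approximates the derivative by a finite difference, treats steady-state samples as having zero derivative, and ranks candidate regulators by absolute Pearson correlation. That correlation score is a deliberately simple stand-in for the importance measurement the abstract mentions; all function names, the `decay` parameter, and the data layout are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def build_samples(time_series, steady_states, target, dt=1.0, decay=1.0):
    """Build (inputs, outputs) for one target gene from both data types.

    time_series   -- list of expression vectors at consecutive time points
    steady_states -- list of expression vectors measured at steady state
    Each output estimates f_target(x) = dx/dt + decay * x_target.
    """
    inputs, outputs = [], []
    for t in range(len(time_series) - 1):
        now, nxt = time_series[t], time_series[t + 1]
        inputs.append(now)
        outputs.append((nxt[target] - now[target]) / dt + decay * now[target])
    for state in steady_states:
        inputs.append(state)
        outputs.append(decay * state[target])  # derivative is zero at steady state
    return inputs, outputs

def rank_regulators(time_series, steady_states, target, dt=1.0, decay=1.0):
    """Rank every other gene as a putative regulator of `target`."""
    inputs, outputs = build_samples(time_series, steady_states, target, dt, decay)
    scores = {}
    for gene in range(len(inputs[0])):
        if gene == target:
            continue
        column = [row[gene] for row in inputs]
        scores[gene] = abs(pearson(column, outputs))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_regulators(series, steady, target=1)` returns `(gene_index, score)` pairs in descending order; repeating it for every target gene yields a ranked list of putative links. A linear correlation cannot capture non-linear regulation, so a tree-based or other non-linear importance score would be the natural replacement.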


Subject(s)
Algorithms , Gene Regulatory Networks , Escherichia coli/genetics , Gene Expression Regulation , Saccharomyces cerevisiae/genetics
20.
Epigenetics ; 14(4): 405-420, 2019 04.
Article in English | MEDLINE | ID: mdl-30885044

ABSTRACT

DNA methylation is known to be responsive to prenatal exposures, which may be part of the mechanism linking early developmental exposures to future chronic diseases. Many studies use blood to measure DNA methylation, yet DNA methylation is known to be tissue specific. Placenta is central to fetal growth and development, but it is rarely feasible to collect this tissue in large epidemiological studies; cord blood samples, on the other hand, are more accessible. In this study, based on paired placenta and cord blood samples from 169 individuals, we investigated the methylation concordance between placenta and cord blood. We then employed a machine-learning-based model to predict locus-specific DNA methylation levels in placenta using DNA methylation levels in cord blood. We found that the methylation correlation between placenta and cord blood is lower than that of other tissue pairs, consistent with existing observations that placenta methylation has a distinct pattern. Nonetheless, a number of CpG sites still show robust association between the two tissues. We built prediction models for placenta methylation based on cord blood data and documented a subset of 1,012 CpG sites with high correlation between measured and predicted placenta methylation levels. The resulting list of CpG sites and prediction models could help to reveal the loci where internal or external influences may affect DNA methylation in both placenta and cord blood, and provide reference data for predicting the effects on placenta in future studies even when the tissue is not available in an epidemiological study.
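
The site-selection step described above (keep CpG sites where predicted and measured placenta methylation correlate highly) can be sketched as follows. The abstract does not name the machine-learning model or the correlation cutoff, so this sketch substitutes a per-site least-squares line and a hypothetical threshold of 0.8; the function names, the dictionary layout, and the site identifiers in the usage note are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant predictor: fall back to the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def select_predictable_sites(cord_train, plac_train, cord_test, plac_test,
                             threshold=0.8):
    """Return [(site, r)] for sites whose held-out prediction correlates well.

    Each argument maps a CpG site ID to a list of methylation levels,
    one value per individual, in the same individual order for both tissues.
    """
    selected = []
    for site in cord_train:
        slope, intercept = fit_line(cord_train[site], plac_train[site])
        predicted = [intercept + slope * x for x in cord_test[site]]
        r = pearson(predicted, plac_test[site])
        if r >= threshold:
            selected.append((site, r))
    return selected
```

With dictionaries keyed by placeholder IDs such as `"cg_a"` and `"cg_b"`, the function returns only the sites whose held-out correlation reaches the threshold. Evaluating on individuals not used for fitting matters here, because in-sample correlation would overstate how well cord blood predicts placenta.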


Subject(s)
DNA Methylation , Fetal Blood/metabolism , Genetic Loci , Models, Genetic , Placenta/metabolism , CpG Islands , Female , Humans , Machine Learning , Organ Specificity , Pregnancy