Results 1 - 20 of 39,873
1.
Nutr Diabetes ; 14(1): 36, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824142

ABSTRACT

BACKGROUND: Blood homocysteine (Hcy) level has become a sensitive indicator in predicting the development of cardiovascular disease. Studies have shown an association between individual mineral intake and blood Hcy levels. The effect of mixed minerals' intake on blood Hcy levels is unknown. METHODS: Data were obtained from the baseline survey of the Shanghai Suburban Adult Cohort and Biobank (SSACB) in 2016. A total of 38,273 participants aged 20-74 years met our inclusion and exclusion criteria. A food frequency questionnaire (FFQ) was used to calculate the intake of 10 minerals (calcium, potassium, magnesium, sodium, iron, zinc, selenium, phosphorus, copper and manganese). Hcy concentration was measured in morning fasting blood samples. Traditional regression models were used to assess the relationship between individual minerals' intake and blood Hcy levels. Three machine learning models (WQS, Qg-comp, and BKMR) were used to assess the relationship between mixed minerals' intake and blood Hcy levels, distinguishing the individual effects of each mineral and determining their respective weights in the joint effect. RESULTS: The traditional regression model showed that higher intake of calcium, phosphorus, potassium, magnesium, iron, zinc, copper, and manganese was associated with lower blood Hcy levels. Both the Qg-comp and BKMR results consistently indicated that higher intake of mixed minerals is associated with lower blood Hcy levels. Calcium exhibited the highest weight in the joint effect in the WQS model. In Qg-comp, iron had the highest positive weight, while manganese had the highest negative weight. The BKMR results for the subsample after 10,000 iterations showed that, except for sodium, all nine other minerals had high weights in the joint effect on blood Hcy levels. CONCLUSION: Overall, higher mixed minerals' intake was associated with lower blood Hcy levels, and each mineral contributed differently to the joint effect. 
Future studies are needed to further explore the mechanisms underlying this association, and the potential impact of mixed minerals' intake on other health indicators needs to be further investigated. These efforts will help provide additional insights to deepen our understanding of mixed minerals and their potential role in health maintenance.
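
As a rough illustration of the weighted-index idea behind the WQS model named in this abstract, the sketch below converts each mineral's intake into a quartile score and combines the scores with non-negative weights that sum to 1. The intake values, the weights, and the function names are hypothetical. In actual WQS regression the weights are estimated from the data, which this sketch does not attempt.

```python
from bisect import bisect_right
from statistics import quantiles

def quartile_scores(values):
    """Map each value to a quartile score 0..3 using the sample's own cut points."""
    cuts = quantiles(values, n=4)  # three cut points
    return [bisect_right(cuts, v) for v in values]

def wqs_index(exposures, weights):
    """Weighted quantile sum per participant.

    exposures: dict of mineral name -> list of intakes (one per participant)
    weights:   dict of mineral name -> non-negative weight; weights sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scored = {name: quartile_scores(vals) for name, vals in exposures.items()}
    n = len(next(iter(exposures.values())))
    return [sum(weights[m] * scored[m][i] for m in exposures) for i in range(n)]

# Hypothetical intakes (mg/day) for eight participants; weights are illustrative,
# not estimates from the study.
exposures = {
    "calcium":   [300, 450, 500, 620, 700, 810, 900, 1000],
    "magnesium": [150, 180, 210, 240, 260, 300, 330, 360],
}
weights = {"calcium": 0.7, "magnesium": 0.3}
index = wqs_index(exposures, weights)
```

The resulting index would then enter a regression against the outcome (here, blood Hcy), so a single coefficient summarizes the joint effect while the weights indicate each mineral's share.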


Subject(s)
Homocysteine , Machine Learning , Minerals , Humans , Middle Aged , Adult , Female , Cross-Sectional Studies , Male , Minerals/blood , Minerals/administration & dosage , Homocysteine/blood , Aged , Young Adult , China , Diet
2.
Nat Commun ; 15(1): 4693, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824154

ABSTRACT

Training large neural networks on big datasets requires significant computational resources and time. Transfer learning reduces training time by pre-training a base model on one dataset and transferring the knowledge to a new model for another dataset. However, current choices of transfer learning algorithms are limited because the transferred models always have to adhere to the dimensions of the base model and cannot easily modify the neural architecture to solve other datasets. On the other hand, biological neural networks (BNNs) are adept at rearranging themselves to tackle completely different problems using transfer learning. Taking advantage of BNNs, we design a dynamic neural network that is transferable to any other network architecture and can accommodate many datasets. Our approach uses raytracing to connect neurons in a three-dimensional space, allowing the network to grow into any shape or size. On the Alcala dataset, our transfer learning algorithm trains the fastest across changing environments and input sizes. In addition, we show that our algorithm also outperforms the state of the art on an EEG dataset. In the future, this network may be considered for implementation on real biological neural networks to decrease power consumption.


Subject(s)
Algorithms , Neural Networks, Computer , Humans , Neurons/physiology , Electroencephalography , Machine Learning , Models, Neurological
3.
Sci Rep ; 14(1): 12601, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824162

ABSTRACT

Data categorization is a top concern in medical data analysis for predicting and detecting illnesses; thus, it is widely applied in modern healthcare informatics. Machine learning and deep learning models have received great attention for categorizing medical data and improving illness detection. However, existing techniques face fundamental problems, such as high-dimensional features, computational complexity, and long execution times. This study presents a novel classification model employing metaheuristic methods to maximize efficient positives on Chronic Kidney Disease (CKD) diagnosis. The medical data are first extensively pre-processed: the data are cleaned with various mechanisms, including missing-value resolution, data transformation, and normalization. The focus of these steps is to handle the missing values and prepare the data for deep analysis. We adopt the Binary Grey Wolf Optimization method, a reliable metaheuristic for feature subset selection. This operation is aimed at improving illness prediction accuracy. In the classification step, the model adopts the Extreme Learning Machine with hidden nodes through data optimization to predict the presence of CKD. The complete classifier evaluation employs established measures, including recall, specificity, kappa, F-score, and accuracy, in addition to the feature selection. The study data show that the proposed approach achieves high accuracy, better than that of existing models.
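
The pre-processing steps named in this abstract (missing-value resolution and normalization) can be sketched as follows. The abstract does not say which imputation or scaling method was used, so mean imputation and min-max scaling are assumptions chosen for simplicity, and the sample values are made up.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max_normalize(column):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical serum creatinine readings (mg/dL) with one missing value.
creatinine = [0.8, None, 1.2, 3.6]
filled = impute_mean(creatinine)
scaled = min_max_normalize(filled)
```

In a real pipeline the mean, minimum, and maximum would be computed on the training split only and reused on the test split, to avoid leaking test information into the model.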


Subject(s)
Medical Informatics , Renal Insufficiency, Chronic , Humans , Renal Insufficiency, Chronic/diagnosis , Medical Informatics/methods , Machine Learning , Deep Learning , Algorithms , Male , Female , Middle Aged
4.
Sci Rep ; 14(1): 12591, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824178

ABSTRACT

Effective blood glucose management is crucial for people with diabetes to avoid acute complications. Predicting extreme values accurately and in a timely manner is of vital importance to them. People with diabetes are particularly concerned about suffering a hypoglycemia (low value) event and, moreover, that the event will be prolonged in time. It is crucial to predict hyperglycemia (high value) and hypoglycemia events that may cause health damages in the short term and potential permanent damages in the long term. This paper describes our research on predicting hypoglycemia events at 30, 60, 90, and 120 minutes using machine learning methods. We propose using structured Grammatical Evolution and dynamic structured Grammatical Evolution to produce interpretable mathematical expressions that predict a hypoglycemia event. Our proposal generates white-box models induced by a grammar based on if-then-else conditions using blood glucose, heart rate, number of steps, and burned calories as the inputs for the machine learning technique. We apply these techniques to create three types of models: individualized, cluster, and population-based. They all are then compared with the predictions of eleven machine learning techniques. We apply these techniques to a dataset of 24 real patients of the Hospital Universitario Principe de Asturias, Madrid, Spain. The resulting models, presented as if-then-else statements that incorporate numeric, relational, and logical operations between variables and constants, are inherently interpretable. The True Positive Rate and True Negative Rate metrics are above 0.90 for 30-minute predictions, 0.80 for 60 min, and 0.70 for 90 min and 120 min for the three types of models. Individualized models exhibit the best metrics, while cluster and population-based models perform similarly. Structured and dynamic structured grammatical evolution techniques perform similarly for all forecasting horizons. 
Regarding the comparison of different machine learning techniques, on the shorter forecasting horizons, our proposals have a high probability of winning, a probability that diminishes on the longer time horizons. Structured grammatical evolution provides advanced forecasting models that facilitate model explanation, modification, and retesting, offering flexibility for refining solutions post-creation and a deeper understanding of blood glucose behavior. These models have been integrated into the glUCModel application, designed to serve people with diabetes.
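
To make the notion of a white-box if-then-else model concrete, here is a minimal sketch of such a rule together with the True Positive Rate and True Negative Rate used for evaluation. The thresholds, the rule structure, and the sample data are hypothetical; they are not expressions produced by the authors' grammatical evolution runs.

```python
def predict_hypoglycemia(glucose_mg_dl, heart_rate_bpm, steps, calories):
    """Hypothetical if-then-else rule; all thresholds are illustrative only."""
    if glucose_mg_dl < 90 and (steps > 1500 or calories > 120):
        return True
    elif glucose_mg_dl < 80 and heart_rate_bpm > 100:
        return True
    else:
        return False

def tpr_tnr(y_true, y_pred):
    """True Positive Rate and True Negative Rate from boolean labels.

    Assumes y_true contains at least one positive and one negative.
    """
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, tn / negatives

# Made-up samples: (glucose, heart rate, steps, calories), event label.
samples = [
    ((85, 95, 2000, 100), True),
    ((75, 110, 200, 30), True),
    ((88, 70, 100, 20), True),    # an event this rule misses
    ((140, 80, 3000, 200), False),
    ((110, 75, 500, 40), False),
]
y_true = [label for _, label in samples]
y_pred = [predict_hypoglycemia(*features) for features, _ in samples]
tpr, tnr = tpr_tnr(y_true, y_pred)
```

Because the model is a readable expression, a clinician could inspect or adjust a threshold and re-run the same evaluation, which is the kind of post-creation refinement the abstract describes.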


Subject(s)
Blood Glucose , Hypoglycemia , Machine Learning , Humans , Blood Glucose/metabolism , Diabetes Mellitus , Models, Theoretical , Algorithms
5.
Sci Rep ; 14(1): 12624, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824215

ABSTRACT

This study aimed to identify factors that affect lymphovascular space invasion (LVSI) in endometrial cancer (EC) using machine learning technology, and to build a clinical risk assessment model based on these factors. Samples were collected from May 2017 to March 2022, including 312 EC patients who received treatment at Xuzhou Medical University Affiliated Hospital of Lianyungang. Of these, 219 cases were assigned to the training group and 93 to the validation group. Clinical data and laboratory indicators were analyzed. Logistic regression and least absolute shrinkage and selection operator (LASSO) regression were used to analyze risk factors and construct risk models. The LVSI and non-LVSI groups differed significantly in clinical data and laboratory indicators (P < 0.05). Multivariable logistic regression analysis identified independent risk factors for LVSI in EC, which were myometrial infiltration depth, cervical stromal invasion, lymphocyte count (LYM), monocyte count (MONO), albumin (ALB), and fibrinogen (FIB) (P < 0.05). LASSO regression identified 19 key feature factors for model construction. In the training and validation groups, the risk scores for the logistic and LASSO models were significantly higher in the LVSI group than in the non-LVSI group (P < 0.001). The model was built based on machine learning and can effectively predict LVSI in EC and enhance preoperative decision-making. The reliability of the model was demonstrated by the significant difference in risk scores between LVSI and non-LVSI patients in both the training and validation groups.


Subject(s)
Endometrial Neoplasms , Machine Learning , Neoplasm Invasiveness , Humans , Female , Endometrial Neoplasms/pathology , Endometrial Neoplasms/diagnosis , Middle Aged , Risk Factors , Risk Assessment/methods , Aged , Lymphatic Metastasis , Logistic Models
6.
J Transl Med ; 22(1): 523, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38822359

ABSTRACT

OBJECTIVE: Diabetic macular edema (DME) is the leading cause of visual impairment in patients with diabetes mellitus (DM). The goal of early detection has not yet been achieved due to a lack of fast and convenient methods. Therefore, we aim to develop and validate a prediction model to identify DME in patients with type 2 diabetes mellitus (T2DM) using easily accessible systemic variables, which can be applied to an ophthalmologist-independent scenario. METHODS: In this four-center, observational study, a total of 1994 T2DM patients who underwent routine diabetic retinopathy screening were enrolled, and their information on ophthalmic and systemic conditions was collected. Forward stepwise multivariable logistic regression was performed to identify risk factors of DME. Machine learning and MLR (multivariable logistic regression) were both used to establish prediction models. The prediction models were trained with 1300 patients and prospectively validated with 104 patients from Guangdong Provincial People's Hospital (GDPH). A total of 175 patients from Zhujiang Hospital (ZJH), 115 patients from the First Affiliated Hospital of Kunming Medical University (FAHKMU), and 100 patients from People's Hospital of JiangMen (PHJM) were used as external validation sets. Area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity, and specificity were used to evaluate the performance in DME prediction. RESULTS: The risk of DME was significantly associated with duration of DM, diastolic blood pressure, hematocrit, glycosylated hemoglobin, and urine albumin-to-creatinine ratio stage. The MLR model using these five risk factors was selected as the final prediction model due to its better performance than the machine learning models using all variables. The AUC, ACC, sensitivity, and specificity were 0.80, 0.69, 0.80, and 0.67 in the internal validation, and 0.82, 0.54, 1.00, and 0.48 in prospective validation, respectively. 
In external validation, the AUC, ACC, sensitivity, and specificity were 0.84, 0.68, 0.90, and 0.60 in ZJH, 0.89, 0.77, 1.00, and 0.72 in FAHKMU, and 0.80, 0.67, 0.75, and 0.65 in PHJM, respectively. CONCLUSION: The MLR model is a simple, rapid, and reliable tool for early detection of DME in individuals with T2DM without the need for specialized ophthalmologic examinations.
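
The reported metrics are linked by a simple identity: accuracy equals sensitivity times prevalence plus specificity times (1 minus prevalence). The sketch below uses that identity to show why accuracy can sit well below sensitivity when true cases are a small share of the sample. The prevalence it computes is an approximation derived from rounded figures in the abstract, not a number reported by the authors.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy = sens * prevalence + spec * (1 - prevalence)."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)

def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve the identity above for prevalence (requires sens != spec)."""
    if sensitivity == specificity:
        raise ValueError("prevalence is not identifiable when sens == spec")
    return (accuracy - specificity) / (sensitivity - specificity)

# Prospective-validation figures from the abstract (rounded to two decimals).
prev = implied_prevalence(accuracy=0.54, sensitivity=1.00, specificity=0.48)
check = accuracy_from_rates(1.00, 0.48, prev)
```

With these inputs the implied share of DME cases is roughly 0.12, so the overall accuracy is dominated by the specificity on the much larger group without DME.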


Subject(s)
Diabetes Mellitus, Type 2 , Diabetic Retinopathy , Early Diagnosis , Macular Edema , Humans , Diabetes Mellitus, Type 2/complications , Macular Edema/complications , Macular Edema/diagnosis , Macular Edema/blood , Male , Female , Diabetic Retinopathy/diagnosis , Middle Aged , Risk Factors , ROC Curve , Aged , Reproducibility of Results , Machine Learning , Multivariate Analysis , Area Under Curve , Logistic Models
7.
Water Sci Technol ; 89(10): 2605-2624, 2024 May.
Article in English | MEDLINE | ID: mdl-38822603

ABSTRACT

Floods are one of the most destructive disasters, causing loss of life and property worldwide every year. This study aimed to find the best-performing model for flood sensitivity assessment and to analyze key characteristic factors. The spatial pattern of flood sensitivity was evaluated using three machine learning (ML) models: Logistic Regression (LR), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF). Suqian City in Jiangsu Province was selected as the study area, and a random sample dataset of historical flood points was constructed. Fifteen different meteorological, hydrological, and geographical spatial variables were considered in the flood sensitivity assessment, and 12 variables were selected based on the multi-collinearity study. Among the selected ML models, the RF method had the highest AUC value, accuracy, and comprehensive evaluation effect, making it a reliable and effective flood risk assessment model. As the main output of this study, the flood sensitivity map is divided into five categories, ranging from very low to very high sensitivity. Using the RF model (i.e., the most accurate model), the high-risk area covers about 44% of the study area, mainly concentrated in the central, eastern, and southern parts of the old city area.


Subject(s)
Floods , Logistic Models , Machine Learning , China , Models, Theoretical , Random Forest
8.
Langenbecks Arch Surg ; 409(1): 170, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38822883

ABSTRACT

PURPOSE: Perioperative decision making for large (> 2 cm) rectal polyps with ambiguous features is complex. The most common intraprocedural assessment is clinician judgement alone, while radiological and endoscopic biopsy can provide periprocedural detail. Fluorescence-augmented machine learning (FA-ML) methods may optimise local treatment strategy. METHODS: Surgeons of varying grades, all performing colonoscopies independently, were asked to visually judge endoscopic videos of large benign and early-stage malignant (potentially suitable for local excision) rectal lesions on an interactive video platform (Mindstamp), with results compared with and between final pathology, radiology and a novel FA-ML classifier. Statistical analyses of data used Fleiss Multi-rater Kappa scoring, Spearman Coefficient and Frequency tables. RESULTS: Thirty-two surgeons judged 14 ambiguous polyp videos (7 benign, 7 malignant). In all cancers, initial endoscopic biopsy had yielded false-negative results. Five of each lesion type had had a pre-excision MRI, with a 60% false-positive malignancy prediction in benign lesions and a 60% over-staging and 40% equivocal rate in cancers. Average clinical visual cancer judgement accuracy was 49% (with only 'fair' inter-rater agreement), with many reporting uncertainty, and higher reported decision confidence did not correspond to higher accuracy. This compared to 86% ML accuracy. Size was misjudged visually by a mean of 20%, with polyp size underestimated in 4/6 and overestimated in 2/6. Subjective narratives regarding decision-making requested for 7/14 lesions revealed wide rationale variation between participants. CONCLUSION: Currently available clinical means of ambiguous rectal lesion assessment are suboptimal, with wide inter-observer variation. Fluorescence-based AI augmentation may advance this field via objective, explainable ML methods.
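
For readers unfamiliar with the Fleiss multi-rater kappa used here to quantify inter-rater agreement, the following is a minimal sketch of the standard formula. The rating counts are made up for illustration and are unrelated to the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement per subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Made-up example: 4 videos, 5 raters, categories (benign, malignant).
ratings = [[5, 0], [4, 1], [1, 4], [0, 5]]
kappa = fleiss_kappa(ratings)
```

A value of 1 means perfect agreement and values near 0 mean agreement no better than chance. The formula is undefined when all ratings fall into a single category, because chance agreement is then 1.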


Subject(s)
Colonoscopy , Rectal Neoplasms , Humans , Rectal Neoplasms/pathology , Rectal Neoplasms/surgery , Rectal Neoplasms/diagnostic imaging , Intestinal Polyps/pathology , Intestinal Polyps/surgery , Machine Learning , Male , Fluorescence , Female , Observer Variation
9.
Arch Dermatol Res ; 316(6): 326, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38822910

ABSTRACT

Skin aging is one of the visible characteristics of the aging process in humans. In recent years, different biological clocks have been generated based on protein or epigenetic markers, but few have focused on biological age in the skin. Arresting the aging process, or even restoring an organism from an older to a younger stage, has been one of the main challenges in biomedical research over the last 20 years. We have implemented several machine learning models, including regression and classification algorithms, to create an epigenetic molecular clock based on miRNA expression profiles of healthy subjects to predict biological age related to skin. Our best models are capable of classifying skin samples according to age groups (18-28; 29-39; 40-50; 51-60 or 61-83 years old) with an accuracy of 80%, or predicting age with a mean absolute error of 10.89 years, using the expression levels of 1856 unique miRNAs. Our results suggest that this kind of epigenetic clock is a promising tool with several applications in the pharmaco-cosmetic industry.
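
The two evaluation modes in this abstract (age-group classification accuracy and mean absolute error of predicted age) measure different things. The sketch below computes both for a set of made-up ages, using the age-group boundaries listed in the abstract. The predicted values are hypothetical.

```python
# Age-group boundaries as listed in the abstract (inclusive, in years).
AGE_GROUPS = [(18, 28), (29, 39), (40, 50), (51, 60), (61, 83)]

def age_group(age):
    """Return the index of the age group containing `age`, or None if outside."""
    for idx, (lo, hi) in enumerate(AGE_GROUPS):
        if lo <= age <= hi:
            return idx
    return None

def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

# Made-up true ages and hypothetical regression outputs.
true_ages = [22, 35, 47, 58, 70]
pred_ages = [30, 33, 55, 49, 66]

mae = mean_absolute_error(true_ages, pred_ages)
group_accuracy = sum(
    age_group(t) == age_group(p) for t, p in zip(true_ages, pred_ages)
) / len(true_ages)
```

An error of several years can move a prediction across a group boundary, so a regression model's output binned into groups need not match the accuracy of a classifier trained directly on the groups.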


Subject(s)
Epigenesis, Genetic , Machine Learning , MicroRNAs , Skin Aging , Skin , Humans , MicroRNAs/genetics , Middle Aged , Aged , Adult , Skin Aging/genetics , Aged, 80 and over , Skin/metabolism , Skin/pathology , Female , Young Adult , Male , Adolescent , Gene Expression Profiling , Biological Clocks/genetics
10.
Food Res Int ; 188: 114464, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38823834

ABSTRACT

Vibrio parahaemolyticus and Vibrio vulnificus are bacteria with a significant public health impact. Identifying factors impacting their presence and concentrations in food sources could enable the identification of significant risk factors and prevent incidences of foodborne illness. In recent years, machine learning has shown promise in modeling microbial presence based on prevalent external and internal variables, such as environmental variables and gene presence/absence, respectively, particularly with the generation and availability of large amounts and diverse sources of data. Such analyses can prove useful in predicting microbial behavior in food systems, particularly under the influence of the constant changes in environmental variables. In this study, we tested the efficacy of six machine learning regression models (random forest, support vector machine, elastic net, neural network, k-nearest neighbors, and extreme gradient boosting) in predicting the relationship between environmental variables and total and pathogenic V. parahaemolyticus and V. vulnificus concentrations in seawater and oysters. In general, environmental variables were found to be reliable predictors of total and pathogenic V. parahaemolyticus and V. vulnificus concentrations in seawater, and pathogenic V. parahaemolyticus in oysters (Acceptable Prediction Zone >70 %) when analyzed using our machine learning models. SHapley Additive exPlanations, which was used to identify variables influencing Vibrio concentrations, identified chlorophyll a content, seawater salinity, seawater temperature, and turbidity as influential variables. It is important to note that different strains were differentially impacted by the same environmental variable, indicating the need for further research to study the causes and potential mechanisms of these variations. In conclusion, environmental variables could be important predictors of Vibrio growth and behavior in seafood. 
Moreover, the models developed in this study could prove invaluable in assessing and managing the risks associated with V. parahaemolyticus and V. vulnificus, particularly in the face of a changing environment.


Subject(s)
Machine Learning , Ostreidae , Seawater , Vibrio parahaemolyticus , Vibrio vulnificus , Ostreidae/microbiology , Seawater/microbiology , Vibrio parahaemolyticus/isolation & purification , Vibrio parahaemolyticus/growth & development , Animals , Vibrio vulnificus/isolation & purification , Vibrio vulnificus/growth & development , Food Microbiology , Food Contamination/analysis , Shellfish/microbiology , Seafood/microbiology , Temperature , Vibrio/isolation & purification
11.
Gene ; 920: 148519, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-38703867

ABSTRACT

Epithelial-mesenchymal transition (EMT) plays a crucial role in regulating inflammatory responses and fibrosis formation. This study aims to explore the molecular mechanisms of EMT-related genes in Crohn's disease (CD) through bioinformatics methods and identify potential key biomarkers. In our research, we identified differentially expressed genes (DEGs) related to EMT based on the GSE52746 dataset and the gene set in the GeneCards database. Key genes were identified through Lasso-cox and Random Forest and validated using the external dataset GSE10616. Immune infiltration analysis showed that Lysophosphatidylcholine acyltransferase 1 (LPCAT1) was positively correlated with Neutrophils and Macrophages M1. The Gene Set Enrichment Analysis (GSEA) results for LPCAT1 showed associations with cell adhesion molecules and ECM receptor interaction. Additionally, a lncRNA-miRNA-mRNA ceRNA network was constructed. Finally, we validated that knocking down LPCAT1 could inhibit the release of inflammatory factors, EMT, and the elevation of fibrosis indices as well as the activation of NF-κB signaling pathway in LPS-induced HT-29 cells. LPCAT1 plays an important role in the occurrence and development of CD and may become a new biomarker.


Subject(s)
1-Acylglycerophosphocholine O-Acyltransferase , Biomarkers , Computational Biology , Crohn Disease , Machine Learning , Humans , Crohn Disease/genetics , Computational Biology/methods , Biomarkers/metabolism , 1-Acylglycerophosphocholine O-Acyltransferase/genetics , 1-Acylglycerophosphocholine O-Acyltransferase/metabolism , Epithelial-Mesenchymal Transition/genetics , HT29 Cells , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Long Noncoding/genetics , Gene Regulatory Networks , Gene Expression Profiling/methods , Signal Transduction/genetics
12.
Chemosphere ; 358: 142223, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38704045

ABSTRACT

Antibiotic resistance (AR) is considered one of the greatest global threats in the current century, which can only be overcome if all interconnected areas of humans, animals and the environment are taken into account as part of the One Health concept proposed by the World Health Organization (WHO). Water and wastewater are among the most important environmental media of AR sources, where the phenomena are generally non-linear. Therefore, the aim of this study was to investigate the application of machine learning-based methods (MLMs) to solve AR-induced problems in water and wastewater. For this purpose, the most relevant databases were searched for the period between 1987 and 2023 to systematically analyze and categorize the applications. The results showed that out of 12 applications, 11 (91.6%) were for shallow learning and 1 (8.3%) for deep learning. In the shallow learning category, 6 applications (50%) were regression and 4 (33.3%) were classification, mainly using artificial neural networks, decision trees and Bayesian methods for the following objectives: predicting the survival of antibiotic-resistant bacteria (ARB), determining the order of influencing parameters on AR-based scores, and identifying the major sources of antibiotic resistance genes (ARGs). In addition, only one study (8.3%) was found for clustering and no study for association. Surprisingly, deep learning had been used in only one study (8.3%) to predict ARG sequences. Therefore, working on the knowledge gaps of AR, especially using clustering, association and deep learning methods, would be a promising option to analyze more aspects of the related problems. However, there is still a long way to go to consider and apply MLMs as unique approaches to study different aspects of AR in water and wastewater.


Subject(s)
Machine Learning , Wastewater , Wastewater/microbiology , Drug Resistance, Microbial/genetics , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/genetics , Bayes Theorem , Neural Networks, Computer , Drug Resistance, Bacterial/genetics
13.
Sci Rep ; 14(1): 12483, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816409

ABSTRACT

Effective management of dementia requires the timely detection of mild cognitive impairment (MCI). This paper introduces a multi-objective optimization approach for selecting EEG channels (and features) for the purpose of detecting MCI. Firstly, each EEG signal from each channel is decomposed into subbands using either variational mode decomposition (VMD) or discrete wavelet transform (DWT). A feature is then extracted from each subband using one of the following measures: standard deviation, interquartile range, band power, Teager energy, Katz's and Higuchi's fractal dimensions, Shannon entropy, sure entropy, or threshold entropy. Different machine learning techniques are used to classify the features of MCI cases from those of healthy controls. The classifier's performance is validated using leave-one-subject-out (LOSO) cross-validation (CV). The non-dominated sorting genetic algorithm (NSGA)-II is designed with the aim of minimizing the number of EEG channels (or features) and maximizing classification accuracy. The performance is evaluated using a publicly available online dataset containing EEGs from 19 channels recorded from 24 participants. The results demonstrate a significant improvement in performance when utilizing the NSGA-II algorithm. By selecting only a few appropriate EEG channels, the LOSO CV-based results show a significant improvement compared to using all 19 channels. Additionally, the outcomes indicate that accuracy can be further improved by selecting suitable features from different channels. For instance, by combining VMD and Teager energy, the SVM accuracy obtained using all channels is 74.24%. Interestingly, when only five channels are selected using NSGA-II, the accuracy increases to 91.56%. The accuracy is further improved to 95.28% when using only 8 features selected from 7 channels. 
This demonstrates that by choosing informative features or channels while excluding noisy or irrelevant information, the impact of noise is reduced, resulting in improved accuracy. These promising findings indicate that, with a limited number of channels and features, accurate diagnosis of MCI is achievable, which opens the door for its application in clinical practice.
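
Two ingredients of the pipeline described in this abstract can be sketched compactly: the Teager energy feature and the leave-one-subject-out (LOSO) split. The sketch uses the discrete Teager-Kaiser operator and averages it over the window; how the authors aggregate the operator into a single feature is not stated, so the averaging is an assumption. The signal and subject labels are toy data.

```python
def teager_energy(signal):
    """Mean Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    if len(signal) < 3:
        raise ValueError("need at least three samples")
    psi = [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]
    return sum(psi) / len(psi)

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Toy subband samples and toy subject labels (one label per recording segment).
energy = teager_energy([0.0, 1.0, 0.0, -1.0, 0.0])
subjects = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
```

Holding out all segments of one participant at a time keeps segments from the same person out of both training and test sets, which would otherwise inflate the measured accuracy.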


Subject(s)
Algorithms , Cognitive Dysfunction , Electroencephalography , Humans , Electroencephalography/methods , Cognitive Dysfunction/diagnosis , Aged , Female , Male , Wavelet Analysis , Machine Learning , Middle Aged , Signal Processing, Computer-Assisted
14.
Sci Rep ; 14(1): 12436, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816422

ABSTRACT

We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We developed a two-model ensemble, consisting of a baseline model, where prediction is based on demographic and clinical variables only, and a genetic model, where we also include PRSs. We evaluate the use of a linear versus a non-linear model at both the baseline and the genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model's performance as percentage variance explained (PVE) on a held-out test dataset. A non-linear baseline model improved the PVEs from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP) compared with a linear baseline model. Including seven PRSs in the genetic model, computed based on the largest available GWAS of SBP/DBP, improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5% (DBP) compared to using a single PRS. Adding 14 additional PRSs computed based on two independent GWASs further increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, with primarily all non-White groups benefitting from the inclusion of additional PRSs. In summary, non-linear ML models improve BP prediction in models incorporating diverse populations.
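
Percentage variance explained on held-out data is commonly computed as 100 times one minus the ratio of residual to total sum of squares. The sketch below uses that definition, which is an assumption, since the abstract does not spell out the exact formula. The blood pressure values are made up.

```python
def percent_variance_explained(y_true, y_pred):
    """PVE = 100 * (1 - SS_residual / SS_total), computed on held-out data."""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 100.0 * (1.0 - ss_residual / ss_total)

# Made-up systolic blood pressure values (mmHg) and hypothetical predictions.
sbp_true = [110, 120, 130, 140, 150]
sbp_pred = [115, 120, 128, 138, 145]
pve = percent_variance_explained(sbp_true, sbp_pred)
```

Under this definition a model that always predicts the test-set mean scores 0, and a model worse than that scores below 0.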


Subject(s)
Blood Pressure , Genome-Wide Association Study , Machine Learning , Multifactorial Inheritance , Phenotype , Humans , Blood Pressure/genetics , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Risk Factors , Male , Female , Genetic Predisposition to Disease , Models, Genetic , Hypertension/genetics , Hypertension/physiopathology , Middle Aged , Genetic Risk Score
15.
NPJ Syst Biol Appl ; 10(1): 62, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816426

ABSTRACT

Individuals may respond to drug treatment differently due to genetic variants located in enhancers. These variants can alter transcription factor (TF) binding strength, affect the enhancer's chromatin activity or interactions, and eventually change the expression level of downstream genes. Here, we propose a computational framework, PERD, to Predict the Enhancers Responsive to Drug. A machine learning model was trained to predict genome-wide chromatin accessibility from transcriptome data using the paired expression and chromatin accessibility data collected from ENCODE and ROADMAP. The model was then applied to the perturbed gene expression data from Connectivity Map (CMAP) and the Cancer Drug-induced gene expression Signature DataBase (CDS-DB) to identify drug-responsive enhancers with significantly altered chromatin accessibility. Furthermore, the drug-responsive enhancers were related to the pharmacogenomics genome-wide association studies (PGx GWAS). Building on traditional drug-associated gene signatures, PERD holds the promise to enhance the causality of drug perturbation by providing candidate regulatory elements for those drug-associated genes.


Subject(s)
Chromatin , Genome-Wide Association Study , Machine Learning , Chromatin/genetics , Chromatin/drug effects , Humans , Genome-Wide Association Study/methods , Enhancer Elements, Genetic/genetics , Computational Biology/methods , Transcriptome/genetics , Transcriptome/drug effects , Transcription Factors/genetics , Gene Expression Profiling/methods , Pharmacogenetics/methods
16.
Sci Rep ; 14(1): 12411, 2024 05 30.
Article in English | MEDLINE | ID: mdl-38816446

ABSTRACT

Knowledge distillation is an effective approach for training robust multi-modal machine learning models when synchronous multimodal data are unavailable. However, traditional knowledge distillation techniques have limitations in comprehensively transferring knowledge across modalities and models. This paper proposes a multiscale knowledge distillation framework to address these limitations. Specifically, we introduce a multiscale semantic graph mapping (SGM) loss function to enable more comprehensive knowledge transfer between teacher and student networks at multiple feature scales. We also design a fusion and tuning (FT) module to fully utilize correlations within and between different data types of the same modality when training teacher networks. Furthermore, we adopt transformer-based backbones to improve feature learning compared to traditional convolutional neural networks. We apply the proposed techniques to multimodal human activity recognition; compared with the baseline method, the proposed approach improved results by 2.31% and 0.29% on the MMAct and UTD-MHAD datasets, respectively. Ablation studies validate the necessity of each component.
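
For orientation, the classic temperature-scaled distillation loss that teacher-student frameworks of this kind build on can be sketched as below. This is the standard soft-target formulation, not the multiscale SGM loss proposed in the paper; the function names, the temperature value, and the logits are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits for a three-class activity recognition problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge; a higher temperature spreads probability mass over the non-target classes, which is where much of the transferred information sits.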


Subject(s)
Human Activities , Machine Learning , Neural Networks, Computer , Humans , Algorithms , Attention
17.
Sci Rep ; 14(1): 12426, 2024 05 30.
Article in English | MEDLINE | ID: mdl-38816457

ABSTRACT

IgA nephropathy can progress to kidney failure, making early detection important. However, definitive diagnosis depends on invasive kidney biopsy. This study aimed to develop non-invasive prediction models for IgA nephropathy using machine learning. We collected retrospective data on demographic characteristics, blood tests, and urine tests of patients who underwent kidney biopsy. The dataset was divided into derivation and validation cohorts, with temporal validation. We employed five machine learning models (eXtreme Gradient Boosting (XGBoost), LightGBM, Random Forest, Artificial Neural Network, and 1-Dimensional Convolutional Neural Network (1D-CNN)) and logistic regression, evaluating performance via the area under the receiver operating characteristic curve (AUROC) and exploring variable importance through the SHapley Additive exPlanations method. The study included 1268 participants, with 353 (28%) diagnosed with IgA nephropathy. In the derivation cohort, LightGBM achieved the highest AUROC of 0.913 (95% CI 0.906-0.919), significantly higher than logistic regression, Artificial Neural Network, and 1D-CNN, but not significantly different from XGBoost and Random Forest. In the validation cohort, XGBoost demonstrated the highest AUROC of 0.894 (95% CI 0.850-0.935), maintaining its robust performance. Key predictors identified were age, serum albumin, IgA/C3, and urine red blood cells, aligning with existing clinical insights. Machine learning can be a valuable non-invasive tool for IgA nephropathy.
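
As a rough illustration of how an AUROC and its 95% CI can be obtained, the sketch below computes the AUROC as the probability that a positive case outranks a negative one and adds a percentile bootstrap interval. The abstract does not state how its intervals were derived, so the bootstrap is an assumption here, and the labels and scores are made-up toy data.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample lacked one class; draw again
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical toy data: 1 = biopsy-confirmed IgA nephropathy, 0 = other diagnosis.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
point = auroc(labels, scores)
low, high = bootstrap_ci(labels, scores)
```

In this toy set, 15 of the 16 positive-negative pairs are ranked correctly, giving an AUROC of 0.9375. The pairwise computation is quadratic in sample size, which is acceptable for a sketch but not for large cohorts.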


Subject(s)
Glomerulonephritis, IGA , Machine Learning , Humans , Glomerulonephritis, IGA/diagnosis , Glomerulonephritis, IGA/urine , Glomerulonephritis, IGA/pathology , Glomerulonephritis, IGA/blood , Male , Female , Adult , Retrospective Studies , Middle Aged , Neural Networks, Computer , ROC Curve , Logistic Models , Biopsy
18.
Sci Rep ; 14(1): 12468, 2024 05 30.
Article in English | MEDLINE | ID: mdl-38816468

ABSTRACT

Post-traumatic stress disorder (PTSD) lacks clear biomarkers in clinical practice. This study investigates language as a potential diagnostic biomarker for PTSD. We analyze an original cohort of 148 individuals exposed to the November 13, 2015, terrorist attacks in Paris. The interviews, conducted 5-11 months after the event, include individuals from similar socioeconomic backgrounds exposed to the same incident, responding to identical questions and using uniform PTSD measures. Using this dataset to collect nuanced insights that might be clinically relevant, we propose a three-step interdisciplinary methodology that integrates expertise from psychiatry, linguistics, and the Natural Language Processing (NLP) community to examine the relationship between language and PTSD. The first step assesses a clinical psychiatrist's ability to diagnose PTSD using interview transcripts alone. The second step uses statistical analysis and machine learning models to create language features based on psycholinguistic hypotheses and evaluate their predictive strength. The third step applies a hypothesis-free deep learning approach to the classification of PTSD in our cohort. Results show that the clinical psychiatrist diagnosed PTSD with an area under the curve (AUC) of 0.72, which is comparable to a gold standard questionnaire (AUC ≈ 0.80). The machine learning model achieved a diagnostic AUC of 0.69. The deep learning approach achieved an AUC of 0.64. An examination of model error informs our discussion. Importantly, the study controls for confounding factors, establishes associations between language and DSM-5 subsymptoms, and integrates automated methods with qualitative analysis. This study provides a direct and methodologically robust description of the relationship between PTSD and language. Our work lays the groundwork for advancing early and accurate diagnosis and using linguistic markers to assess the effectiveness of pharmacological treatments and psychotherapies.


Subject(s)
Deep Learning , Language , Machine Learning , Stress Disorders, Post-Traumatic , Stress Disorders, Post-Traumatic/diagnosis , Humans , Male , Female , Adult , Natural Language Processing , Biomarkers , Middle Aged
19.
Sci Rep ; 14(1): 12428, 2024 05 30.
Article in English | MEDLINE | ID: mdl-38816528

ABSTRACT

Electromyography (EMG) is considered a potential predictive tool for the severity of knee osteoarthritis (OA) symptoms and functional outcomes. Patient-reported outcome measures (PROMs), such as the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and visual analog scale (VAS), are used to determine the severity of knee OA. We aim to investigate muscle activation and co-contraction patterns through EMG from the lower extremity muscles of patients with advanced knee OA and to evaluate the effectiveness of an interpretable machine-learning model in estimating the severity of knee OA according to the WOMAC (pain, stiffness, and physical function) and VAS using EMG gait features. To explore how neuromuscular gait patterns relate to knee OA severity, EMG from the rectus femoris, medial hamstring, tibialis anterior, and gastrocnemius muscles was recorded from 84 patients diagnosed with advanced knee OA during ground walking. Muscle activation patterns and co-activation indices were calculated over the gait cycle for pairs of medial and lateral muscles. We utilized machine-learning regression models to estimate the severity of knee OA symptoms according to the PROMs using muscle activity and co-contraction features. Additionally, we utilized Shapley Additive Explanations (SHAP) to interpret the contribution of the EMG features to the regression model for estimation of knee OA severity according to WOMAC and VAS. Muscle activity and co-contraction patterns varied according to the functional limitations associated with knee OA severity according to VAS and WOMAC. The coefficient of determination of the cross-validated regression model was 0.85 for estimating WOMAC, 0.82 for pain, 0.85 for stiffness, and 0.85 for physical function, as well as VAS scores, utilizing the gait features. SHAP explanation revealed that greater co-contraction of lower extremity muscles during the weight acceptance and swing phases indicated more severe knee OA. The identified muscle co-activation patterns may be utilized as objective candidate outcomes to better understand the severity of knee OA.
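
The cross-validated coefficient of determination reported above can be illustrated with a minimal sketch: out-of-fold predictions are pooled across folds and scored once with R² = 1 - SS_res / SS_tot. The single-feature least-squares model, the fold assignment, and the data are assumptions made for illustration; the abstract does not specify which regression models or how many folds were used.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def cross_validated_r2(x, y, k=4):
    """Pool out-of-fold predictions from k folds, then score them once."""
    preds = [0.0] * len(x)
    for fold in range(k):
        test_idx = [i for i in range(len(x)) if i % k == fold]
        train_idx = [i for i in range(len(x)) if i % k != fold]
        a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = a + b * x[i]  # prediction from a model that never saw sample i
    return r_squared(y, preds)

# Hypothetical data: one co-contraction feature against a symptom score.
x = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41, 0.48, 0.55]
y = [12.0, 18.0, 24.0, 33.0, 36.0, 44.0, 50.0, 57.0]
cv_r2 = cross_validated_r2(x, y)
```

Scoring pooled out-of-fold predictions, rather than averaging per-fold R² values, avoids unstable estimates when individual folds are small.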


Subject(s)
Electromyography , Gait , Knee Joint , Muscle, Skeletal , Osteoarthritis, Knee , Patient Reported Outcome Measures , Humans , Osteoarthritis, Knee/physiopathology , Male , Female , Middle Aged , Aged , Knee Joint/physiopathology , Muscle, Skeletal/physiopathology , Gait/physiology , Machine Learning , Severity of Illness Index , Muscle Contraction
20.
Sci Rep ; 14(1): 12415, 2024 05 30.
Article in English | MEDLINE | ID: mdl-38816560

ABSTRACT

Gastrointestinal stromal tumors (GISTs) are a rare type of tumor that can develop liver metastasis (LIM), significantly impacting the patient's prognosis. This study aimed to predict LIM in GIST patients by constructing machine learning (ML) algorithms to assist clinicians in the decision-making process for treatment. Retrospective analysis was performed using the Surveillance, Epidemiology, and End Results (SEER) database; cases from 2010 to 2015 were assigned to the development sets, while cases from 2016 to 2017 were assigned to the testing set. Missing values were addressed using the multiple imputation technique. Four algorithms were utilized to construct the models, comprising traditional logistic regression (LR) and automated machine learning (AutoML) analysis such as gradient boosting machine (GBM), deep neural net (DL), and generalized linear model (GLM). We evaluated the models' performance using LR-based metrics, including the area under the receiver operating characteristic curve (AUC), calibration curve, and decision curve analysis (DCA), as well as AutoML-based metrics, such as feature importance, SHapley Additive exPlanation (SHAP) plots, and Local Interpretable Model Agnostic Explanation (LIME). A total of 6207 patients were included in this study, with 2683, 1780, and 1744 patients allocated to the training, validation, and test sets, respectively. Among the different models evaluated, the GBM model demonstrated the highest performance in the training, validation, and test cohorts, with respective AUC values of 0.805, 0.780, and 0.795. Furthermore, the GBM model outperformed other AutoML models in terms of accuracy, achieving 0.747, 0.700, and 0.706 in the training, validation, and test cohorts, respectively. Additionally, the study revealed that tumor size and tumor location were the most significant predictors influencing the AutoML model's ability to accurately predict LIM. The AutoML model utilizing the GBM algorithm for GIST patients can effectively predict the risk of LIM and provide clinicians with a reference for developing individualized treatment plans.
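
The temporal split described above (2010 to 2015 for development, 2016 to 2017 for testing) can be sketched as follows. The record layout and field names are hypothetical and do not correspond to SEER column names; how the development cases were further divided into training and validation sets is not stated in the abstract and is not modeled here.

```python
def temporal_split(records, dev_years=range(2010, 2016), test_years=range(2016, 2018)):
    """Assign records to development or test sets by year of diagnosis."""
    dev = [r for r in records if r["year"] in dev_years]
    test = [r for r in records if r["year"] in test_years]
    return dev, test

# Hypothetical records; "liver_metastasis" is an illustrative outcome flag.
records = [
    {"id": 1, "year": 2010, "liver_metastasis": 0},
    {"id": 2, "year": 2013, "liver_metastasis": 1},
    {"id": 3, "year": 2015, "liver_metastasis": 0},
    {"id": 4, "year": 2016, "liver_metastasis": 0},
    {"id": 5, "year": 2017, "liver_metastasis": 1},
]
dev, test = temporal_split(records)
```

Splitting by calendar year rather than at random keeps every test case later than every development case, so the evaluation resembles applying the model to future patients.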


Subject(s)
Gastrointestinal Stromal Tumors , Liver Neoplasms , Machine Learning , SEER Program , Humans , Gastrointestinal Stromal Tumors/pathology , Liver Neoplasms/secondary , Male , Female , Middle Aged , Retrospective Studies , Aged , Prognosis , Adult , Algorithms , ROC Curve , Gastrointestinal Neoplasms/pathology