1.
Article in English | MEDLINE | ID: mdl-39352637

ABSTRACT

In 2020, China pledged carbon reduction targets at the United Nations: peaking emissions by 2030 and achieving carbon neutrality by 2060. Research and prediction of regional carbon emissions are crucial for achieving these dual carbon targets across China. This study aims to construct an indicator system for regional carbon emissions and utilize it for forecasting. Analyzing carbon emission data from a specific area in Hainan Province from 2010 to 2020, we established an indicator system. Using the interpretable SHAP model, we assessed indicator importance and trends. Employing an improved STIRPAT model with partial least squares regression to address multicollinearity among influencing factors, we developed a carbon emission prediction model. Based on this, we forecasted carbon emissions from 2021 to 2060 in the specified area under three scenarios: natural, baseline, and ambitious. The results show that the growth of resident population and per capita GDP has the most significant promoting effect on carbon emissions in the region while optimizing industrial structure, energy consumption structure, and reducing energy intensity will inhibit carbon emissions. The prediction results indicate that in the natural scenario, regional carbon emissions will peak in 2035, and achieving carbon neutrality by 2060 is not feasible, while the baseline scenario and ambitious scenario can achieve the dual carbon targets on time or even earlier. The research results of this article provide a reference method for predicting carbon emissions in other regions and a guide for future regional emission reduction.
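
For orientation, the classical STIRPAT form that such models extend is sketched below. The abstract does not give the authors' improved specification, so mapping the symbols to the factors it names (population, per capita GDP, energy intensity) is an assumption.

```latex
% Classical STIRPAT: impact I as a function of population P,
% affluence A (e.g. per capita GDP) and technology T (e.g. energy intensity)
I = a \, P^{b} A^{c} T^{d} e

% Taking logarithms gives a linear model whose coefficients can be estimated
% by regression (here, partial least squares, to cope with collinear predictors)
\ln I = \ln a + b \ln P + c \ln A + d \ln T + \ln e
```

In the log form, each exponent reads as an elasticity: a 1% change in a factor is associated with roughly that percentage change in emissions, other factors held fixed.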

2.
Med Biol Eng Comput ; 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39384707

ABSTRACT

This study explores the bidirectional relationship between esophageal squamous cell carcinoma (ESCC) and oral squamous cell carcinoma (OSCC), examining shared risk factors and underlying molecular mechanisms. Using a random forest (RF) classifier, enhanced with interpretable machine learning (IML) through SHapley Additive exPlanations (SHAP), we analyzed gene expression from two GEO datasets (GSE30784 and GSE44021). The GSE30784 dataset comprises 167 OSCC samples and 45 controls, whereas the GSE44021 dataset encompasses 113 ESCC samples and 113 controls. Our analysis identified 20 key genes, such as XBP1, VGLL1, and RAD1, which are significantly associated with the development of ESCC and OSCC. Further investigations were conducted using tools such as NetworkAnalyst 3.0, Single Cell Portal, and miRNET 2.0, which highlighted complex interactions between these genes and specific miRNA targets, including hsa-mir-124-3p and hsa-mir-1-3p. Our model achieved high precision in identifying genes linked to crucial processes such as programmed cell death and cancer pathways, suggesting new avenues for diagnosis and treatment. This study confirms the bidirectional relationship between OSCC and ESCC, laying groundwork for targeted therapeutic approaches. It also helps identify shared biological pathways and genetic factors of these conditions, which can inform personalized medicine strategies and improve disease management.

3.
J Affect Disord ; 369: 352-363, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-39374738

ABSTRACT

OBJECTIVE: The objective was to use nine machine learning (ML) methods to predict the prognosis of patients with antibody-positive autoimmune encephalitis (AE). METHODS: Encephalitis data from the Global Burden of Disease (GBD) study were analyzed to reflect the disease burden of encephalitis. This study included 187 patients with AE, with 121 patients in the training set and 67 patients in the validation set. Decision trees (DT), random forest (RF), extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), support vector machine (SVM), naive Bayes (NB), neural network (NN), light gradient boosting machine (LGBM), and logistic regression (LR) were the ML methods used to construct predictive models. The constructed models were validated for discrimination, calibration, and clinical applicability using validation set data. Shapley additive explanation (SHAP) analysis was used to explain the model. RESULTS: Worldwide encephalitis deaths, incidence, and prevalence increased every year from 2010 to 2021. The training set included 121 patients with AE. Univariate analysis and LASSO screening identified six variables. Of the models constructed using the 9 ML methods, RF had the highest accuracy (0.860), followed by XGBoost (0.826), with F1 scores of 0.844 and 0.807, respectively. Validation set data showed good discrimination, calibration, and clinical applicability of the model. The SHAP values of infection, CSF monocyte percentage, and prealbumin were 0.906, 0.790, and 0.644, respectively. LIMITATIONS: As AE is a rare disease, the sample size of this study is relatively small. CONCLUSION: The models constructed using RF and XGBoost show good performance, discrimination, calibration, clinical applicability, and interpretability.

4.
Sci Rep ; 14(1): 23277, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39375427

ABSTRACT

One of the critical issues in medical data analysis is accurately predicting a patient's risk of heart disease, which is vital for early intervention and reducing mortality rates. Early detection allows for timely treatment and continuous monitoring by healthcare providers, which is essential but often limited by the inability of medical professionals to provide constant patient supervision. Early detection of cardiac problems and continuous patient monitoring by physicians can help reduce death rates. Doctors cannot constantly have contact with patients, and heart disease detection is not always accurate. By offering a more solid foundation for prediction and decision-making based on data provided by healthcare sectors worldwide, machine learning (ML) could help physicians with the prediction and detection of HD. This study aims to use different feature selection strategies to produce an accurate ML algorithm for early heart disease prediction. We have chosen features using chi-square, ANOVA, and mutual information methods. The three feature groups chosen were SF-1, SF-2, and SF-3. The study employed ten machine learning algorithms to determine the most accurate technique and feature subset fit. The classification algorithms used include support vector machines (SVM), XGBoost, bagging, decision trees (DT), and random forests (RF). We evaluated the proposed heart disease prediction technique using a private dataset, a public dataset, and different cross-validation methods. We used the Synthetic Minority Oversampling Technique (SMOTE) to eliminate inconsistent data and discover the machine learning algorithm that achieves the most accurate heart disease predictions. Healthcare providers might identify early-stage heart disease quickly and cheaply with the proposed method. We have used the most effective ML algorithm to create a mobile app that instantly predicts heart disease based on the input symptoms. 
The experimental results demonstrated that the XGBoost algorithm performed optimally when applied to the combined datasets and the SF-2 feature subset. It had 97.57% accuracy, 96.61% sensitivity, 90.48% specificity, 95.00% precision, a 92.68% F1 score, and a 98% AUC. We have developed an explainable AI method based on SHAP approaches to understand how the system makes its final predictions.
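
The metrics reported above all derive from the four cells of a binary confusion matrix. The sketch below shows the usual definitions; the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall / true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it always lies between precision and sensitivity when all three are computed for the same class. Under that assumption, a precision of 95.00% and a sensitivity of 96.61% would give an F1 near 95.8%; the abstract does not say how each reported figure was averaged, so the 92.68% value may rest on a different convention.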


Subjects
Algorithms, Heart Diseases, Machine Learning, Humans, Heart Diseases/diagnosis, Support Vector Machine, Artificial Intelligence

5.
Environ Monit Assess ; 196(10): 876, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39222181

ABSTRACT

Mine water surge is one of the main safety risks in coal mines. This research offers a novel mine water source identification model (BO-CatBoost) to successfully avoid and control mine sudden water catastrophes by properly identifying the sources of mine water. First, the classification model is trained and built using the Categorical Boosting (CatBoost) algorithm. The Gaussian process Bayesian optimization (BO) algorithm is used to optimize parameters, and the optimal parameter combination is integrated into the CatBoost algorithm to build the BO-CatBoost mine water source identification model, which further improves the accuracy of mine water source identification. The model was also applied to the Pingdingshan mine to verify the practicality of the model. Then, 29 groups of unknown water sources in Pingdingshan were selected as validation samples for the model and compared with the conventional CatBoost, Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (Xgboost) models. The comparison results demonstrate that the accuracy of LightGBM, Xgboost, CatBoost, and BO-CatBoost models can reach 69%, 79.3%, 79.3%, and 100% respectively, and the RMSE is 0.947, 0.643, 0.719, and 0.0 respectively. The comprehensive analysis shows that, when it comes to mine water source detection, the BO-CatBoost model performs noticeably better than other models in terms of discriminative accuracy and generalization capacity. Lastly, the multi-output prediction and decision-making process of the BO-CatBoost water source identification model is visualized by the interpretability analysis performed with the SHAP approach. The research demonstrates that the BO-CatBoost model can more precisely and impartially identify mine water sources, offering fresh concepts for mine water source detection.
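
With 29 validation samples, the reported accuracies correspond to whole numbers of correct identifications: 20/29 ≈ 69.0%, 23/29 ≈ 79.3%, and 29/29 = 100%. The sketch below reproduces that arithmetic and shows why a perfect classifier has an RMSE of 0. It assumes the RMSE was computed on integer-coded class labels, which the abstract does not state.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between integer-coded labels (an assumption)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


N_SAMPLES = 29  # validation samples reported in the abstract
for n_correct in (20, 23, 29):
    print(n_correct, "correct ->", round(100 * n_correct / N_SAMPLES, 1), "%")

# Hypothetical labels: a perfect prediction gives accuracy 1.0 and RMSE 0.0.
labels = [0, 1, 2, 1, 0]
print(accuracy(labels, labels), rmse(labels, labels))
```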


Subjects
Bayes Theorem, Coal Mining, Environmental Monitoring, Environmental Monitoring/methods, Algorithms, Mining, Water Supply, Theoretical Models

6.
Cancer Sci ; 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223585

ABSTRACT

This study used data from 140,294 prostate cancer cases from the Surveillance, Epidemiology, and End Results (SEER) database. Ten machine learning algorithms were applied to develop a model for predicting treatment options for patients with prostate cancer, differentiating between surgical and non-surgical treatments. The performance of the algorithms was measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. The Shapley Additive Explanations (SHAP) method was employed to investigate the key factors influencing the prediction process. Survival analysis methods were used to compare the survival rates of different treatment options. The CatBoost model yielded the best results (AUC = 0.939, sensitivity = 0.877, accuracy = 0.877). SHAP analysis revealed that the T stage, cancer stage, age, cores positive percentage, prostate-specific antigen, and Gleason score were the most critical factors in predicting treatment options. The study found that surgery significantly improved survival rates, with patients undergoing surgery experiencing a 20.36% increase in 10-year survival rates compared with those receiving non-surgical treatments. Among surgical options, radical prostatectomy had the highest 10-year survival rate at 89.2%. This study successfully developed a predictive model to guide treatment decisions for prostate cancer. Moreover, the model enhanced the transparency of the decision-making process, providing clinicians with a reference for formulating personalized treatment plans.

7.
Diagnostics (Basel) ; 14(17)2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39272651

ABSTRACT

Objective: The objective of the study was to establish an AI-driven decision support system by identifying the most important features in the severity of disease for Intensive Care Unit (ICU) with Mechanical Ventilation (MV) requirement, ICU, and InterMediate Care Unit (IMCU) admission for hospitalized patients with COVID-19 in South Florida. The features implicated in the risk factors identified by the model interpretability can be used to forecast treatment plans faster before critical conditions exacerbate. Methods: We analyzed eHR data from 5371 patients diagnosed with COVID-19 from South Florida Memorial Healthcare Systems admitted between March 2020 and January 2021 to predict the need for ICU with MV, ICU, and IMCU admission. A Random Forest classifier was trained on patients' data augmented by SMOTE, collected at hospital admission. We then compared the importance of features utilizing different model interpretability analyses, such as SHAP, MDI, and Permutation Importance. Results: The models for ICU with MV, ICU, and IMCU admission identified the following factors overlapping as the most important predictors among the three outcomes: age, race, sex, BMI, diarrhea, diabetes, hypertension, early stages of kidney disease, and pneumonia. It was observed that individuals over 65 years ('older adults'), males, current smokers, and BMI classified as 'overweight' and 'obese' were at greater risk of severity of illness. The severity was intensified by the co-occurrence of two interacting features (e.g., diarrhea and diabetes). Conclusions: The top features identified by the models' interpretability were from the 'sociodemographic characteristics', 'pre-hospital comorbidities', and 'medications' categories. However, 'pre-hospital comorbidities' played a vital role in different critical conditions. 
In addition to individual feature importance, the feature interactions also provide crucial information for predicting the most likely outcome of patients' conditions when urgent treatment plans are needed during the surge of patients during the pandemic.
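
The core step of SMOTE, the augmentation technique named in the Methods, is to place a synthetic sample on the line segment between a minority-class point and one of its nearest minority-class neighbours. The following is a minimal sketch of that idea only; the function name, the points, and the choice of k are illustrative assumptions, not the study's configuration.

```python
import random


def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and one of its
    k nearest minority neighbours (the interpolation step used by SMOTE)."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(minority))
    base = minority[i]
    # Rank the other minority points by squared Euclidean distance to the base.
    others = [p for j, p in enumerate(minority) if j != i]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    lam = rng.random()  # interpolation weight in [0, 1)
    return tuple(b + lam * (n - b) for b, n in zip(base, neighbour))


# Hypothetical two-feature minority-class points.
minority_points = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (5.0, 6.0)]
synthetic = smote_like_sample(minority_points, k=2, rng=random.Random(42))
print(synthetic)
```

Because each synthetic point is a convex combination of two existing points, it stays inside the range already spanned by the minority class, which is why oversampling of this kind should be applied to the training data only.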

8.
J Inflamm Res ; 17: 5901-5913, 2024.
Article in English | MEDLINE | ID: mdl-39247840

ABSTRACT

Background: Machine learning (ML) is increasingly used in medical predictive modeling, but there are no studies applying ML to predict prognosis in Guillain-Barré syndrome (GBS). Materials and Methods: The medical records of 223 patients with GBS were analyzed to construct predictive models of patient prognosis. Least Absolute Shrinkage and Selection Operator (LASSO) was used to filter the variables. Decision Trees (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), k-nearest Neighbour (KNN), Naive Bayes (NB), Neural Network (NN), Light Gradient Boosting Machine (LGBM), and Logistic Regression (LR) were used to construct predictive models. Clinical data from 55 GBS patients were used to validate the model. SHapley additive explanation (SHAP) analysis was used to explain the model. Single sample gene set enrichment analysis (ssGSEA) was used for immune cell infiltration analysis. Results: The AUCs (areas under the curve) of the 8 ML algorithms, DT, RF, XGBoost, KNN, NB, NN, LGBM, and LR, were 0.75, 0.896, 0.874, 0.666, 0.742, 0.765, 0.869, and 0.744, respectively. The accuracy of XGBoost (0.852) was the highest, followed by LGBM (0.803) and RF (0.758), with F1 scores of 0.832, 0.794, and 0.667, respectively. The validation set analysis showed AUCs of 0.839, 0.919, and 0.733 for RF, XGBoost, and LGBM, respectively. SHAP analysis showed that the SHAP values of blood neutrophil/lymphocyte ratio (NLR), age, mechanical ventilation, hyporeflexia, and abnormal glossopharyngeal vagus nerve were 0.821, 0.645, 0.517, 0.401, and 0.109, respectively. Conclusion: The combination of NLR, age, mechanical ventilation, hyporeflexia, and abnormal glossopharyngeal vagus nerve has good predictive value for short-term prognosis in patients with GBS.
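
The AUC values compared above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = poor prognosis) and model scores.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))  # 0.75: three of the four positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 0.666 to 0.919 values above sit.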

9.
Heliyon ; 10(16): e35871, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39220969

ABSTRACT

Slope instability can have catastrophic consequences, so slope stability analysis has been a key topic in geotechnical engineering. Traditional analysis methods are difficult to operate and time-consuming; for this reason, many researchers have carried out slope stability analysis based on AI. However, current studies only judge the importance of each factor and do not specifically quantify the correlation between the factors and slope stability. To this end, this paper carried out a sensitivity analysis based on XGBoost and SHAP. The SHAP sensitivity analysis results were also validated using GeoStudio software. The selected influence factors included slope height (H), slope angle (β), unit weight (γ), cohesion (c), angle of internal friction (φ), and pore water pressure coefficient (r_u). The results showed that c and γ were the most and least important influential parameters, respectively. GeoStudio simulation results showed a negative correlation between γ, β, H, r_u and slope stability, and a positive correlation between c, φ and slope stability. However, for real data, SHAP misjudged the correlation between γ and slope stability, because current AI lacks common-sense knowledge, leaving SHAP unable to effectively explain the real mechanism of slope instability. This paper overcame this challenge with an a priori data-driven approach. The method provided a more reliable and accurate interpretation of the results than real samples, especially with limited or low-quality data. In addition, the results of this method showed that the critical values of c, φ, β, H, and r_u in slope destabilization are 18 kPa, 28°, 32°, 30 m, and 0.28, respectively. These results were closer to the GeoStudio simulations than those obtained from real samples.

10.
Animals (Basel) ; 14(18)2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39335314

ABSTRACT

Heat stress poses a significant challenge to livestock farming, particularly affecting the health and productivity of high-yield dairy cows. This study develops a machine learning framework aimed at predicting the core body temperature (CBT) of dairy cows to enable more effective heat stress management and enhance animal welfare. The dataset includes 3005 records of physiological data from real-world production environments, encompassing environmental parameters, individual animal characteristics, and infrared temperature measurements. The machine learning algorithms employed include elastic net (EN), artificial neural networks (ANN), random forests (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and CatBoost, alongside several optimization algorithms such as Bayesian optimization (BO) and grey wolf optimizer (GWO) to refine model performance through hyperparameter tuning. Comparative analysis of various feature sets reveals that the feature set incorporating the average infrared temperature of the trunk (IRTave_TK) excels in CBT prediction, achieving a coefficient of determination (R²) of 0.516, mean absolute error (MAE) of 0.239 °C, and root mean square error (RMSE) of 0.302 °C. Further analysis shows that the GWO-XGBoost model surpasses others in predictive accuracy with an R² of 0.540, RMSE as low as 0.294 °C, and MAE of just 0.232 °C, and leads in computational efficiency with an optimization time of merely 2.41 s, approximately 4500 times faster than the highest accuracy model. Through SHAP (SHapley Additive exPlanations) analysis, IRTave_TK, time zone (TZ), days in lactation (DOL), and body posture (BP) are identified as the four most critical factors in predicting CBT, and the interaction effects of IRTave_TK with other features such as body posture and time periods are unveiled. 
This study provides technological support for livestock management, facilitating the development and optimization of predictive models to implement timely and effective interventions, thereby maintaining the health and productivity of dairy cows.

11.
Biomedicines ; 12(9)2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39335469

ABSTRACT

BACKGROUND: Colorectal Polyps are the main source of precancerous lesions in colorectal cancer. To increase the early diagnosis of tumors and improve their screening, we aimed to develop a simple and non-invasive diagnostic prediction model for colorectal polyps based on machine learning (ML) and using accessible health examination records. METHODS: We conducted a single-center observational retrospective study in China. The derivation cohort, consisting of 5426 individuals who underwent colonoscopy screening from January 2021 to January 2024, was separated for training (cohort 1) and validation (cohort 2). The variables considered in this study included demographic data, vital signs, and laboratory results recorded by health examination records. With features selected by univariate analysis and Lasso regression analysis, nine machine learning methods were utilized to develop a colorectal polyp diagnostic model. Several evaluation indexes, including the area under the receiver-operating-characteristic curve (AUC), were used to compare the predictive performance. The SHapley additive explanation method (SHAP) was used to rank the feature importance and explain the final model. RESULTS: 14 independent predictors were identified as the most valuable features to establish the models. The adaptive boosting machine (AdaBoost) model exhibited the best performance among the 9 ML models in cohort 1, with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC (95% CI) of 0.632 (0.618-0.646), 0.635 (0.550-0.721), 0.674 (0.591-0.758), 0.593 (0.576-0.611), 0.673 (0.654-0.691), 0.608 (0.560-0.655) and 0.687 (0.626-0.749), respectively. The final model gave an AUC of 0.675 in cohort 2. Additionally, the precision recall (PR) curve for the AdaBoost model reached the highest AUPR of 0.648, positioning it nearest to the upper right corner. 
SHAP analysis provided visualized explanations, reaffirming the critical factors associated with the risk of colorectal polyps in the asymptomatic population. CONCLUSIONS: This study integrated the clinical and laboratory indicators with machine learning techniques to establish the predictive model for colorectal polyps, providing non-invasive, cost-effective screening strategies for asymptomatic individuals and guiding decisions for further examination and treatment.

12.
Nutrients ; 16(18)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39339819

ABSTRACT

BACKGROUND: Diarrheal disease remains a significant public health issue, particularly affecting young children and older adults. Despite efforts to control and prevent these diseases, their incidence continues to be a global concern. Understanding the trends in diarrhea incidence and the factors influencing these trends is crucial for developing effective public health strategies. OBJECTIVE: This study aimed to explore the temporal trends in diarrhea incidence and associated factors from 1990 to 2019 and to project the incidence for the period 2020-2040 at global, regional, and national levels. We aimed to identify key factors influencing these trends to inform future prevention and control strategies. METHODS: The eXtreme Gradient Boosting (XGBoost) model was used to predict the incidence from 2020 to 2040 based on demographic, meteorological, water sanitation, and sanitation and hygiene indicators. SHapley Additive exPlanations (SHAP) value was performed to explain the impact of variables in the model on the incidence. Estimated annual percentage change (EAPC) was calculated to assess the temporal trends of age-standardized incidence rates (ASIRs) from 1990 to 2019 and from 2020 to 2040. RESULTS: Globally, both incident cases and ASIRs of diarrhea increased between 2010 and 2019. The incident cases are expected to rise from 2020 to 2040, while the ASIRs and incidence rates are predicted to slightly decrease. During the observed (1990-2019) and predicted (2020-2040) periods, adults aged 60 years and above exhibited an upward trend in incidence rate as age increased, while children aged < 5 years consistently had the highest incident cases. The SHAP framework was applied to explain the model predictions. We identified several risk factors associated with an increased incidence of diarrhea, including age over 60 years, yearly precipitation exceeding 3000 mm, temperature above 20 °C for both maximum and minimum values, and vapor pressure deficit over 1500 Pa. 
A decreased incidence rate was associated with relative humidity over 60%, wind speed over 4 m/s, and populations with above 80% using safely managed drinking water services and over 40% using safely managed sanitation services. CONCLUSIONS: Diarrheal diseases are still serious public health concerns, with predicted increases in the incident cases despite decreasing ASIRs globally. Children aged < 5 years remain highly susceptible to diarrheal diseases, yet the incidence rate in the older adults aged 60 plus years still warrants additional attention. Additionally, more targeted efforts to improve access to safe drinking water and sanitation services are crucial for reducing the incidence of diarrheal diseases globally.
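
The estimated annual percentage change (EAPC) used in the Methods is commonly obtained by fitting ln(rate) = α + β·year and reporting 100·(exp(β) − 1). The sketch below assumes that standard definition, which the abstract does not spell out, and uses a hypothetical rate series.

```python
import math


def eapc(years, rates):
    """Estimated annual percentage change from a least-squares fit of
    ln(rate) on calendar year: EAPC = 100 * (exp(beta) - 1)."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    beta = sxy / sxx  # slope of the log-linear trend
    return 100 * (math.exp(beta) - 1)


# Hypothetical age-standardized rates growing by exactly 2% per year.
years = list(range(1990, 2000))
rates = [1000 * 1.02 ** (year - 1990) for year in years]
print(round(eapc(years, rates), 2))  # 2.0
```

A positive EAPC indicates a rising age-standardized rate over the period and a negative one a falling rate; in practice a confidence interval for β is reported alongside it.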


Subjects
Diarrhea, Global Health, Humans, Incidence, Diarrhea/epidemiology, Global Health/statistics & numerical data, Sanitation, Hygiene, Forecasting, Risk Factors, Preschool Child, Female, Male

13.
J Environ Manage ; 370: 122640, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39340889

ABSTRACT

Soil salinization is a critical global issue for sustainable agriculture, impacting crop yields and posing a threat to achieving the Sustainable Development Goal (SDG) of ensuring food security. It is necessary to monitor it in detail and uncover its underlying factors at a regional scale. In this context, the present study aimed to evaluate soil health in the eastern Mediterranean region by using the Sodium Adsorption Ratio (SAR) as an indicator of soil salinity in three distinct soil horizons. The main objective of the research was to evaluate the performance of four machine learning (ML) models, namely Random Forest (RF), Nu Support Vector Regression (NuSVR), Artificial Neural Network-Multi Layer Perceptron (ANN-MLP), and Gradient Boosting Regression (GBR), for accurate prediction of SAR, with Recursive Feature Elimination (RFE) as the feature selection method. Moreover, SHapley Additive exPlanations (SHAP) was applied as a sensitivity analysis to identify the most influential covariates. The main findings revealed that the average clay content in the surface horizon (H10-25cm) was 50.5% ± 10.4, which significantly increased to 57.5% ± 8.7 (p < 0.05). No significant mean differences were detected between the studied horizons for SAR and Na+. ML output revealed that NuSVR outperformed other algorithms in accurately predicting outcomes during both the training and testing stages. Moreover, Scenario 2 (SC2) with seven selected features from the RFE method facilitated highly accurate SAR predictions. Overall, the performance of the ML models is ranked as NuSVR > GBR > ANN-MLP > RF. Lastly, SHAP sensitivity analysis identified CEC, Ca2+, Mg2+, and Na+ as the most influential variables for SAR prediction in both the training and testing stages. Hence, the research yielded valuable insights for efficient agricultural soil management at a regional level using state-of-the-art technology.
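
For context on why Ca2+, Mg2+, and Na+ rank among the most influential covariates: SAR is conventionally defined from exactly those three ion concentrations, as Na+ / sqrt((Ca2+ + Mg2+) / 2) with concentrations in meq/L. The sketch below uses that conventional definition; the input values are hypothetical, and the abstract does not state how the study measured or derived SAR.

```python
import math


def sodium_adsorption_ratio(na, ca, mg):
    """Conventional SAR from Na+, Ca2+ and Mg2+ concentrations in meq/L."""
    return na / math.sqrt((ca + mg) / 2)


# Hypothetical soil-extract concentrations in meq/L.
print(sodium_adsorption_ratio(na=10.0, ca=4.0, mg=4.0))  # 5.0
```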

14.
Eur J Surg Oncol ; 50(12): 108703, 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39326305

ABSTRACT

BACKGROUND: Unplanned reoperation (URO) after surgery adversely affects the quality of life and prognosis of patients undergoing anterior resection for rectal cancer. This study aims to meet the urgent need for reliable predictive tools by developing an optimized machine learning model to estimate the risk of URO following anterior resection in rectal cancer patients. METHODS: This retrospective study collected multidimensional data from patients who underwent anterior resection for rectal cancer at Tongji Hospital of Huazhong University of Science and Technology from January 2012 to December 2022. Feature selection was conducted using both least absolute shrinkage and selection operator (LASSO) regression and the Boruta algorithm. Multiple machine learning models were developed, with parameter optimization via grid search and cross-validation. Performance metrics included accuracy, specificity, sensitivity, and area under curve (AUC). The optimal model was interpreted using SHapley Additive exPlanations (SHAP), and an online platform was created for real-time risk prediction. RESULTS: A total of 2384 patients who underwent anterior resection for rectal cancer were included in this study. Following rigorous selection, 14 variables were identified for constructing the machine learning model. The optimized model demonstrated high predictive accuracy, with the random forest (RF) model achieving the best overall performance. The model achieved an AUC of 0.889 and an accuracy of 0.842 on the test dataset. SHAP analysis revealed that the tumor location, previous abdominal surgery, and operative time were the most significant factors influencing the risk of URO. CONCLUSION: This study developed an optimized machine learning-based online predictive system to assess the risk of URO after anterior resection in rectal cancer patients. 
Accessible at https://yangsu2023.shinyapps.io/UROrisk/, this system improves prediction accuracy and offers real-time risk assessment, providing a valuable tool that may support clinical decision-making and potentially improve the prognosis of rectal cancer patients.

15.
JMIR Public Health Surveill ; 10: e48705, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39264706

ABSTRACT

BACKGROUND: Understanding the factors contributing to mental well-being in youth is a public health priority. Self-reported enthusiasm for the future may be a useful indicator of well-being and has been shown to forecast social and educational success. Typically, cross-domain measures of ecological and health-related factors with relevance to public policy and programming are analyzed either in isolation or in targeted models assessing bivariate interactions. Here, we capitalize on a large provincial data set and machine learning to identify the sociodemographic, experiential, behavioral, and other health-related factors most strongly associated with levels of subjective enthusiasm for the future in a large sample of elementary and secondary school students. OBJECTIVE: The aim of this study was to identify the sociodemographic, experiential, behavioral, and other health-related factors associated with enthusiasm for the future in elementary and secondary school students using machine learning. METHODS: We analyzed data from 13,661 participants in the 2019 Ontario Student Drug Use and Health Survey (OSDUHS) (grades 7-12) with complete data for our primary outcome: self-reported levels of enthusiasm for the future. We used 50 variables as model predictors, including demographics, perception of school experience (i.e., school connectedness and academic performance), physical activity and quantity of sleep, substance use, and physical and mental health indicators. Models were built using a nonlinear decision tree-based machine learning algorithm called extreme gradient boosting to classify students as indicating either high or low levels of enthusiasm. Shapley additive explanations (SHAP) values were used to interpret the generated models, providing a ranking of feature importance and revealing any nonlinear or interactive effects of the input variables. 
RESULTS: The top 3 contributors to higher self-rated enthusiasm for the future were higher self-rated physical health (SHAP value=0.62), feeling that one is able to discuss problems or feelings with their parents (SHAP value=0.49), and school belonging (SHAP value=0.32). Additionally, subjective social status at school was a top feature and showed nonlinear effects, with benefits to predicted enthusiasm present in the mid-to-high range of values. CONCLUSIONS: Using machine learning, we identified key factors related to self-reported enthusiasm for the future in a large sample of young students: perceived physical health, subjective school social status and connectedness, and quality of relationship with parents. A focus on perceptions of physical health and school connectedness should be considered central to improving the well-being of youth at the population level.
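
The SHAP values used for the ranking above are Shapley values of a model's prediction. For a handful of features they can be computed exactly by enumerating feature coalitions, as in the sketch below. The toy model, instance, and all-zero baseline are illustrative assumptions; production SHAP implementations use faster approximations and typically rank features by mean absolute SHAP value across samples.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function over n_features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


def toy_model(x):
    """Hypothetical model with two additive terms and one interaction."""
    return 2 * x[0] + x[1] + x[0] * x[2]


instance = (1.0, 3.0, 2.0)
baseline = (0.0, 0.0, 0.0)


def value_fn(coalition):
    # Features outside the coalition are replaced by their baseline value.
    x = [instance[j] if j in coalition else baseline[j] for j in range(len(instance))]
    return toy_model(x)


phi = shapley_values(value_fn, n_features=3)
print(phi)  # approximately [3.0, 3.0, 1.0]
```

The values sum to the difference between the prediction for the instance and for the baseline (7 here), and the interaction term x0·x2 is split evenly between the two features involved, which is how SHAP surfaces interactive effects.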


Subjects
Machine Learning , Students , Humans , Adolescent , Male , Cross-Sectional Studies , Female , Students/psychology , Students/statistics & numerical data , Child , Ontario , Schools , Self Report
16.
Biomimetics (Basel) ; 9(9)2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39329567

ABSTRACT

The performance of ultra-high-performance concrete (UHPC) allows for the design and creation of thinner elements with superior overall durability. The compressive strength of UHPC is a value that can be obtained after a certain period of time through a series of tests and curing. However, this value can be estimated by machine-learning methods. In this study, a multilayer perceptron (MLP) and a stacking regressor, an ensemble machine-learning model, are used to predict the compressive strength of high-performance concrete. The models' performance is then explained with a feature importance analysis and Shapley additive explanations (SHAP), and the developed models are interpreted. The effect of using different random splits for the training and test sets was also investigated. The stacking regressor, which combined the outputs of Extreme Gradient Boosting (XGBoost), Category Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), and Extra Trees regressors using random forest as the final estimator, performed significantly better than the MLP regressor: it predicted the compressive strength with an average R² score of 0.971 on the test set, whereas the average R² score of the MLP model was 0.909. The results of the SHAP analysis showed that the age of concrete and the amounts of silica fume, fiber, superplasticizer, cement, aggregate, and water have the greatest impact on the model predictions.
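The idea of reporting an average R² over different random train/test splits can be sketched as follows. This is only an illustration under stated assumptions: a simple least-squares line stands in for the stacking regressor, the data are synthetic, and the helper names (`fit_line`, `mean_r2_over_splits`) are hypothetical.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_r2_over_splits(xs, ys, n_splits=10, test_fraction=0.2, seed=0):
    """Refit on several random splits and average the test-set R² scores."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(2, int(len(xs) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(r2_score([ys[i] for i in test], preds))
    return sum(scores) / len(scores), scores


# Synthetic data (assumed for illustration): a noisy linear trend.
data_rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [3.0 * x + 5.0 + data_rng.gauss(0.0, 2.0) for x in xs]

mean_r2, scores = mean_r2_over_splits(xs, ys)
print(round(mean_r2, 3))
```

Averaging over several splits reduces the chance that a single favorable or unfavorable partition of the data drives the reported score.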

17.
Methods ; 231: 144-153, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39326482

ABSTRACT

In recent years, multi-omics clustering has become a powerful tool in cancer research, offering a comprehensive perspective on the diverse molecular characteristics inherent to various cancer subtypes. However, most existing multi-omics clustering methods directly integrate heterogeneous features from different omics, which may struggle to deal with the noise or redundancy of multi-omics data and lead to poor clustering results. Therefore, we propose a novel multi-omics clustering method to extract interpretable and discriminative features from various omics before data integration. The clinical information is used to supervise the process of feature extraction based on SHAP (SHapley Additive exPlanation) values. Singular value decomposition (SVD) is then applied to integrate the extracted features of different omics by constructing a latent subspace. Finally, we utilize shared nearest neighbor-based spectral clustering on the latent representation to obtain the clustering result. The proposed method is evaluated on several cancer datasets across three levels of omics, in comparison to several state-of-the-art multi-omics clustering methods. The comparison results demonstrate the superior performance of the proposed method in multi-omics data analysis for cancer subtyping. Additionally, experiments reveal the efficacy of utilizing clinical information based on SHAP values for feature extraction, enhancing the performance of clustering analyses. Moreover, enrichment analysis of the identified gene signatures in different subtypes is also performed to further demonstrate the effectiveness of the proposed method. Availability: The proposed method is freely accessible at https://github.com/Tianyi-Shi-Tsukuba/Multi-omics-clustering-based-on-SHAP. Data will be made available on request.
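As a rough illustration of the shared nearest neighbor (SNN) idea mentioned above, the sketch below builds an SNN similarity matrix from a small latent representation. It assumes one common definition (similarity is the number of k-nearest neighbors two samples have in common), which may differ from the definition used by the authors, and it omits the spectral clustering step that would consume this matrix. The `latent` points are hypothetical.

```python
from math import dist


def k_nearest(points, k):
    """For each point, the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(points, k):
    """Symmetric matrix whose entry (i, j) counts neighbors shared by i and j."""
    nn = k_nearest(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(nn[i] & nn[j])
            sim[i][j] = sim[j][i] = shared
    return sim


# Hypothetical 2-D latent representation with two well-separated groups.
latent = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

similarity = snn_similarity(latent, k=2)
for row in similarity:
    print(row)
```

Samples in the same group share neighbors and get a positive similarity, while samples from different groups share none, which is the structure a spectral clustering step would then exploit.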

18.
Sci Total Environ ; 954: 176605, 2024 Sep 29.
Article in English | MEDLINE | ID: mdl-39349201

ABSTRACT

This study assessed the levels of soil heavy metal pollution in agricultural land in southeastern Chengdu and its effects on the germination stage of higher plants. Through extensive soil sampling and laboratory analyses, 15 soil environmental factors were measured, including soil density, porosity, pH, field moisture capacity (FMC), calcium carbonate (CaCO3), and heavy metals such as arsenic (As) and cadmium (Cd). Acute toxicity tests were performed on sorghum (Sorghum bicolor) and Brassica napus (Brassica napus var. napus). The results of the geo-accumulation index (Igeo) and enrichment factor (EF) analyses indicate a higher risk of pollution and enrichment of As and Cd in the study area, with relatively lower risks for other heavy metals. Additionally, the current soil heavy metal concentrations inhibited the growth of sorghum and Brassica napus shoots and roots during the germination stage. Redundancy analysis (RDA), factor detector, and XGBoost-SHAP models identified the As, Cd, FMC, and CaCO3 contents, soil density, and porosity as the primary factors influencing plant growth. Among these factors, FMC, porosity, and Cd were found to promote plant growth, whereas soil density and As demonstrated inhibitory effects. CaCO3 had a dual effect, initially promoting growth but later inhibiting it as its concentration increased. Further analysis revealed that Brassica napus is more sensitive to soil environmental factors than sorghum, particularly to Cd and As, while sorghum has greater tolerance. Moreover, roots were found to be more sensitive than shoots to soil environmental factors, with roots being influenced primarily by physical factors such as FMC and soil density, whereas shoots were affected primarily by chemical factors such as As and Cd. 
This study addresses the significant lack of data regarding the impact of soil heavy metal concentrations on plant growth in southeastern Chengdu, providing a scientific basis for regional environmental monitoring, soil remediation, and plant cultivation optimization.
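The abstract does not state how the geo-accumulation index and enrichment factor were computed. The sketch below uses their commonly cited definitions, Igeo = log2(C / (1.5 B)) and EF = (C_metal / C_ref)_sample / (C_metal / C_ref)_background, as an assumption. All concentrations are hypothetical values in mg/kg, not measurements from the study.

```python
from math import log2


def geo_accumulation_index(concentration, background):
    """Igeo = log2(C / (1.5 * B)); the factor 1.5 allows for natural background variation."""
    return log2(concentration / (1.5 * background))


def enrichment_factor(metal_sample, ref_sample, metal_background, ref_background):
    """EF: metal-to-reference-element ratio in the sample relative to the background ratio."""
    return (metal_sample / ref_sample) / (metal_background / ref_background)


# Hypothetical concentrations (mg/kg), chosen only to make the arithmetic easy to follow.
igeo = geo_accumulation_index(concentration=3.0, background=0.5)
ef = enrichment_factor(
    metal_sample=3.0, ref_sample=20000.0,          # metal and reference element in the sample
    metal_background=0.5, ref_background=40000.0,  # the same pair in the background
)

print(igeo)  # log2(3.0 / 0.75) = log2(4)
print(ef)
```

Higher values of either index indicate stronger accumulation relative to the background; the class boundaries used to label pollution levels vary between studies and are not reproduced here.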

19.
J Food Sci ; 89(10): 6553-6574, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39218808

ABSTRACT

Over-milling of brown rice causes high economic and nutrient losses. Detecting and predicting the rice degree of milling (DOM) remains a challenge for moderate processing. In this study, a self-established grain image acquisition platform was built. Degree of bran layer remaining (DOR) datasets were established through image capture and processing (extraction of grain color, texture, and shape features). The mapping relationship between DOR and DOM was analyzed in depth. Typical machine learning and deep learning models for predicting rice grain DOR were established. The results indicate that an optimized CatBoost model can be established with cross-validation and grid search, with the best accuracy improving from 84.28% to 91.24% and a precision of 91.31%, recall of 90.89%, and F1-score of 91.07%. Shapley additive explanations analysis indicates that color, texture, and shape features all affect CatBoost prediction accuracy, with feature importance ranked color > texture > shape. The YCbCr-Cb_ske and GLCM-Contrast features make the most significant contributions to rice milling quality prediction. The feature importance provides theoretical and practical guidance for grain DOM prediction models. PRACTICAL APPLICATION: Predicting and detecting the degree of rice milling is valuable for practical rice milling processes. In this paper, image processing and machine learning methods provide an automated, nondestructive, and cost-effective way to predict the quality of rice. The study may serve as a valuable reference for improving rice milling methods, retaining rice nutrition, and reducing broken rice yield.
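The GLCM-Contrast texture feature named above can be illustrated with a small pure-Python sketch. It assumes the usual definition of contrast as the sum of (i - j)² p(i, j) over a normalized gray-level co-occurrence matrix, computed here for a single horizontal offset and without symmetrization. The study's exact GLCM settings are not given in the abstract, and the tiny images are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def glcm_contrast(p):
    """Contrast = sum over (i, j) of (i - j)^2 * p[i][j]."""
    size = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(size) for j in range(size))


# Hypothetical 4-level images: a uniform patch and a high-contrast striped patch.
flat = [[1, 1, 1],
        [1, 1, 1]]
stripes = [[0, 3, 0],
           [3, 0, 3]]

flat_contrast = glcm_contrast(glcm(flat, levels=4))
stripes_contrast = glcm_contrast(glcm(stripes, levels=4))
print(flat_contrast, stripes_contrast)
```

A uniform patch has zero contrast, while neighboring pixels that differ by three gray levels give a contrast of 9, so the feature grows with local intensity variation.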


Subjects
Food Handling , Machine Learning , Oryza , Oryza/chemistry , Food Handling/methods , China , Edible Grain/chemistry , Image Processing, Computer-Assisted/methods , Color , East Asian People
20.
Polymers (Basel) ; 16(18)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39339143

ABSTRACT

Three-dimensional (3D) printing is a rapid prototyping technology that has been widely used in manufacturing. However, the printing parameters have an important impact on the printing effect, so they need to be optimized to obtain the best printing effect. To better understand the impact of 3D printing parameters on the printing effect, to explain it in terms of mathematical models, and to clarify the rationale for certain parameters considered important in previous experience, this study uses machine learning methods to predict the impact of 3D printing parameters on the printing effect. Specifically, we used four machine learning algorithms: (1) SVR (support vector regression), a regression method that uses the principle of structural risk minimization to find a hyperplane in a high-dimensional space that best fits the data, with the goal of minimizing the generalization error bound; (2) random forest, an ensemble learning method that constructs a multitude of decision trees and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees; (3) GBDT (gradient boosting decision tree), an iterative ensemble technique that combines multiple weak prediction models (decision trees) into a strong one by sequentially minimizing the loss function, with each subsequent tree built to correct the errors of the previous trees; and (4) XGB (extreme gradient boosting), an optimized and efficient implementation of gradient boosting that incorporates techniques such as regularization and sparsity-aware splitting to improve performance. We used feature importance and SHAP (Shapley additive explanation) values to compare the influence of the print parameters on the results and to determine which parameters have the greatest impact on the printing effect. In the experiment, we used a dataset with multiple parameters and divided it into a training set and a test set. Through Bayesian optimization and grid search, we determined the best hyperparameters for each algorithm and used the best model to make predictions for the test set. We compared the predictive performance of each model and confirmed that the extrusion expansion ratio, elastic modulus, and elongation at break have the greatest influence on the printing effect, which is consistent with practical experience. In the future, we will continue to investigate methods for optimizing 3D printing parameters and explore how interpretable machine learning can be applied to the 3D printing process to achieve more efficient and reliable printing results.
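The description of gradient boosting above, in which each new tree corrects the errors of the model built so far, can be sketched with one-split trees (stumps) and squared-error loss, where the residuals are the quantity each new stump is fitted to. This is a minimal illustration, not the GBDT or XGB implementation used in the study; the data, the learning rate, and the helper names are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)           # initial prediction: the mean target
    preds = [base] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        # Track training mean squared error after each round.
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history


# Hypothetical one-feature data with a nonlinear target.
xs = [float(i) for i in range(10)]
ys = [x ** 2 for x in xs]

base, stumps, history = gradient_boost(xs, ys)
print(history[0], history[-1])
```

The training error shrinks round by round because each stump removes part of what the previous ensemble got wrong; practical implementations add deeper trees, regularization, and validation-based stopping.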
