Search | BVS Regional Portal

1.

Optimized Machine Learning Models for Predicting Core Body Temperature in Dairy Cows: Enhancing Accuracy and Interpretability for Practical Livestock Management.

Li, Dapeng; Yan, Geqi; Li, Fuwei; Lin, Hai; Jiao, Hongchao; Han, Haixia; Liu, Wei.

Animals (Basel) ; 14(18), 2024 Sep 20.

Article in English | MEDLINE | ID: mdl-39335314

ABSTRACT

Heat stress poses a significant challenge to livestock farming, particularly affecting the health and productivity of high-yield dairy cows. This study develops a machine learning framework for predicting the core body temperature (CBT) of dairy cows to enable more effective heat stress management and enhance animal welfare. The dataset includes 3005 records of physiological data from real-world production environments, encompassing environmental parameters, individual animal characteristics, and infrared temperature measurements. The machine learning algorithms employed include elastic net (EN), artificial neural networks (ANN), random forests (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and CatBoost, alongside optimization algorithms such as Bayesian optimization (BO) and grey wolf optimizer (GWO) to refine model performance through hyperparameter tuning. Comparative analysis of various feature sets reveals that the feature set incorporating the average infrared temperature of the trunk (IRTave_TK) excels in CBT prediction, achieving a coefficient of determination (R²) of 0.516, a mean absolute error (MAE) of 0.239 °C, and a root mean square error (RMSE) of 0.302 °C. Further analysis shows that the GWO-XGBoost model surpasses the others in predictive accuracy, with an R² of 0.540, an RMSE of 0.294 °C, and an MAE of 0.232 °C, and leads in computational efficiency with an optimization time of merely 2.41 s, approximately 4500 times faster than the highest accuracy model. Through SHAP (SHapley Additive exPlanations) analysis, IRTave_TK, time zone (TZ), days in lactation (DOL), and body posture (BP) are identified as the four most critical factors in predicting CBT, and the interaction effects of IRTave_TK with other features such as body posture and time periods are revealed. This study provides technological support for livestock management, facilitating the development and optimization of predictive models to implement timely and effective interventions, thereby maintaining the health and productivity of dairy cows.
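The three error measures quoted in this abstract (R², MAE, RMSE) can be computed directly from paired measured and predicted temperatures. The sketch below is a minimal pure-Python illustration; the function name `regression_metrics` and the five temperature pairs are hypothetical and are not taken from the study's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares used by the coefficient of determination.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}


# Hypothetical core body temperatures in degrees Celsius (illustration only).
cbt_measured = [38.6, 38.9, 39.2, 38.7, 39.5]
cbt_predicted = [38.7, 38.8, 39.0, 38.9, 39.3]

metrics = regression_metrics(cbt_measured, cbt_predicted)
print(metrics)
```

MAE and RMSE are in the same unit as the target (here °C), while R² is unitless, which is why the abstract reports units only for the first two.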

2.

Interpretable Machine Learning-Based Influence Factor Identification for 3D Printing Process-Structure Linkages.

Liu, Fuguo; Chen, Ziru; Xu, Jun; Zheng, Yanyan; Su, Wenyi; Tian, Maozai; Li, Guodong.

Polymers (Basel) ; 16(18), 2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39339143

ABSTRACT

Three-dimensional (3D) printing is a rapid prototyping technology that is widely used in manufacturing. Printing parameters strongly affect the printing result, so they need to be optimized to obtain the best outcome. To better understand how 3D printing parameters affect the printing result, to explain that effect through mathematical models, and to test the rationale behind parameters that earlier experience has treated as important, this study uses machine learning methods to predict the impact of 3D printing parameters on the printing result. Specifically, we used four machine learning algorithms. SVR (support vector regression): a regression method that uses the principle of structural risk minimization to find a hyperplane in a high-dimensional space that best fits the data, with the goal of minimizing the generalization error bound. Random forest: an ensemble learning method that constructs a multitude of decision trees and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees. GBDT (gradient boosting decision tree): an iterative ensemble technique that combines multiple weak prediction models (decision trees) into a strong one by sequentially minimizing the loss function; each subsequent tree is built to correct the errors of the previous trees. XGB (extreme gradient boosting): an optimized and efficient implementation of gradient boosting that incorporates techniques such as regularization and sparsity-aware splitting to improve performance. We compared the influence of the print parameters on the results under feature importance and SHAP (Shapley additive explanation) values to determine which parameters have the greatest impact on the printing result. In the experiment, we used a dataset with multiple parameters and divided it into a training set and a test set. Through Bayesian optimization and grid search, we determined the best hyperparameters for each algorithm and used the best model to make predictions for the test set. We compared the predictive performance of each model and confirmed that the extrusion expansion ratio, elastic modulus, and elongation at break have the greatest influence on the printing result, which is consistent with practical experience. In future work, we will continue to investigate methods for optimizing 3D printing parameters and explore how interpretable machine learning can be applied to the 3D printing process to achieve more efficient and reliable printing results.
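SHAP values build on Shapley values from cooperative game theory: each feature's attribution is its marginal contribution to the prediction, averaged over every order in which features could be "switched on". The sketch below computes exact Shapley values for a toy three-feature model by enumerating all orderings and replacing absent features with a baseline. It is an illustration of the idea only, not the SHAP library's algorithm and not the study's model; `toy_model`, the coefficients, and the input values are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Features not yet 'switched on' keep their baseline value.
    Cost grows factorially with the number of features, so this only suits toy cases.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = instance[i]           # switch feature i on
            new_output = model(current)
            totals[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in totals]


def toy_model(x):
    """Hypothetical response: two additive terms plus one interaction term."""
    expansion_ratio, elastic_modulus, elongation = x
    return 2.0 * expansion_ratio + 3.0 * elastic_modulus + expansion_ratio * elongation


instance = [1.0, 2.0, 4.0]   # hypothetical parameter values
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point

phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features involved.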

3.

PyCaret for Predicting Type 2 Diabetes: A Phenotype- and Gender-Based Approach with the "Nurses' Health Study" and the "Health Professionals' Follow-Up Study" Datasets.

Gul, Sebnem; Ayturan, Kubilay; Hardalaç, Firat.

J Pers Med ; 14(8), 2024 Jul 29.

Article in English | MEDLINE | ID: mdl-39201996

ABSTRACT

Predicting type 2 diabetes mellitus (T2DM) by using phenotypic data with machine learning (ML) techniques has received significant attention in recent years. PyCaret, a low-code automated ML tool that enables the simultaneous application of 16 different algorithms, was used to predict T2DM by using phenotypic variables from the "Nurses' Health Study" and "Health Professionals' Follow-up Study" datasets. Ridge Classifier, Linear Discriminant Analysis, and Logistic Regression (LR) were the best-performing models for the male-only data subset. For the female-only data subset, LR, Gradient Boosting Classifier, and CatBoost Classifier were the strongest models. The AUC, accuracy, and precision were approximately 0.77, 0.70, and 0.70 for males and 0.79, 0.70, and 0.71 for females, respectively. The feature importance plot showed that family history of diabetes (famdb), never having smoked, and high blood pressure (hbp) were the most influential features in females, while famdb, hbp, and currently being a smoker were the major variables in males. In conclusion, PyCaret was used successfully for the prediction of T2DM by simplifying complex ML tasks. Gender differences are important to consider for T2DM prediction. Despite this comprehensive ML tool, phenotypic variables alone may not be sufficient for early T2DM prediction; genotypic variables could also be used in combination for future studies.

4.

Decoding micro-electrocorticographic signals by using explainable 3D convolutional neural network to predict finger movements.

Kuo, Chao-Hung; Liu, Guan-Tze; Lee, Chi-En; Wu, Jing; Casimo, Kaitlyn; Weaver, Kurt E; Lo, Yu-Chun; Chen, You-Yin; Huang, Wen-Cheng; Ojemann, Jeffrey G.

J Neurosci Methods ; 411: 110251, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39151656

ABSTRACT

BACKGROUND: Electroencephalography (EEG) and electrocorticography (ECoG) recordings have been used to decode finger movements by analyzing brain activity. Traditional methods focused on single bandpass power changes for movement decoding, utilizing machine learning models requiring manual feature extraction. NEW METHOD: This study introduces a 3D convolutional neural network (3D-CNN) model to decode finger movements using ECoG data. The model employs adaptive, explainable AI (xAI) techniques to interpret the physiological relevance of brain signals. ECoG signals from epilepsy patients during awake craniotomy were processed to extract power spectral density across multiple frequency bands. These data formed a 3D matrix used to train the 3D-CNN to predict finger trajectories. RESULTS: The 3D-CNN model showed significant accuracy in predicting finger movements, with root-mean-square error (RMSE) values of 0.26-0.38 for single finger movements and 0.20-0.24 for combined movements. Explainable AI techniques, Grad-CAM and SHAP, identified the high gamma (HG) band as crucial for movement prediction, showing specific cortical regions involved in different finger movements. These findings highlighted the physiological significance of the HG band in motor control. COMPARISON WITH EXISTING METHODS: The 3D-CNN model outperformed traditional machine learning approaches by effectively capturing spatial and temporal patterns in ECoG data. The use of xAI techniques provided clearer insights into the model's decision-making process, unlike the "black box" nature of standard deep learning models. CONCLUSIONS: The proposed 3D-CNN model, combined with xAI methods, enhances the decoding accuracy of finger movements from ECoG data. This approach offers a more efficient and interpretable solution for brain-computer interface (BCI) applications, emphasizing the HG band's role in motor control.

Subject(s)

Electrocorticography , Fingers , Movement , Neural Networks, Computer , Humans , Fingers/physiology , Electrocorticography/methods , Movement/physiology , Adult , Male , Female , Epilepsy/physiopathology , Young Adult , Machine Learning , Signal Processing, Computer-Assisted

5.

Tree-based ensemble machine learning models in the prediction of acute respiratory distress syndrome following cardiac surgery: a multicenter cohort study.

Zhang, Hang; Qian, Dewei; Zhang, Xiaomiao; Meng, Peize; Huang, Weiran; Gu, Tongtong; Fan, Yongliang; Zhang, Yi; Wang, Yuchen; Yu, Min; Yuan, Zhongxiang; Chen, Xin; Zhao, Qingnan; Ruan, Zheng.

J Transl Med ; 22(1): 772, 2024 Aug 15.

Article in English | MEDLINE | ID: mdl-39148090

ABSTRACT

BACKGROUND: Acute respiratory distress syndrome (ARDS) after cardiac surgery is a severe respiratory complication with high mortality and morbidity. Traditional clinical approaches may lead to under-recognition of this heterogeneous syndrome, potentially resulting in delayed diagnosis. This study aims to develop and externally validate seven machine learning (ML) models, trained on electronic health record data, for predicting ARDS after cardiac surgery. METHODS: This multicenter, observational cohort study included patients who underwent cardiac surgery in the training and testing cohorts (data from Nanjing First Hospital), as well as patients who had cardiac surgery in a validation cohort (data from Shanghai General Hospital). The number of important features was determined using the sliding windows sequential forward feature selection method (SWSFS). We developed a set of tree-based ML models, including Decision Tree, GBDT, AdaBoost, XGBoost, LightGBM, Random Forest, and Deep Forest. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the Brier score. The SHapley Additive exPlanation (SHAP) technique was employed to interpret the ML model. Furthermore, a comparison was made between the ML models and traditional scoring systems. ARDS was defined according to the Berlin definition. RESULTS: A total of 1996 patients who had cardiac surgery were included in the study. The top five important features identified by the SWSFS were chronic obstructive pulmonary disease, preoperative albumin, central venous pressure_T4, cardiopulmonary bypass time, and left ventricular ejection fraction. Among the seven ML models, Deep Forest demonstrated the best performance, with an AUC of 0.882 and a Brier score of 0.809 in the validation cohort. Notably, the SHAP values effectively illustrated the contribution of the 13 features to the model output and each individual feature's effect on model prediction. In addition, the ensemble ML models demonstrated better performance than the six traditional scoring systems. CONCLUSIONS: Our study identified 13 important features and provided multiple ML models to enhance risk stratification for ARDS after cardiac surgery. Using these predictors and ML models might provide a basis for early diagnostic and preventive strategies in the perioperative management of ARDS patients.
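The two evaluation measures named in the methods can be illustrated in a few lines. In the sketch below, AUC is computed as the fraction of positive/negative pairs in which the positive case receives the higher predicted probability (ties count as half), and the Brier score as the mean squared difference between predicted probability and the 0/1 outcome. The function names and the six label/probability pairs are hypothetical and unrelated to the study cohorts.

```python
def auc_score(labels, probabilities):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [p for y, p in zip(labels, probabilities) if y == 1]
    negatives = [p for y, p in zip(labels, probabilities) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5   # ties contribute half a win
    return wins / (len(positives) * len(negatives))


def brier_score(labels, probabilities):
    """Mean squared error between predicted probabilities and binary outcomes."""
    n = len(labels)
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / n


# Hypothetical outcomes (1 = event) and predicted probabilities.
labels = [0, 0, 1, 1, 0, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc = auc_score(labels, probabilities)
brier = brier_score(labels, probabilities)
print(auc, brier)
```

AUC measures discrimination only, while the Brier score also reflects calibration; under this definition the Brier score ranges from 0 to 1 and lower values indicate better probabilistic predictions.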

Subject(s)

Cardiac Surgical Procedures , Machine Learning , Respiratory Distress Syndrome , Humans , Respiratory Distress Syndrome/etiology , Male , Female , Middle Aged , Cohort Studies , Cardiac Surgical Procedures/adverse effects , Aged , ROC Curve , Area Under Curve

6.

Factors affecting biochemical pregnancy loss (BPL) in preimplantation genetic testing for aneuploidy (PGT-A) cycles: machine learning-assisted identification.

Ortiz, José A; Lledó, B; Morales, R; Máñez-Grau, A; Cascales, A; Rodríguez-Arnedo, A; Castillo, Juan C; Bernabeu, A; Bernabeu, R.

Reprod Biol Endocrinol ; 22(1): 101, 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-39118049

ABSTRACT

PURPOSE: To determine the factors influencing the likelihood of biochemical pregnancy loss (BPL) after transfer of a euploid embryo from preimplantation genetic testing for aneuploidy (PGT-A) cycles. METHODS: The study employed an observational, retrospective cohort design, encompassing 6020 embryos from 2879 PGT-A cycles conducted between February 2013 and September 2021. Trophectoderm biopsies in day 5 (D5) or day 6 (D6) blastocysts were analyzed by next generation sequencing (NGS). Only single embryo transfers (SET) were considered, totaling 1161 transfers. Of these, 49.9% resulted in positive pregnancy tests, with 18.3% experiencing BPL. To establish a predictive model for BPL, both classical statistical methods and five different supervised classification machine learning algorithms were used. A total of forty-seven factors were incorporated as predictor variables in the machine learning models. RESULTS: Throughout the optimization process for each model, various performance metrics were computed. The Random Forest model emerged as the best model, with the highest area under the ROC curve (AUC) value of 0.913, alongside an accuracy of 0.830, a positive predictive value of 0.857, and a negative predictive value of 0.807. For the selected model, SHAP (SHapley Additive exPlanations) values were determined for each of the variables to establish which had the best predictive ability. Notably, variables pertaining to embryo biopsy demonstrated the greatest predictive capacity, followed by factors associated with ovarian stimulation (COS), maternal age, and paternal age. CONCLUSIONS: The Random Forest model had the highest predictive power for identifying BPL occurrences in PGT-A cycles. Specifically, variables associated with the embryo biopsy procedure (biopsy day, number of biopsied embryos, and number of biopsied cells) and ovarian stimulation (number of oocytes retrieved and duration of stimulation) exhibited the strongest predictive power.
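Accuracy, positive predictive value (PPV), and negative predictive value (NPV) all derive from the four cells of a binary confusion matrix. The sketch below shows the arithmetic; the function name and the counts are hypothetical round numbers chosen for readability and do not reproduce the study's results.

```python
def classification_summary(tp, fp, tn, fn):
    """Accuracy, PPV and NPV from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "ppv": tp / (tp + fp),           # share of predicted positives that are true
        "npv": tn / (tn + fn),           # share of predicted negatives that are true
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 35 true negatives, 15 false negatives.
summary = classification_summary(tp=40, fp=10, tn=35, fn=15)
print(summary)
```

Unlike AUC, PPV and NPV depend on the prevalence of the outcome in the evaluated sample, so they are best compared between models tested on the same data.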

Subject(s)

Abortion, Spontaneous , Aneuploidy , Genetic Testing , Machine Learning , Preimplantation Diagnosis , Humans , Female , Pregnancy , Preimplantation Diagnosis/methods , Retrospective Studies , Adult , Genetic Testing/methods , Abortion, Spontaneous/diagnosis , Abortion, Spontaneous/genetics , Abortion, Spontaneous/epidemiology , Embryo Transfer/methods , Blastocyst

7.

Machine Learning-Based Prediction of Suicidal Thinking in Adolescents by Derivation and Validation in 3 Independent Worldwide Cohorts: Algorithm Development and Validation Study.

Kim, Hyejun; Son, Yejun; Lee, Hojae; Kang, Jiseung; Hammoodi, Ahmed; Choi, Yujin; Kim, Hyeon Jin; Lee, Hayeon; Fond, Guillaume; Boyer, Laurent; Kwon, Rosie; Woo, Selin; Yon, Dong Keon.

J Med Internet Res ; 26: e55913, 2024 May 17.

Article in English | MEDLINE | ID: mdl-38758578

ABSTRACT

BACKGROUND: Suicide is the second-leading cause of death among adolescents and is associated with clusters of suicides. Despite numerous studies on this preventable cause of death, the focus has primarily been on single nations and traditional statistical methods. OBJECTIVE: This study aims to develop a predictive model for adolescent suicidal thinking using multinational data sets and machine learning (ML). METHODS: We used data from the Korea Youth Risk Behavior Web-based Survey with 566,875 adolescents aged between 13 and 18 years and conducted external validation using the Youth Risk Behavior Survey with 103,874 adolescents and Norway's University National General Survey with 19,574 adolescents. Several tree-based ML models were developed, and feature importance and Shapley additive explanations values were analyzed to identify risk factors for adolescent suicidal thinking. RESULTS: When trained on the Korea Youth Risk Behavior Web-based Survey data from South Korea, the XGBoost model achieved an area under the receiver operating characteristic curve (AUROC) of 90.06% (95% CI 89.97-90.16), outperforming the other models. For external validation using the Youth Risk Behavior Survey data from the United States and the University National General Survey from Norway, the XGBoost model achieved AUROCs of 83.09% and 81.27%, respectively. Across all data sets, XGBoost consistently outperformed the other models with the highest AUROC score, and was selected as the optimal model. In terms of predictors of suicidal thinking, feelings of sadness and despair were the most influential, accounting for 57.4% of the impact, followed by stress status at 19.8%. These were followed by age (5.7%), household income (4%), academic achievement (3.4%), sex (2.1%), and others, which contributed less than 2% each. CONCLUSIONS: This study used ML by integrating diverse data sets from 3 countries to address adolescent suicide. The findings highlight the important role of emotional health indicators in predicting suicidal thinking among adolescents. Specifically, sadness and despair were identified as the most significant predictors, followed by stressful conditions and age. These findings emphasize the critical need for early diagnosis and prevention of mental health issues during adolescence.

Subject(s)

Machine Learning , Suicidal Ideation , Humans , Adolescent , Female , Male , Republic of Korea , Algorithms , Cohort Studies , Adolescent Behavior/psychology , Suicide/statistics & numerical data , Suicide/psychology , Norway , Surveys and Questionnaires , Risk Factors , Risk-Taking

8.

Explainable and visualizable machine learning models to predict biochemical recurrence of prostate cancer.

Lu, Wenhao; Zhao, Lin; Wang, Shenfan; Zhang, Huiyong; Jiang, Kangxian; Ji, Jin; Chen, Shaohua; Wang, Chengbang; Wei, Chunmeng; Zhou, Rongbin; Wang, Zuheng; Li, Xiao; Wang, Fubo; Wei, Xuedong; Hou, Wenlei.

Clin Transl Oncol ; 26(9): 2369-2379, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38602643

ABSTRACT

PURPOSE: Machine learning (ML) models have shown excellent performance in prognosis prediction. However, the black box character of ML models has limited their clinical application. Here, we aimed to establish explainable and visualizable ML models to predict biochemical recurrence (BCR) of prostate cancer (PCa). MATERIALS AND METHODS: A total of 647 PCa patients were retrospectively evaluated. Clinical parameters were identified using LASSO regression. The cohort was then split into training and validation datasets at a ratio of 0.75:0.25, and BCR-related features were included in Cox regression and five ML algorithms to construct BCR prediction models. The clinical utility of each model was evaluated by concordance index (C-index) values and decision curve analyses (DCA). In addition, Shapley Additive Explanation (SHAP) values were used to explain the features in the models. RESULTS: We identified 11 BCR-related features using LASSO regression and then established five ML-based models, namely random survival forest (RSF), survival support vector machine (SSVM), survival tree (sTree), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost), together with a Cox regression model. Their C-index values were 0.846 (95% CI 0.796-0.894), 0.774 (95% CI 0.712-0.834), 0.757 (95% CI 0.694-0.818), 0.820 (95% CI 0.765-0.869), 0.793 (95% CI 0.735-0.852), and 0.807 (95% CI 0.753-0.858), respectively. The DCA showed that the RSF model had significant advantages over all other models. Regarding interpretability, the SHAP values demonstrated the tangible contribution of each feature in the RSF model. CONCLUSIONS: Our scoring system provides a reference for identifying BCR and for building a framework for personalized therapeutic decisions in PCa.
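The concordance index used to compare these survival models can be sketched as follows. This is a simplified version of Harrell's C-index: a pair of patients is comparable when the one with the shorter follow-up time actually had the event, and it is concordant when that patient also has the higher predicted risk. Pairs with tied times are skipped here for brevity, which is a simplifying assumption. The function name and the five-patient example are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times: follow-up time per patient
    events: 1 if the event (e.g. recurrence) was observed, 0 if censored
    risk_scores: model output where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue                      # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:       # patient i had the event before j's time
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5     # tied risk counts as half
    return concordant / comparable


# Hypothetical follow-up data for five patients (months, event flag, risk score).
times = [5, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.6, 0.7, 0.3, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the values reported in the abstract.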

Subject(s)

Machine Learning , Neoplasm Recurrence, Local , Prostatic Neoplasms , Humans , Male , Prostatic Neoplasms/blood , Prostatic Neoplasms/pathology , Neoplasm Recurrence, Local/blood , Neoplasm Recurrence, Local/pathology , Retrospective Studies , Aged , Middle Aged , Prognosis , Decision Trees , Proportional Hazards Models , Algorithms , Support Vector Machine , Prostate-Specific Antigen/blood

9.

[Ozone Sensitivity Analysis in Urban Beijing Based on Random Forest].

Zhou, Hong; Wang, Ming; Chai, Wen-Xuan; Zhao, Xin.

Huan Jing Ke Xue ; 45(5): 2497-2506, 2024 May 08.

Article in Chinese | MEDLINE | ID: mdl-38629515

ABSTRACT

The basis and key step in developing ozone (O3) prevention and control measures is determining the non-linear relationship between O3 and its precursors. Based on online observations of O3, volatile organic compounds (VOCs), nitrogen oxides (NOx), and meteorological elements from April to September 2020 at an urban site in Beijing, we analyzed the pollution characteristics of O3 and its precursors, explored key factors affecting O3 using the random forest (RF) model combined with SHAP values, and explored the O3-VOCs-NOx sensitivity through a multi-scenario analysis. The results of correlation analysis showed that the hourly concentration of O3 was significantly positively correlated with temperature (T) and negatively correlated with TVOCs and NOx. However, in terms of the daily values, O3 was significantly positively correlated with T, TVOCs, and NOx. The O3 values simulated by the RF model agreed with the measured values. The SHAP values of each characteristic variable were further calculated. The results suggested that T and NOx had the two largest effects on O3, with positive and negative values, respectively. Based on the average NOx and VOCs on O3 pollution days during the observation period (the base scenario), multiple scenarios with different NOx and VOCs levels were set up. The RF model was used to calculate O3 under the different scenarios and obtain the O3 isopleth (EKMA curve). The results showed that the O3-VOCs-NOx sensitivity in urban areas of Beijing was in the VOCs-limited regime, which was consistent with the results obtained from the observation-based box model (OBM). This indicated that the RF model could be used as a complementary method for O3-VOCs-NOx sensitivity analysis.

10.

Quantifying source contributions to ambient NH₃ using Geo-AI with time lag and parcel tracking functions.

Wu, Chih-Da; Zhu, Jun-Jie; Hsu, Chin-Yu; Shie, Ruei-Hao.

Environ Int ; 185: 108520, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38412565

ABSTRACT

Ambient ammonia (NH3) is an important compound in the formation of particulate matter (PM), and it is therefore crucial to understand the properties of NH3 in order to better reduce PM. However, this goal is not easy to achieve because of the limited range/real-time NH3 data monitored by air quality stations. While other studies have predicted NH3 and apportioned its sources, this manuscript provides a novel method (i.e., Geo-AI) to examine NH3 predictions and their contributing sources. This study represents a pioneering effort in the application of a novel geospatial-artificial intelligence (Geo-AI) base model with parcel tracking functions. This approach integrates various machine learning algorithms and geographic predictor variables to estimate NH3 concentrations, marking the first instance of such a comprehensive methodology. The Shapley additive explanation (SHAP) was used to further analyze the source contributions of NH3 with domain knowledge. From 2016 to 2018, Taichung's hourly average NH3 values were predicted with up to 96% of the total variance explained. SHAP values revealed that waterbody, traffic, and agriculture emissions were the most significant factors affecting NH3 concentrations in Taichung among all the characteristics. Our methodology is a vital first step for shaping future policies and regulations and is adaptable to regions with limited monitoring sites.

Subject(s)

Air Pollutants , Air Pollution , Air Pollutants/analysis , Artificial Intelligence , Environmental Monitoring/methods , Air Pollution/analysis , Particulate Matter/analysis

11.

Machine learning modeling and additive explanation techniques for glutathione production from multiple experimental growth conditions of Saccharomyces cerevisiae.

Fuhr, Ana Carolina Ferreira Piazzi; Gonçalves, Ingrid da Mata; Santos, Lucielen Oliveira; Salau, Nina Paula Gonçalves.

Int J Biol Macromol ; 262(Pt 2): 130035, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38336325

ABSTRACT

Glutathione (GSH) production is of great industrial interest due to its essential properties. This study aimed to use machine learning (ML) methods to model GSH production under different growth conditions of Saccharomyces cerevisiae, namely cultivation time, culture volume, pressure, and magnetic field application. Different ML and regression models were evaluated statistically to select the most robust model. Results showed that eXtreme Gradient Boosting (XGB) had the best predictive performance. Additive explanation techniques were then applied to the best model to identify the importance of the process features. According to the variable analysis, the best conditions to obtain the highest GSH concentrations would be cultivation times of 72-96 h, low magnetic field intensity (3.02 mT), low pressure (0.5 kgf·cm⁻²), and high culture volume (3.5 L). The use of XGB with additive explanation techniques proved promising for determining process optimization conditions and selecting the essential process variables.

Subject(s)

Glutathione , Saccharomyces cerevisiae , Industry , Light , Machine Learning

12.

Enhancing safety of construction workers in Korea: an integrated text mining and machine learning framework for predicting accident types.

Yoo, Joon Woo; Park, Junsung; Park, Heejun.

Int J Inj Contr Saf Promot ; 31(2): 203-215, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38164519

ABSTRACT

Construction workers face a high risk of various occupational accidents, many of which can result in fatalities. This study aims to develop a prediction model for nine prevalent types of construction accidents, utilizing construction tasks, activities, and tools/materials as input features, through the application of machine learning-based multi-class classification algorithms. A total of 152,867 construction accident summary reports, composed of both structured data (construction task, construction activity, accident type) and unstructured data (tools/materials), were used for the study. The study employed several data processing techniques, including keyword extraction through text mining, Boruta feature selection, and SMOTE data resampling, to enhance model accuracy. Three performance metrics (multi-class area under the receiver operating characteristic curve (MAUC), multi-class Matthews correlation coefficient (MMCC), and geometric mean (G-mean)) were used to compare the predictive performance of four machine learning algorithms: Decision Tree, Random Forest, Naïve Bayes, and XGBoost. Of the four algorithms, XGBoost showed the highest performance in predicting accident type (MAUC: 0.8603, MMCC: 0.3523, G-mean: 0.5009). Furthermore, a Shapley additive explanation (SHAP) analysis was conducted to visualize feature importance. The findings of this study make a valuable contribution to improving construction safety by presenting a prediction model for accident types derived from real-world big data.
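Two of the multi-class metrics named in this abstract can be computed from a confusion matrix alone. The sketch below assumes the common definitions: G-mean as the geometric mean of per-class recall, and the multi-class Matthews correlation coefficient in its usual confusion-matrix form. Whether the study used exactly these variants is an assumption, and the 3-class matrix is hypothetical.

```python
import math


def g_mean(confusion):
    """Geometric mean of per-class recall (rows = true class, columns = predicted)."""
    recalls = [row[k] / sum(row) for k, row in enumerate(confusion)]
    return math.prod(recalls) ** (1.0 / len(recalls))


def multiclass_mcc(confusion):
    """Multi-class Matthews correlation coefficient from a confusion matrix."""
    k = len(confusion)
    s = sum(sum(row) for row in confusion)               # total samples
    c = sum(confusion[i][i] for i in range(k))           # correctly classified samples
    true_counts = [sum(confusion[i]) for i in range(k)]  # row sums
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    numerator = c * s - sum(p * t for p, t in zip(pred_counts, true_counts))
    denominator = math.sqrt(
        (s * s - sum(p * p for p in pred_counts))
        * (s * s - sum(t * t for t in true_counts))
    )
    return numerator / denominator


# Hypothetical 3-class confusion matrix (10 samples per true class).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]

gmean_value = g_mean(confusion)
mcc_value = multiclass_mcc(confusion)
print(gmean_value, mcc_value)
```

Because G-mean multiplies per-class recalls, a single poorly recognized class pulls it down sharply, which makes it a useful companion to resampling methods such as SMOTE on imbalanced accident data.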

Subject(s)

Accidents, Occupational , Construction Industry , Data Mining , Machine Learning , Data Mining/methods , Humans , Republic of Korea , Accidents, Occupational/prevention & control , Algorithms , Bayes Theorem

13.

Interpretable machine learning for predicting the fate and transport of pentachlorophenol in groundwater.

Rad, Mehran; Abtahi, Azra; Berndtsson, Ronny; McKnight, Ursula S; Aminifar, Amir.

Environ Pollut ; 345: 123449, 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38278404

ABSTRACT

Pentachlorophenol (PCP) is a commonly found recalcitrant and toxic groundwater contaminant that resists degradation, bioaccumulates, and has a potential for long-range environmental transport. Taking proper actions to deal with the pollutant, accounting for the life cycle consequences, requires a better understanding of its behavior in the subsurface. We recognize the huge potential for enhancing decision-making at contaminated groundwater sites with the arrival of machine learning (ML) techniques in environmental applications. We used ML to enhance the understanding of the dynamics of PCP transport properties in the subsurface, and to determine key hydrochemical and hydrogeological drivers affecting its transport and fate. We demonstrate how this complementary knowledge, provided by data-driven methods, may enable a more targeted planning of monitoring and remediation at two highly contaminated Swedish groundwater sites, where the method was validated. We evaluated 6 interpretable ML methods, 3 linear regressors and 3 non-linear (i.e., tree-based) regressors, to predict PCP concentration in the groundwater. The modeling results indicate that simple linear ML models were useful in the prediction of observations for datasets without any missing values, while tree-based regressors were more suitable for datasets containing missing values. Considering that missing values are common in datasets collected during contaminated site investigations, this could be of significant importance for contaminated site planners and managers, ultimately reducing site investigation and monitoring costs. Furthermore, we interpreted the proposed models using the SHAP (SHapley Additive exPlanations) approach to decipher the importance of different drivers in the prediction and simulation of critical hydrogeochemical variables. Among these, the sum of chlorophenols is of highest significance in the analyses. When that variable is set aside from the model, tetrachlorophenols, dissolved organic carbon, and conductivity are found to be of highest importance. Accordingly, ML methods could potentially be used to improve the understanding of groundwater contamination transport dynamics, filling gaps in knowledge that remain when using more sophisticated deterministic modeling approaches.

Subject(s)

Chlorophenols , Groundwater , Pentachlorophenol , Groundwater/chemistry , Environmental Pollution

14.

Predicting coastal harmful algal blooms using integrated data-driven analysis of environmental factors.

Yan, Zhengxiao; Kamanmalek, Sara; Alamdari, Nasrin.

Sci Total Environ ; 912: 169253, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38101630

ABSTRACT

Coastal harmful algal blooms (HABs) have become one of the challenging environmental problems in the world's thriving coastal cities due to the interference of multiple stressors from human activities and climate change. Past HAB predictions primarily relied on single-source data, overlooked upstream land use, and typically used a single prediction algorithm. To address these limitations, this study aims to develop predictive models to establish the relationship between the HAB indicator - chlorophyll-a (Chl-a) and various environmental stressors, under appropriate lagging predictive scenarios. To achieve this, we first applied the partial autocorrelation function (PACF) to Chl-a to precisely identify two prediction scenarios. We then combined multi-source data and several machine learning algorithms to predict harmful algae, using SHapley Additive exPlanations (SHAP) to extract key features influencing output from the prediction models. Our findings reveal an apparent 1-month autoregressive characteristic in Chl-a, leading us to create two scenarios: 1-month lead prediction and current-month prediction. The Extra Tree Regressor (ETR), with an R2 of 0.92, excelled in 1-month lead predictions, while the Random Forest Regressor (RFR) was most effective for current-month predictions with an R2 of 0.69. Additionally, we identified current month Chl-a, developed land use, total phosphorus, and nitrogen oxides (NOx) as critical features for accurate predictions. Our predictive framework, which can be applied to coastal regions worldwide, provides decision-makers with crucial tools for effectively predicting and mitigating HAB threats in major coastal cities.
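The lag-selection step described above relies on the partial autocorrelation function. As a rough illustration only, the sketch below estimates the PACF with the Durbin-Levinson recursion in plain Python. The synthetic series, the helper names `autocorr` and `pacf`, and the AR(1) coefficient of 0.7 are assumptions made for demonstration; they are not the study's data or code.

```python
import random
from typing import List


def autocorr(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(x: List[float], max_lag: int) -> List[float]:
    """Partial autocorrelation for lags 0..max_lag (Durbin-Levinson)."""
    rho = autocorr(x, max_lag)
    out = [1.0]
    phi_prev: List[float] = []  # coefficients of the order k-1 AR fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        out.append(phi_kk)
    return out


# Hypothetical stand-in for a Chl-a series: a synthetic AR(1) process.
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

values = pacf(series, max_lag=3)
print([round(v, 3) for v in values])
```

For a series with a one-step autoregressive structure, the lag-1 partial autocorrelation should stand out while higher lags stay near zero, which is the kind of pattern that would motivate a 1-month lead scenario.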

Subject(s)

Climate Change, Harmful Algal Bloom, Humans, Chlorophyll A, Cities, Phosphorus

15.

Prediction of Parkinson's Disease Using Machine Learning Methods.

Zhang, Jiayu; Zhou, Wenchao; Yu, Hongmei; Wang, Tong; Wang, Xiaqiong; Liu, Long; Wen, Yalu.

Biomolecules ; 13(12)2023 12 08.

Article in English | MEDLINE | ID: mdl-38136632

ABSTRACT

The detection of Parkinson's disease (PD) in its early stages is of great importance for its treatment and management, but consensus is lacking on what information is necessary and what models should be used to best predict PD risk. In our study, we first grouped PD-associated factors based on their cost and accessibility, and then gradually incorporated them into risk predictions, which were built using eight commonly used machine learning models to allow for comprehensive assessment. Finally, the Shapley Additive Explanations (SHAP) method was used to investigate the contributions of each factor. We found that models built with demographic variables, hospital admission examinations, clinical assessment, and polygenic risk score achieved the best prediction performance, and the inclusion of invasive biomarkers could not further enhance its accuracy. Among the eight machine learning models considered, penalized logistic regression and XGBoost were the most accurate algorithms for assessing PD risk, with penalized logistic regression achieving an area under the curve of 0.94 and a Brier score of 0.08. Olfactory function and polygenic risk scores were the most important predictors for PD risk. Our research has offered a practical framework for PD risk assessment, where necessary information and efficient machine learning tools were highlighted.
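The two reported metrics, area under the ROC curve and Brier score, can be computed directly from predicted probabilities. The sketch below is a minimal plain-Python illustration; the labels and probabilities are made-up toy values, and the function names are hypothetical rather than taken from the study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only).
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.3, 0.4]
print(brier_score(labels, probs), roc_auc(labels, probs))
```

A lower Brier score indicates better-calibrated and more accurate probabilities, while a higher AUC indicates better ranking of cases above non-cases; the two are complementary rather than interchangeable.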

Subject(s)

Parkinson Disease, Humans, Parkinson Disease/diagnosis, Parkinson Disease/genetics, Algorithms, Genetic Risk Score, Hospitalization, Machine Learning

16.

Interpretable prediction of 3-year all-cause mortality in patients with chronic heart failure based on machine learning.

Xu, Chenggong; Li, Hongxia; Yang, Jianping; Peng, Yunzhu; Cai, Hongyan; Zhou, Jing; Gu, Wenyi; Chen, Lixing.

BMC Med Inform Decis Mak ; 23(1): 267, 2023 11 20.

Article in English | MEDLINE | ID: mdl-37985996

ABSTRACT

BACKGROUND: The goal of this study was to assess the effectiveness of machine learning models and create an interpretable machine learning model that adequately explained 3-year all-cause mortality in patients with chronic heart failure. METHODS: The data in this paper were selected from patients with chronic heart failure who were hospitalized at the First Affiliated Hospital of Kunming Medical University from 2017 to 2019 with cardiac function class III-IV. The dataset was explored using six different machine learning models: logistic regression, naive Bayes, random forest classifier, extreme gradient boost, K-nearest neighbor, and decision tree. Finally, interpretable methods based on machine learning, such as SHAP value, permutation importance, and partial dependence plots, were used to estimate the 3-year all-cause mortality risk and produce individual interpretations of the model's conclusions. RESULT: In this paper, random forest was identified as the optimal algorithm for this dataset. We also incorporated relevant machine learning interpretable tools and techniques to improve disease prognosis, including permutation importance, PDP plots and SHAP values for analysis. From this study, we can see that the number of hospitalizations, age, glomerular filtration rate, BNP, NYHA cardiac function classification, lymphocyte absolute value, serum albumin, hemoglobin, total cholesterol, pulmonary artery systolic pressure and so on were important for providing an optimal risk assessment and were important predictive factors of chronic heart failure. CONCLUSION: The machine learning-based cardiovascular risk models could be used to accurately assess and stratify the 3-year risk of all-cause mortality among CHF patients. Machine learning in combination with permutation importance, PDP plots, and the SHAP value could offer a clear explanation of individual risk prediction and give doctors an intuitive knowledge of the functions of important model components.
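Permutation importance, one of the interpretation methods named above, measures how much a model's error grows when a single feature column is shuffled. The following is a minimal plain-Python sketch of that idea under stated assumptions: the toy model, the synthetic data, the use of mean squared error, and all function names are hypothetical and unrelated to the study's random forest or patient data.

```python
import random
from typing import Callable, List


def mse(predict: Callable[[List[float]], float],
        X: List[List[float]], y: List[float]) -> float:
    """Mean squared error of `predict` on rows X against targets y."""
    return sum((predict(r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[List[float]], float],
                           X: List[List[float]], y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in MSE after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [column[i]] + row[j + 1:]
                      for i, row in enumerate(X)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup: the target depends only on the first feature.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(50)]
y = [2.0 * row[0] for row in X]


def toy_model(row: List[float]) -> float:
    return 2.0 * row[0]


scores = permutation_importance(toy_model, X, y)
print(scores)
```

In this toy case the second feature is ignored by the model, so shuffling it leaves the error unchanged and its importance is zero, while shuffling the first feature degrades the predictions.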

Subject(s)

Heart Failure, Humans, Bayes Theorem, Chronic Disease, Cluster Analysis, Machine Learning

17.

Methodologic Issues Specific to Prediction Model Development and Evaluation.

Jin, Yuxuan; Kattan, Michael W.

Chest ; 164(5): 1281-1289, 2023 11.

Article in English | MEDLINE | ID: mdl-37414333

ABSTRACT

Developing and evaluating statistical prediction models is challenging, and many pitfalls can arise. This article identifies what the authors believe are some common methodologic concerns that may be encountered. We describe each problem and make suggestions regarding how to address them. The hope is that this article will result in higher-quality publications of statistical prediction models.

Subject(s)

Statistical Models, Humans, ROC Curve

18.

Dysbiosis signatures of gut microbiota and the progression of type 2 diabetes: a machine learning approach in a Mexican cohort.

Neri-Rosario, Daniel; Martínez-López, Yoscelina Estrella; Esquivel-Hernández, Diego A; Sánchez-Castañeda, Jean Paul; Padron-Manrique, Cristian; Vázquez-Jiménez, Aarón; Giron-Villalobos, David; Resendis-Antonio, Osbaldo.

Front Endocrinol (Lausanne) ; 14: 1170459, 2023.

Article in English | MEDLINE | ID: mdl-37441494

ABSTRACT

Introduction: Gut microbiota (GM) dysbiosis is one of the causal factors in the progression of different chronic metabolic diseases, including type 2 diabetes mellitus (T2D). Understanding the basis underlying this association may lead to new therapeutic strategies for preventing and treating T2D, such as probiotics, prebiotics, and fecal microbiota transplants. It may also help identify potential early detection biomarkers and develop personalized interventions based on an individual's gut microbiota profile. Here, we explore how supervised Machine Learning (ML) methods help to distinguish taxa for individuals with prediabetes or T2D. Methods: To this aim, we analyzed the GM profile (16S rRNA gene sequencing) in a cohort of 410 Mexican naïve patients stratified into normoglycemic, prediabetes, and T2D individuals. Then, we compared six different ML algorithms and found that Random Forest had the highest predictive performance in classifying T2D and prediabetes patients versus controls. Results: We identified a set of taxa for predicting patients with T2D compared to normoglycemic individuals, including Allisonella, Slackia, Ruminococcus_2, Megasphaera, Escherichia/Shigella, and Prevotella, among others. In addition, we concluded that Anaerostipes, Intestinibacter, Prevotella_9, Blautia, Granulicatella, and Veillonella were the relevant genera in patients with prediabetes compared to normoglycemic subjects. Discussion: These findings allow us to postulate that GM is a distinctive signature in prediabetes and T2D patients during the development and progression of the disease. Our study highlights the role of GM and opens a window toward the rational design of new preventive and personalized strategies for the control of this disease.

Subject(s)

Type 2 Diabetes Mellitus, Gastrointestinal Microbiome, Prediabetic State, Humans, Type 2 Diabetes Mellitus/diagnosis, Prediabetic State/diagnosis, Dysbiosis, 16S Ribosomal RNA/genetics, Machine Learning

19.

Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification.

Sun, Jeffrey; Sun, Cheuk-Kay; Tang, Yun-Xuan; Liu, Tzu-Chi; Lu, Chi-Jie.

Healthcare (Basel) ; 11(14)2023 Jul 11.

Article in English | MEDLINE | ID: mdl-37510441

ABSTRACT

Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, there is an ongoing debate regarding the significance of these factors. Machine learning (ML) models and Shapley Additive Explanation (SHAP) methodology can rank risk factors and provide explanatory model results. This study used ML algorithms with SHAP to analyze the risk factors between two different age groups and evaluate the impact of each factor in predicting positive mammography. The ML model was built using data from the risk factor questionnaires of women participating in a breast cancer screening program from 2017 to 2021. Three ML models, least absolute shrinkage and selection operator (lasso) logistic regression, extreme gradient boosting (XGBoost), and random forest (RF), were applied. RF generated the best performance. The SHAP values were then applied to the RF model for further analysis. The model identified age at menarche, education level, parity, breast self-examination, and BMI as the top five significant risk factors affecting mammography outcomes. The differences between age groups ranked by reproductive lifespan and BMI were higher in the younger and older age groups, respectively. The use of SHAP frameworks allows us to understand the relationships between risk factors and generate individualized risk factor rankings. This study provides avenues for further research and individualized medicine.
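SHAP rankings of this kind rest on Shapley values, which average each feature's marginal contribution over all orders in which features can be added. The sketch below computes exact Shapley values for a tiny hand-written model in plain Python. It is an assumption-laden illustration: it uses a single reference point as the baseline (the SHAP library typically averages over a background dataset), and the toy model and function names are hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of `predict` at x relative to a baseline point.

    Enumerates every feature ordering, so it is only practical for a
    handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to its value
            new = predict(current)
            phi[i] += new - prev  # marginal contribution in this ordering
            prev = new
    return [p / len(orders) for p in phi]


def toy_model(z: List[float]) -> float:
    # Two additive terms plus one interaction between features 0 and 2.
    return 3.0 * z[0] + 2.0 * z[1] + z[0] * z[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two features involved. Ranking features by the mean absolute value of such contributions across many individuals is one common way to obtain a global ordering.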

20.

Explainable prediction of daily hospitalizations for cerebrovascular disease using stacked ensemble learning.

Lu, Xiaoya; Qiu, Hang.

BMC Med Inform Decis Mak ; 23(1): 59, 2023 04 06.

Article in English | MEDLINE | ID: mdl-37024922

ABSTRACT

BACKGROUND: With the prevalence of cerebrovascular disease (CD) and the increasing strain on healthcare resources, forecasting the healthcare demands of cerebrovascular patients has significant implications for optimizing medical resources. METHODS: In this study, a stacking ensemble model comprised of four base learners (ridge regression, random forest, gradient boosting decision tree, and artificial neural network) and a meta learner (elastic net) was proposed for predicting the daily number of hospital admissions (HAs) for CD using the historical HAs data, air quality data, and meteorological data in Chengdu, China from 2015 to 2018. To solve the label imbalance problem, a re-weighting method based on label distribution smoothing was integrated into the meta learner. We trained the model using the data from 2015 to 2017 and evaluated its predictive ability using the data in 2018 based on four metrics, including mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2). In addition, the SHapley Additive exPlanations (SHAP) framework was applied to provide explanation for the prediction of our stacking model. RESULTS: Our proposed model outperformed all the base learners and long short-term memory (LSTM) on two datasets. Particularly, compared with the optimal results obtained by individual models, the MAE, RMSE, and MAPE of the stacking model decreased by 13.9%, 12.7%, and 5.8%, respectively, and the R2 improved by 6.8% on CD dataset. The model explanation demonstrated that environmental features played a role in further improving the model performance and identified that high temperature and high concentrations of gaseous air pollutants might strongly associate with an increased risk of CD. CONCLUSIONS: Our stacking model considering environmental exposure is efficient in predicting daily HAs for CD and has practical value in early warning and healthcare resource allocation.
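The four evaluation metrics listed in the methods (MAE, RMSE, MAPE, and R2) follow standard definitions. As a minimal sketch, they can be computed as below; the admission counts and predictions are invented toy numbers, and the function name is hypothetical. MAPE here assumes no true value is zero.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE, MAPE (in percent) and R2 for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # MAPE is undefined when a true value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Toy daily admission counts and predictions (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE are in the units of the target (admissions per day), MAPE is scale-free, and R2 compares the model against always predicting the mean, which is why studies often report all four together.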

Subject(s)

Cerebrovascular Disorders, Computer Neural Networks, Humans, China/epidemiology, Machine Learning, Hospitalization, Cerebrovascular Disorders/epidemiology
