Search | VHL Regional Portal

1.

Assessing the tumor immune landscape across multiple spatial scales to differentiate immunotherapy response in metastatic non-small cell lung cancer.

Tsang, Ashley P; Krishnan, Santhoshi N; Eliason, Joel N; McGue, Jake J; Qin, Angel; Frankel, Timothy L; Rao, Arvind.

Lab Invest ; : 102148, 2024 Oct 08.

Article in English | MEDLINE | ID: mdl-39389312

ABSTRACT

While immune checkpoint inhibitor-based (ICI) therapy has shown promising results in non-small cell lung cancer (NSCLC) patients with high programmed death ligand 1 (PD-L1) expression, not all patients respond to therapy. The tumor microenvironment (TME) is complex and heterogeneous, making it challenging to understand the key agents and features which influence response to therapies. In this study, we leverage multiplex fluorescent immunohistochemistry (mfIHC) to quantitatively assess interactions between tumor and immune cells in an effort to identify patterns occurring at multiple spatial levels of the TME. To do so, we introduce several computational methods novel to a dataset of 1,269 mfIHC images from a cohort of 52 patients with metastatic NSCLC. With the spatial G-cross function, we quantify the degree of cell interaction at an entire image level, where we see significantly increased activity of cytotoxic T-cells (CTLs) and helper T-cells (HTLs) with epithelial tumor cells (ECs) in responders to ICI (p = .022 and p < .001, respectively), and decreased activity of T-regulatory cells (Tregs) with ECs compared to non-responders (p = .010). By leveraging spatial overlap methods, we define tumor subregions (which we call the tumor "periphery", "edge" and "center") and discover more localized immune-immune interactions influencing positive response, including those between CTLs and HTLs with antigen presenting cells (APCs) in these subregions specifically. Lastly, we trained an interpretable deep learning model which identified key cellular regions of interest that most influenced response classification (AUC = 0.71±0.02). Assessing spatial interactions within these subregions further revealed new insights not significant at the whole image level, particularly the elevated association of APCs and Tregs with one another in responder groups (p = 0.024). 
Altogether, we demonstrate that elucidating patterns of cell composition and interplay across multiple levels of spatial analyses can improve our understanding of the TME and better differentiate patient responses to immunotherapy.
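
The G-cross function named in the abstract above can be illustrated with a minimal sketch. It assumes the standard empirical definition (the fraction of cells of one type whose nearest neighbour of another type lies within radius r) and applies no edge correction. The coordinates, the function name `g_cross`, and the choice of which cell type is the "from" type are hypothetical and not taken from the study.

```python
import math

def g_cross(from_cells, to_cells, radii):
    """Empirical G-cross curve without edge correction (a simplification).

    For each radius r, returns the fraction of `from_cells` whose nearest
    neighbour among `to_cells` is at distance <= r.
    """
    nearest = [min(math.dist(p, q) for q in to_cells) for p in from_cells]
    return [sum(d <= r for d in nearest) / len(nearest) for r in radii]

# Hypothetical (x, y) positions in arbitrary units, for illustration only.
tumor_cells = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
t_cells = [(0.0, 3.0), (10.0, 8.0), (50.0, 0.0)]
radii = [5.0, 10.0, 20.0]

curve = g_cross(tumor_cells, t_cells, radii)
print(curve)
```

A curve that rises quickly at small radii indicates that the two cell types tend to sit close together; comparing such curves between patient groups is one way to read the image-level results reported above.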

2.

Multi-Objective Design of DNA-Stabilized Nanoclusters Using Variational Autoencoders With Automatic Feature Extraction.

Sadeghi, Elham; Mastracco, Peter; Gonzàlez-Rosell, Anna; Copp, Stacy M; Bogdanov, Petko.

ACS Nano ; 18(39): 26997-27008, 2024 Oct 01.

Article in English | MEDLINE | ID: mdl-39288200

ABSTRACT

DNA-stabilized silver nanoclusters (AgN-DNAs) have sequence-tuned compositions and fluorescence colors. High-throughput experiments together with supervised machine learning models have recently enabled design of DNA templates that select for AgN-DNA properties, including near-infrared (NIR) emission that holds promise for deep tissue bioimaging. However, these existing models do not enable simultaneous selection of multiple AgN-DNA properties, and require significant expert input for feature engineering and class definitions. This work presents a model for multiobjective, continuous-property design of AgN-DNAs with automatic feature extraction, based on variational autoencoders (VAEs). This model is generative, i.e., it learns both the forward mapping from DNA sequence to AgN-DNA properties and the inverse mapping from properties to sequence, and is trained on an experimental data set of DNA sequences paired with AgN-DNA fluorescence properties. Experimental testing shows that the model enables effective design of AgN-DNA emission, including bright NIR AgN-DNAs with 4-fold greater abundance compared to training data. In addition, Shapley analysis is employed to discern learned nucleobase patterns that correspond to fluorescence color and brightness. This generative model can be adapted for a range of biomolecular systems with sequence-dependent properties, enabling precise design of emerging biomolecular nanomaterials.

Subject(s)

DNA , Metal Nanoparticles , Silver , DNA/chemistry , Silver/chemistry , Metal Nanoparticles/chemistry , Nanostructures/chemistry

3.

Elucidating per- and polyfluoroalkyl substances (PFASs) soil-water partitioning behavior through explainable machine learning models.

Xie, Jiaxing; Liu, Shun; Su, Lihao; Zhao, Xinting; Wang, Yan; Tan, Feng.

Sci Total Environ ; 954: 176575, 2024 Sep 27.

Article in English | MEDLINE | ID: mdl-39343411

ABSTRACT

In this study, an optimized random forest (RF) model was employed to better understand the soil-water partitioning behavior of per- and polyfluoroalkyl substances (PFASs). The model demonstrated strong predictive performance, achieving an R² of 0.93 and an RMSE of 0.86. Moreover, it required only 11 easily obtainable features, with molecular weight and soil pH being the predominant factors. Three-dimensional interaction analyses identified specific conditions associated with varying soil-water partitioning coefficients (Kd). Results showed that soils with high organic carbon (OC) content, cation exchange capacity (CEC), and lower soil pH, especially when combined with PFASs of higher molecular weight, were linked to higher Kd values, indicating stronger adsorption. Conversely, low Kd values (< 2.8 L/kg) were typically observed in soils with higher pH (8.0) but lower CEC (8 cmol+/kg) and lesser OC content (1 %), and for PFASs of lighter molecular weight (380 g/mol), suggesting weaker adsorption capacities and a heightened potential for environmental migration. Furthermore, the model was used to predict Kd values for 142 novel PFASs in diverse soil conditions. Our research provides essential insights into the factors governing PFASs partitioning in soil and highlights the significant role of machine learning models in enhancing the understanding of environmental distribution and migration of PFASs.
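
For readers unfamiliar with the two fit statistics quoted in the abstract above, the following is a minimal sketch of how the coefficient of determination and the root mean square error are conventionally computed. The numbers are hypothetical and unrelated to the study's data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(rmse(observed, predicted), r_squared(observed, predicted))
```

RMSE is expressed in the units of the predicted quantity, whereas R² is unitless, which is why abstracts often report both.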

4.

Enhancing intrusion detection performance using explainable ensemble deep learning.

Ben Ncir, Chiheb Eddine; Ben HajKacem, Mohamed Aymen; Alattas, Mohammed.

PeerJ Comput Sci ; 10: e2289, 2024.

Article in English | MEDLINE | ID: mdl-39314740

ABSTRACT

Given the exponential growth of available data in large networks, an accurate and explainable intrusion detection system has become necessary to effectively discover attacks in such networks. To deal with this challenge, we propose a two-phase Explainable Ensemble deep learning-based method (EED) for intrusion detection. In the first phase, a new ensemble intrusion detection model using three one-dimensional long short-term memory networks (LSTM) is designed for accurate attack identification. The outputs of the three classifiers are aggregated using a meta-learner algorithm, resulting in refined and improved results. In the second phase, interpretability and explainability of EED outputs are enhanced by leveraging the capabilities of SHapley Additive exPlanations (SHAP). Factors contributing to the identification and classification of attacks are highlighted, which allows security experts to understand and interpret the attack behavior and then implement effective response strategies to improve network security. Experiments conducted on real datasets have shown the effectiveness of EED compared to conventional intrusion detection methods in terms of both accuracy and explainability. The EED method identifies and classifies attacks with high accuracy while providing transparency and interpretability.
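
SHAP attributions, used here and in several other entries in this listing, are based on Shapley values. The sketch below computes exact Shapley values for a toy function by averaging marginal contributions over all feature orderings, with "absent" features set to a single baseline. It is not the SHAP library's API, which relies on approximations for realistic models; `toy_model`, the instance, and the baseline are hypothetical.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features not yet "added" take their baseline value. Cost grows
    factorially with the number of features, so this is for toy cases only.
    """
    n = len(instance)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = instance[i]
            updated = model(current)
            phi[i] += updated - previous  # marginal contribution of feature i
            previous = updated
    return [total / len(orderings) for total in phi]

def toy_model(x):
    # Hypothetical stand-in for a trained classifier's score.
    return 2 * x[0] + x[1] * x[2]

phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features that produce it.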

5.

Advanced High-Throughput Rational Design of Porphyrin-Sensitized Solar Cells Using Interpretable Machine Learning.

Liao, Jian-Ming; Chen, Yu-Hsuan; Lee, Hsuan-Wei; Guo, Bo-Cheng; Su, Po-Cheng; Wang, Lun-Hong; Reddy, Nagannagari Masi; Yella, Aswani; Zhang, Zhao-Jie; Chang, Chuan-Yung; Chen, Chia-Yuan; Zakeeruddin, Shaik M; Tsai, Hui-Hsu Gavin; Yeh, Chen-Yu; Grätzel, Michael.

Adv Sci (Weinh) ; : e2407235, 2024 Sep 24.

Article in English | MEDLINE | ID: mdl-39316380

ABSTRACT

Accurately predicting the power conversion efficiency (PCE) in dye-sensitized solar cells (DSSCs) represents a crucial challenge, one that is pivotal for the high throughput rational design and screening of promising dye sensitizers. This study presents precise, predictive, and interpretable machine learning (ML) models specifically designed for Zn-porphyrin-sensitized solar cells. The model leverages theoretically computable, effective, and reusable molecular descriptors (MDs) to address this challenge. The models achieve excellent performance on a "blind test" of 17 newly designed cells, with a mean absolute error (MAE) of 1.02%. Notably, 10 dyes are predicted within a 1% error margin. These results validate the ML models and their importance in exploring uncharted chemical spaces of Zn-porphyrins. SHAP analysis identifies crucial MDs that align well with experimental observations, providing valuable chemical guidelines for the rational design of dyes in DSSCs. These predictive ML models enable efficient in silico screening, significantly reducing analysis time for photovoltaic cells. Promising Zn-porphyrin-based dyes with exceptional PCE are identified, facilitating high-throughput virtual screening. The prediction tool is publicly accessible at https://ai-meta.chem.ncu.edu.tw/dsc-meta.

6.

Intelligent diagnosis of Kawasaki disease from real-world data using interpretable machine learning models.

Duan, Yifan; Wang, Ruiqi; Huang, Zhilin; Chen, Haoran; Tang, Mingkun; Zhou, Jiayin; Hu, Zhengyong; Hu, Wanfei; Chen, Zhenli; Qian, Qing; Wang, Haolin.

Hellenic J Cardiol ; 2024 Aug 10.

Article in English | MEDLINE | ID: mdl-39128707

ABSTRACT

OBJECTIVE: This study aimed to leverage real-world electronic medical record data to develop interpretable machine learning models for diagnosis of Kawasaki disease while also exploring and prioritizing the significant risk factors. METHODS: A comprehensive study was conducted on 4087 pediatric patients at the Children's Hospital of Chongqing, China. The study collected demographic data, physical examination results, and laboratory findings. Statistical analyses were performed using IBM SPSS Statistics, Version 26.0. The optimal feature subset was used to develop intelligent diagnostic prediction models based on the Light Gradient Boosting Machine, Explainable Boosting Machine (EBM), Gradient Boosting Classifier (GBC), Fast Interpretable Greedy-Tree Sums, Decision Tree, AdaBoost Classifier, and Logistic Regression. Model performance was evaluated in three dimensions: discriminative ability via receiver operating characteristic curves, calibration accuracy using calibration curves, and interpretability through SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations). RESULTS: In this study, Kawasaki disease was diagnosed in 2971 participants. Analysis was conducted on 31 indicators, including red blood cell distribution width and erythrocyte sedimentation rate. The EBM model demonstrated superior performance relative to other models, with an area under the curve of 0.97, second only to the GBC model. Furthermore, the EBM model exhibited the highest calibration accuracy and maintained its interpretability without relying on external analytical tools such as SHAP and LIME, thus reducing interpretation biases. Platelet distribution width, total protein, and erythrocyte sedimentation rate were identified by the model as significant predictors for the diagnosis of Kawasaki disease. CONCLUSION: This study used diverse machine learning models for early diagnosis of Kawasaki disease. 
The findings demonstrated that interpretable models such as EBM outperformed traditional machine learning models in terms of both interpretability and performance. Ensuring consistency between predictive models and clinical evidence is crucial for the successful integration of artificial intelligence into real-world clinical practice.

7.

Decoding pulsatile patterns of cerebrospinal fluid dynamics through enhancing interpretability in machine learning.

Keles, Ayse; Ozisik, Pinar Akdemir; Algin, Oktay; Celebi, Fatih Vehbi; Bendechache, Malika.

Sci Rep ; 14(1): 17854, 2024 Aug 01.

Article in English | MEDLINE | ID: mdl-39090141

ABSTRACT

Analyses of complex behaviors of cerebrospinal fluid (CSF) have become increasingly important in disease diagnosis. The changes of the phase-contrast magnetic resonance imaging (PC-MRI) signal formed by the velocity of flowing CSF are represented as a set of velocity-encoded images or maps, which can be thought of as signal data in the context of medical imaging, enabling the evaluation of pulsatile patterns throughout a cardiac cycle. However, automatic segmentation of the CSF region in a PC-MRI image is challenging, and implementing an explained machine learning (ML) method using pulsatile data as a feature remains unexplored. This paper presents lightweight ML algorithms to perform spinal CSF lumen segmentation, utilizing sets of velocity-encoded images or maps as a feature. The dataset contains 57 PC-MRI slabs acquired with a 3T MRI scanner from control and idiopathic scoliosis participants. The ML models are trained with 2176 time series images. Because PC-MRI acquisitions differ in the number of images (frames) per cardiac period, the frames are interpolated in the preprocessing step to produce features of equal size. A fivefold cross-validation procedure is used to estimate the success of the ML models. Additionally, the study focusses on enhancing the interpretability of the highest-accuracy eXtreme Gradient Boosting (XGB) model by applying the SHapley Additive exPlanations (SHAP) technique. The XGB algorithm achieved the highest accuracy, with fivefold averages of 0.99% precision, 0.95% recall, and 0.97% F1 score. Using SHAP, we evaluated the significance of each pulsatile feature's contribution to predictions, offering a more profound understanding of the model's behavior in distinguishing CSF lumen pixels. Introducing a novel approach in the field, the developed ML models offer insight into feature extraction and selection from PC-MRI pulsatile data. 
Moreover, the explained ML model offers novel and valuable insights to domain experts, contributing to an enhanced scholarly understanding of CSF dynamics.
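
The preprocessing step mentioned in the abstract above, in which series with different frame counts are interpolated to a common length, can be sketched as follows. The abstract does not state which interpolation scheme was used; linear interpolation is assumed here, and the function name and sample values are hypothetical.

```python
def resample_linear(series, target_len):
    """Resample a 1-D series to `target_len` points by linear interpolation.

    The first and last samples are preserved; intermediate points are
    placed at evenly spaced positions along the original index axis.
    """
    n = len(series)
    if n < 2 or target_len < 2:
        raise ValueError("need at least two input and two output samples")
    resampled = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional index in the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        resampled.append(series[lo] * (1 - frac) + series[hi] * frac)
    return resampled

# Hypothetical per-pixel velocity samples over one cardiac cycle.
short_cycle = [0.0, 2.0, 4.0]
aligned = resample_linear(short_cycle, 5)
print(aligned)
```

After resampling, every pixel's time series has the same number of entries, so each can serve as a fixed-size feature vector for a tabular model such as gradient boosting.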

Subject(s)

Cerebrospinal Fluid , Machine Learning , Magnetic Resonance Imaging , Pulsatile Flow , Humans , Magnetic Resonance Imaging/methods , Algorithms , Scoliosis/diagnostic imaging , Image Processing, Computer-Assisted/methods , Female , Male

8.

Identifying the most crucial factors associated with depression based on interpretable machine learning: a case study from CHARLS.

Li, Rulin; Wang, Xueyan; Luo, Lanjun; Yuan, Youwei.

Front Psychol ; 15: 1392240, 2024.

Article in English | MEDLINE | ID: mdl-39118849

ABSTRACT

Background: Depression is one of the most common mental illnesses among middle-aged and older adults in China. It is of great importance to find the crucial factors that lead to depression and to effectively control and reduce the risk of depression. Currently, there are limited methods available to accurately predict the risk of depression and identify the crucial factors that influence it. Methods: We collected data from 25,586 samples from the harmonized China Health and Retirement Longitudinal Study (CHARLS), and the latest records from 2018 were included in the current cross-sectional analysis. Ninety-three input variables in the survey were considered as potential influential features. Five machine learning (ML) models were utilized: CatBoost, eXtreme Gradient Boosting (XGBoost), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM). The models were compared to the traditional multivariable Linear Regression (LR) model. Simultaneously, SHapley Additive exPlanations (SHAP) were used to identify key influencing factors at the global level and explain individual heterogeneity through instance-level analysis. To explore how different factors are non-linearly associated with the risk of depression, we employed the Accumulated Local Effects (ALE) approach to analyze the identified critical variables while controlling other covariates. Results: CatBoost outperformed other machine learning models in terms of MAE, MSE, MedAE, and R² metrics. The top three crucial factors identified by SHAP were r4satlife, r4slfmem, and r4shlta, representing life satisfaction, self-reported memory, and health status levels, respectively. Conclusion: This study demonstrates that the CatBoost model is an appropriate choice for predicting depression among middle-aged and older adults in Harmonized CHARLS. 
The SHAP and ALE interpretable methods have identified crucial factors and the nonlinear relationship with depression, which require the attention of domain experts.

9.

Predicting physical functioning status in older adults: insights from wrist accelerometer sensors and derived digital biomarkers of physical activity.

Fan, Lingjie; Zhao, Junhan; Hu, Yao; Zhang, Junjie; Wang, Xiyue; Wang, Fengyi; Wu, Mengyi; Lin, Tao.

J Am Med Inform Assoc ; 2024 Aug 23.

Article in English | MEDLINE | ID: mdl-39178361

ABSTRACT

OBJECTIVE: Conventional physical activity (PA) metrics derived from wearable sensors may not capture the cumulative patterns, sedentary-to-active transitions, and multidimensional patterns of PA, limiting the ability to predict physical function impairment (PFI) in older adults. This study aims to identify unique temporal patterns and develop novel digital biomarkers from wrist accelerometer data for predicting PFI and its subtypes using explainable artificial intelligence techniques. MATERIALS AND METHODS: Wrist accelerometer streaming data from 747 participants in the National Health and Aging Trends Study (NHATS) were used to calculate 231 PA features through time-series analysis techniques (tsfresh). Predictive models for PFI and its subtypes (walking, balance, and extremity strength) were developed using 6 machine learning (ML) algorithms with hyperparameter optimization. The SHapley Additive exPlanations method was employed to interpret the ML models and rank the importance of input features. RESULTS: Temporal analysis revealed peak PA differences between PFI and healthy controls from 9:00 to 11:00 am. The best-performing model (Gradient Boosting Tree) achieved an area under the curve score of 85.93%, accuracy of 81.52%, sensitivity of 77.03%, and specificity of 87.50% when combining wrist accelerometer streaming data (WAPAS) features with demographic data. DISCUSSION: The novel digital biomarkers, including change quantiles, Fourier transform (FFT) coefficients, and Aggregated (AGG) Linear Trend, outperformed traditional PA metrics in predicting PFI. These findings highlight the importance of capturing the multidimensional nature of PA patterns for PFI. CONCLUSION: This study investigates the potential of wrist accelerometer digital biomarkers in predicting PFI and its subtypes in older adults. Integrated PFI monitoring systems with digital biomarkers would improve the current state of remote PFI surveillance.
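
The accuracy, sensitivity, and specificity figures in the RESULTS above all derive from a binary confusion matrix. The sketch below shows the conventional definitions using hypothetical counts that are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: impaired cases predicted correctly / missed.
    tn/fp: healthy controls predicted correctly / falsely flagged.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fn=2, tn=7, fp=3)
print(metrics)
```

Sensitivity and specificity are computed within the positive and negative groups respectively, so they are unaffected by class balance, whereas accuracy is not.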

10.

A novel coupling interpretable machine learning framework for water quality prediction and environmental effect understanding in different flow discharge regulations of hydro-projects.

Nong, Xizhi; Lai, Cheng; Chen, Lihua; Wei, Jiahua.

Sci Total Environ ; 950: 175281, 2024 Nov 10.

Article in English | MEDLINE | ID: mdl-39117235

ABSTRACT

Machine learning models (MLMs) have been increasingly used to forecast water pollution. However, the "black box" characteristic for understanding mechanism processes still limits the applicability of MLMs for water quality management in hydro-projects under complex and frequently artificial regulation. This study proposes an interpretable machine learning framework for water quality prediction coupled with a hydrodynamic (flow discharge) scenario-based Random Forest (RF) model with multiple model-agnostic techniques and quantifies global, local, and joint interpretations (i.e., partial dependence, individual conditional expectation, and accumulated local effects) of environmental factor implications. The framework was applied and verified to predict the permanganate index (CODMn) under different flow discharge regulation scenarios in the Middle Route of the South-to-North Water Diversion Project of China (MRSNWDPC). A total of 4664 sampling cases data matrices, including water quality, meteorological, and hydrological indicators from eight national stations along the main canal of the MRSNWDPC, were collected from May 2019 to December 2020. The results showed that the RF models were effective in forecasting CODMn in all flow discharge scenarios, with a mean square error, coefficient of determination, and mean absolute error of 0.006-0.026, 0.481-0.792, and 0.069-0.104, respectively, in the testing dataset. A global interpretation indicated that dissolved oxygen, flow discharge, and surface pressure are the three most important variables of CODMn. Local and joint interpretations indicated that the RF-based prediction model provides a basic understanding of the physical mechanisms of environmental systems. 
The proposed framework can effectively learn the fundamental environmental implications of water quality variations and provide reliable prediction performance, highlighting the importance of model interpretability for trustworthy machine learning applications in water management projects. This study provides scientific references for applying advanced data-driven MLMs to water quality forecasting and a reliable methodological framework for water quality management and similar hydro-projects.
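
Partial dependence and individual conditional expectation (ICE), two of the model-agnostic techniques listed in the abstract above, can be sketched in a few lines. The example assumes only that the fitted model can be called on a row of feature values; `toy_model` and the data rows are hypothetical stand-ins, not the study's Random Forest or measurements.

```python
def ice_curves(model, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`
    while the row's other features stay fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves

def partial_dependence(model, rows, feature, grid):
    """Partial dependence is the pointwise average of the ICE curves."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(column) / len(curves) for column in zip(*curves)]

def toy_model(x):
    # Hypothetical stand-in for a fitted regressor with an interaction term.
    return 3.0 * x[0] + x[0] * x[1]

rows = [[0.0, 1.0], [1.0, 3.0]]
grid = [0.0, 1.0, 2.0]
pd_curve = partial_dependence(toy_model, rows, 0, grid)
print(pd_curve)
```

The ICE curves give the local view and their average gives the global one. Accumulated local effects, also named in the abstract, address the case where features are correlated and are not shown here.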

11.

Sludge bound-EPS solubilization enhance CH₄ bioconversion and membrane fouling mitigation in electrochemical anaerobic membrane bioreactor: Insights from continuous operation and interpretable machine learning algorithms.

Niu, Chengxin; Zhang, Zhongyi; Cai, Teng; Pan, Yang; Lu, Xueqin; Zhen, Guangyin.

Water Res ; 264: 122243, 2024 Oct 15.

Article in English | MEDLINE | ID: mdl-39142046

ABSTRACT

Bound extracellular polymeric substances (EPS) are complex, high-molecular-weight polymer mixtures that play a critical role in pore clogging, foulants adhesion, and fouling layer formation during membrane filtration, owing to their adhesive properties and gelation tendencies. In this study, a novel electrochemical anaerobic membrane bioreactor (EC-AnMBR) was constructed to investigate the effect of sludge bound-EPS solubilization on methane bioconversion and membrane fouling mitigation. During the 150-day operation, the EC-AnMBR demonstrated remarkable performance, characterized by an exceptionally low fouling rate (transmembrane pressure (TMP) < 4.0 kPa) and high-quality effluent (COD removal > 98.2 %, protein removal > 97.7 %, and polysaccharide removal > 98.5 %). The highest methane productivity was up to 38.0 ± 3.1 mL/Lreactor/d at the applied voltage of 0.8 V with bound-EPS solubilization, 107.6 % higher than that of the control stage (18.3 ± 2.4 mL/Lreactor/d). Morphological and multiplex fluorescence labeling analyses revealed higher fluorescence intensities of proteins, polysaccharides, total cells and lipids on the surface of the fouling layer. In contrast, the interior exhibited increased compression density and reduced activity, likely attributable to compression effect. Under the synergistic influence of the electric field and bound-EPS solubilization, biomass characteristics exhibited a reduced propensity for membrane fouling. Furthermore, the bio-electrochemical regulation enhanced the electroactivity of microbial aggregates and enriched functional microorganisms, thereby promoting biofilm growth and direct interspecies electron transfer. Additionally, the potential hydrogenotrophic and methylotrophic methanogenesis pathways were enhanced at the cathode and anode surfaces, thereby increasing CH4 productivity. 
The random forest-based machine learning model analyzed the nonlinear contributions of EPS characteristics on methane productivity and TMP values, achieving R² values of 0.879 and 0.848, respectively. Shapley additive explanations (SHAP) analysis indicated that S-EPSPS and S-EPSPN were the most critical factors affecting CH4 productivity and membrane fouling, respectively. Partial dependence plot analysis further verified the marginal and interaction effects of different EPS layers on these outcomes. By combining continuous operation with interpretable machine learning algorithms, this study unveils the intricate impacts of EPS characteristics on methane productivity and membrane fouling behaviors, and provides new insights into sludge bound-EPS solubilization in EC-AnMBR.

Subject(s)

Bioreactors , Machine Learning , Membranes, Artificial , Methane , Sewage , Sewage/microbiology , Anaerobiosis , Biofouling , Extracellular Polymeric Substance Matrix , Solubility , Waste Disposal, Fluid/methods

12.

A Comparison of Interpretable Machine Learning Approaches to Identify Outpatient Clinical Phenotypes Predictive of First Acute Myocardial Infarction.

Hodgman, Matthew; Minoccheri, Cristian; Mathis, Michael; Wittrup, Emily; Najarian, Kayvan.

Diagnostics (Basel) ; 14(16), 2024 Aug 10.

Article in English | MEDLINE | ID: mdl-39202229

ABSTRACT

BACKGROUND: Acute myocardial infarctions are deadly to patients and burdensome to healthcare systems. Most recorded infarctions are patients' first, occur out of the hospital, and often are not accompanied by cardiac comorbidities. The clinical manifestations of the underlying pathophysiology leading to an infarction are not fully understood and little effort exists to use explainable machine learning to learn predictive clinical phenotypes before hospitalization is needed. METHODS: We extracted outpatient electronic health record data for 2641 case and 5287 matched-control patients, all without pre-existing cardiac diagnoses, from the Michigan Medicine Health System. We compare six different interpretable, feature extraction approaches, including temporal computational phenotyping, and train seven interpretable machine learning models to predict the onset of first acute myocardial infarction within six months. RESULTS: Using temporal computational phenotypes significantly improved the model performance compared to alternative approaches. The mean cross-validation test set performance exhibited area under the receiver operating characteristic curve values as high as 0.674. The most consistently predictive phenotypes of a future infarction include back pain, cardiometabolic syndrome, family history of cardiovascular diseases, and high blood pressure. CONCLUSIONS: Computational phenotyping of longitudinal health records can improve classifier performance and identify predictive clinical concepts. State-of-the-art interpretable machine learning approaches can augment acute myocardial infarction risk assessment and prioritize potential risk factors for further investigation and validation.

13.

Integrating multi-omics data of childhood asthma using a deep association model.

Wei, Kai; Qian, Fang; Li, Yixue; Zeng, Tao; Huang, Tao.

Fundam Res ; 4(4): 738-751, 2024 Jul.

Article in English | MEDLINE | ID: mdl-39156565

ABSTRACT

Childhood asthma is one of the most common respiratory diseases, with rising mortality and morbidity. Multi-omics data provide a new opportunity to explore collaborative biomarkers and corresponding diagnostic models of childhood asthma. To capture the nonlinear associations in multi-omics data and improve the interpretability of the diagnostic model, we proposed a novel deep association model (DAM) and a corresponding efficient analysis framework. First, Deep Subspace Reconstruction was used to fuse the omics data and diagnostic information, thereby correcting the distribution of the original omics data and reducing the influence of unnecessary data noise. Second, Joint Deep Semi-Negative Matrix Factorization was applied to identify different latent sample patterns and extract biomarkers from different omics data levels. Third, our newly proposed Deep Orthogonal Canonical Correlation Analysis ranks features in the collaborative module, which can then be used to construct a diagnostic model that considers the nonlinear correlation between different omics data levels. Using DAM, we analyzed in depth the transcriptome and methylation data of childhood asthma. The effectiveness of DAM is verified from the perspectives of algorithm performance and biological significance on the independent test dataset, by ablation experiment and comparison with many baseline methods from clinical and biological studies. The DAM-induced diagnostic model can achieve a prediction AUC of 0.912, which is higher than that of many other alternative methods. Meanwhile, relevant pathways and biomarkers of childhood asthma are also recognized to be collectively altered on the gene expression and methylation levels. 
As an interpretable machine learning approach, DAM simultaneously considers the non-linear associations among samples and those among biological features, which should help explore interpretative biomarker candidates and efficient diagnostic models from multi-omics data analysis for human complex diseases.

14.

Predicting rice phenology across China by integrating crop phenology model and machine learning.

Zhang, Jinhan; Lin, Xiaomao; Jiang, Chongya; Hu, Xuntao; Liu, Bing; Liu, Leilei; Xiao, Liujun; Zhu, Yan; Cao, Weixing; Tang, Liang.

Sci Total Environ ; 951: 175585, 2024 Nov 15.

Article in English | MEDLINE | ID: mdl-39155002

ABSTRACT

This study explores the integration of crop phenology models and machine learning approaches for predicting rice phenology across China, to gain a deeper understanding of rice phenology prediction. Multiple approaches were used to predict heading and maturity dates at 337 locations across the main rice growing regions of China from 1981 to 2020, including a crop phenology model, machine learning, and hybrid models that integrate both approaches. Furthermore, an interpretable machine learning (IML) approach using SHapley Additive exPlanation (SHAP) was employed to elucidate the influence of climatic and varietal factors on uncertainty in crop phenology model predictions. Overall, the hybrid model demonstrated a high accuracy in predicting rice phenology, followed by machine learning and crop phenology models. The best hybrid model, based on a serial structure and the eXtreme Gradient Boosting (XGBoost) algorithm, achieved a root mean square error (RMSE) of 4.65 and 5.72 days and coefficient of determination (R²) values of 0.93 and 0.9 for heading and maturity predictions, respectively. SHAP analysis revealed temperature to be the most influential climate variable affecting phenology predictions, particularly under extreme temperature conditions, while rainfall and solar radiation were found to be less influential. The analysis also highlighted the variable importance of climate across different phenological stages, rice cultivation patterns, and geographic regions, underscoring the notable regionality. The study proposed that a hybrid model using an IML approach would not only improve the accuracy of prediction but also offer a robust framework for leveraging data-driven approaches in crop modeling, providing a valuable tool for refining and advancing the modeling process in rice.

Subject(s)

Crops, Agricultural , Machine Learning , Oryza , China , Oryza/growth & development , Crops, Agricultural/growth & development , Climate , Seasons , Agriculture/methods

15.

Linking industrial emissions and dietary exposure to human burdens of polychlorinated naphthalenes.

Yang, Yujue; Li, Cui; Yang, Lili; Zhu, Hao; Xie, Zhiyong; Falandysz, Jerzy; Weber, Roland; Qin, Linjun; Liu, Guorui.

Sci Total Environ ; 951: 175733, 2024 Nov 15.

Article in English | MEDLINE | ID: mdl-39181249

ABSTRACT

Relationships between toxic pollutant emissions during industrial processes and toxic pollutant dietary intakes and adverse health burdens have not yet been quantitatively clarified. Polychlorinated naphthalenes (PCNs) are typical industrial pollutants that are carcinogenic and of increasing concern. In this study, we established an interpretable machine learning model for quantifying the contributions of industrial emissions and dietary intakes of PCNs to health effects. We used the SHapley Additive exPlanations model to achieve individualized interpretability, enabling us to evaluate the specific contributions of individual feature values towards PCN concentration levels. A strong relationship between PCN dietary intake and body burden was found using a robust large-scale PCN diet survey database for China containing the results of the analyses of 17,280 dietary samples and 4,480 breast milk samples. Industrial emissions and dietary intake contributed 12 % and 52 %, respectively, of the PCN burden in breast milk. The model quantified the contributions of food consumption and industrial emissions to PCN exposure, which will be useful for performing accurate health risk assessments and developing reduction strategies for PCNs.

Subject(s)

Dietary Exposure , Naphthalenes , Humans , Dietary Exposure/statistics & numerical data , Dietary Exposure/analysis , China , Naphthalenes/analysis , Milk, Human/chemistry , Environmental Exposure/statistics & numerical data , Environmental Pollutants/analysis , Industrial Waste/analysis , Risk Assessment

16.

Considering multi-scale built environment in modeling severity of traffic violations by elderly drivers: An interpretable machine learning framework.

Sun, Zhiyuan; Ai, Zhoumeng; Wang, Zehao; Wang, Jianyu; Gu, Xin; Wang, Duo; Lu, Huapu; Chen, Yanyan.

Accid Anal Prev ; 207: 107740, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39142041

ABSTRACT

The causes of traffic violations by elderly drivers are different from those of other age groups. To reduce serious traffic violations that are more likely to cause serious traffic crashes, this study divided the severity of traffic violations into three levels (i.e., slight, ordinary, severe) based on point deduction, and explored the patterns of serious traffic violations (i.e., ordinary, severe) using multi-source data. This paper designed an interpretable machine learning framework, in which four popular machine learning models were enhanced and compared. Specifically, an adaptive synthetic sampling method was applied to overcome the effects of imbalanced data and improve the prediction accuracy of minority classes (i.e., ordinary, severe); multi-objective feature selection based on NSGA-II was used to remove redundant factors, increasing computational efficiency and making the patterns discovered by the explainer more effective; and Bayesian hyperparameter optimization was used to obtain a more effective hyperparameter combination with fewer iterations and to boost model adaptability. Results show that the proposed interpretable machine learning framework can significantly improve and distinguish the performance of four popular machine learning models and two post-hoc interpretation methods. It is found that six of the top ten important factors belong to multi-scale built environment attributes. By comparing the results of feature contribution and interaction effects, three findings can be summarized: ordinary and severe traffic violations have some identical influencing factors and interactive effects; they have some of the same influencing factors or combinations of influencing factors, but with different factor values; and each has some unique influencing factors and unique combinations of influencing factors.

Subject(s)

Accidents, Traffic , Automobile Driving , Bayes Theorem , Built Environment , Machine Learning , Humans , Aged , Accidents, Traffic/prevention & control , Accidents, Traffic/statistics & numerical data , Automobile Driving/legislation & jurisprudence , Aged, 80 and over

17.

Interactive effects analysis of road, traffic, and weather characteristics on shared e-bike speeding risk: A data-driven approach.

Zhang, Xiaolong; Zhao, Xiaohua; Bian, Yang; Huang, Jianling; Yin, Luyao.

Accid Anal Prev ; 207: 107755, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39214034

ABSTRACT

As electric bikes (e-bikes) rapidly develop in China, their traffic safety issues are becoming increasingly prominent. Accurately detecting risky riding behaviors and conducting mechanism analysis on the multiple risk factors are crucial in formulating and implementing precise management policies. The emergence of shared e-bikes and the advancements in interpretable machine learning present new opportunities for accurately analyzing the determinants of risky riding behaviors. The primary objective of this study is to examine and analyze the risk factors related to speeding behavior to aid urban management agencies in crafting necessary management policies. This study utilizes a large-scale dataset of shared e-bike trajectory data to establish a framework for detecting speeding behavior. Subsequently, the extreme gradient boosting (XGBoost) model is employed to identify the level of speeding risk by leveraging its excellent identification ability. Moreover, based on measuring the degree of interaction among road, traffic, and weather characteristics, the complex interactive effects of these risk factors on high-risk speeding are investigated using bivariate partial dependence plots (PDP), owing to their superior parsing ability. Feature importance analysis results indicate that the top five ranked variables that significantly affect the identified results of speed risk levels are land use density, rainfall, road level, curbside parking density, and bike lane width. The interaction analysis results indicate that higher road levels and greater bike lane widths correspond to an increased possibility of high-risk speeding among riders. Land use density, curbside parking density, and rainfall display a nonlinear effect on high-risk speeding. Introducing road level, bike lane width, and time interval could change the patterns of nonlinear effects in land use density, curbside parking density, and rainfall. Finally, several policy recommendations are proposed to improve e-bike traffic safety by utilizing the extracted feature values associated with a higher probability of high-risk speeding.

Subject(s)

Accidents, Traffic , Bicycling , Weather , Humans , Accidents, Traffic/statistics & numerical data , Accidents, Traffic/prevention & control , China , Risk Factors , Bicycling/statistics & numerical data , Risk-Taking , Automobile Driving/statistics & numerical data , Machine Learning , Environment Design

18.

Neural effects of dopaminergic compounds revealed by multi-site electrophysiology and interpretable machine-learning.

Kapanaiah, Sampath K T; Rosenbrock, Holger; Hengerer, Bastian; Kätzel, Dennis.

Front Pharmacol ; 15: 1412725, 2024.

Article in English | MEDLINE | ID: mdl-39045050

ABSTRACT

Background: Neuropsychopharmacological compounds may exert complex brain-wide effects due to an anatomically and genetically broad expression of their molecular targets and indirect effects via interconnected brain circuits. Electrophysiological measurements in multiple brain regions using electroencephalography (EEG) or local field potential (LFP) depth-electrodes may record fingerprints of such pharmacologically-induced changes in local activity and interregional connectivity (pEEG/pLFP). However, in order to reveal such patterns comprehensively and potentially derive mechanisms of therapeutic pharmacological effects, both activity and connectivity have to be estimated for many brain regions. This entails the problem that hundreds of electrophysiological parameters are derived from a typically small number of subjects, making frequentist statistics ill-suited for their analysis. Methods: We here present an optimized interpretable machine-learning (ML) approach which relies on predictive power in individual recording sequences to extract and quantify the robustness of compound-induced neural changes from multi-site recordings using Shapley additive explanations (SHAP) values. To evaluate this approach, we recorded LFPs in mediodorsal thalamus (MD), prefrontal cortex (PFC), dorsal hippocampus (CA1 and CA3), and ventral hippocampus (vHC) of mice after application of amphetamine or of the dopaminergic antagonists clozapine, raclopride, or SCH23390, for which effects on directed neural communication between those brain structures were so far unknown. Results: Our approach identified complex patterns of neurophysiological changes induced by each of these compounds, which were reproducible across time intervals, doses (where tested), and ML algorithms. We found, for example, that the action of clozapine in the analysed cortico-thalamo-hippocampal network entails a larger share of D1- as opposed to D2-receptor induced effects, and that the D2-antagonist raclopride reconfigures connectivity in the delta-frequency band. Furthermore, the effects of amphetamine and clozapine were surprisingly similar in terms of decreasing thalamic input to PFC and vHC, and vHC activity, whereas an increase of dorsal-hippocampal communication and of thalamic activity distinguished amphetamine from all tested anti-dopaminergic drugs. Conclusion: Our study suggests that communication from the dorsal hippocampus scales proportionally with dopamine receptor activation and demonstrates, more generally, the high complexity of neuropharmacological effects on the circuit level. We envision that the presented approach can aid in the standardization and improved data extraction in pEEG/pLFP-studies.

19.

Using Interpretable Machine Learning for Differential Item Functioning Detection in Psychometric Tests.

Kraus, Elisabeth Barbara; Wild, Johannes; Hilbert, Sven.

Appl Psychol Meas ; 48(4-5): 167-186, 2024 Jul.

Article in English | MEDLINE | ID: mdl-39055539

ABSTRACT

This study presents a novel method to investigate test fairness and differential item functioning combining psychometrics and machine learning. Test unfairness manifests itself in systematic and demographically imbalanced influences of confounding constructs on residual variances in psychometric modeling. Our method aims to account for resulting complex relationships between response patterns and demographic attributes. Specifically, it measures the importance of individual test items and latent ability scores in comparison to a random baseline variable when predicting demographic characteristics. We conducted a simulation study to examine the functionality of our method under various conditions such as linear and complex impact, unfairness and varying number of factors, unfair items, and varying test length. We found that our method detects unfair items as reliably as Mantel-Haenszel statistics or logistic regression analyses but generalizes to multidimensional scales in a straightforward manner. To apply the method, we used random forests to predict migration backgrounds from ability scores and single items of an elementary school reading comprehension test. One item was found to be unfair according to all proposed decision criteria. Further analysis of the item's content provided plausible explanations for this finding. Analysis code is available at: https://osf.io/s57rw/?view_only=47a3564028d64758982730c6d9c6c547.

20.

Towards Improved XAI-Based Epidemiological Research into the Next Potential Pandemic.

Khalili, Hamed; Wimmer, Maria A.

Life (Basel) ; 14(7), 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-39063538

ABSTRACT

Applied to a variety of pandemic-relevant data, artificial intelligence (AI) has substantially supported the control of the spread of the SARS-CoV-2 virus. Along with this, epidemiological machine learning studies of SARS-CoV-2 have been frequently published. While these models can be perceived as precise and policy-relevant to guide governments towards optimal containment policies, their black box nature can hamper building trust and relying confidently on the prescriptions proposed. This paper focuses on interpretable AI-based epidemiological models in the context of the recent SARS-CoV-2 pandemic. We systematically review existing studies, which jointly incorporate AI, SARS-CoV-2 epidemiology, and explainable AI (XAI) approaches. First, we propose a conceptual framework by synthesizing the main methodological features of the existing AI pipelines of SARS-CoV-2. Based on the proposed conceptual framework and by analyzing the selected epidemiological studies, we reflect on current research gaps in epidemiological AI toolboxes and how to fill these gaps to generate enhanced policy support in the next potential pandemic.
