Search | VHL Regional Portal

Evaluating gender bias in ML-based clinical risk prediction models: A study on multiple use cases at different hospitals.

Cabanillas Silva, Patricia; Sun, Hong; Rodriguez, Pablo; Rezk, Mohamed; Zhang, Xianchao; Fliegenschmidt, Janis; Hulde, Nikolai; von Dossow, Vera; Meesseman, Laurent; Depraetere, Kristof; Szymanowsky, Ralph; Stieg, Jörg; Dahlweid, Fried-Michael.

J Biomed Inform ; : 104692, 2024 Jul 13.

Article in English | MEDLINE | ID: mdl-39009174

ABSTRACT

BACKGROUND: An inherent difference exists between male and female bodies, and the historical under-representation of females in clinical trials has widened this gap in existing healthcare data. The fairness of clinical decision-support tools is at risk when they are developed from biased data. This paper aims to quantitatively assess gender bias in risk prediction models. We aim to generalize our findings by performing this investigation on multiple use cases at different hospitals. METHODS: First, we conduct a thorough analysis of the source data to find gender-based disparities. Second, we assess model performance on different gender groups at different hospitals and on different use cases. Performance is quantified using the area under the receiver-operating characteristic curve (AUROC). Lastly, we investigate the clinical implications of these biases by analyzing the underdiagnosis and overdiagnosis rates and by performing decision curve analysis (DCA). We also investigate the influence of model calibration on mitigating gender-related disparities in decision-making processes. RESULTS: Our data analysis reveals notable variations in incidence rates, AUROC, and overdiagnosis rates across genders, hospitals, and clinical use cases. However, the underdiagnosis rate is consistently higher in the female population. In general, the female population exhibits lower incidence rates, and the models perform worse when applied to this group. Furthermore, the decision curve analysis shows no statistically significant difference in the model's clinical utility across gender groups within the threshold range of interest. CONCLUSION: The presence of gender bias within risk prediction models varies across clinical use cases and healthcare institutions. Although an inherent difference is observed between male and female populations at the data source level, this variance does not affect the parity of clinical utility. 
In conclusion, the evaluations conducted in this study highlight the significance of continuously monitoring gender-based disparities from various perspectives for clinical risk prediction models.
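The three quantities named in the abstract above (AUROC, underdiagnosis rate, and the net benefit used in decision curve analysis) can be illustrated per gender group with a minimal Python sketch. It is not the authors' code. The function names, the toy data, and the 0.5 threshold are assumptions made for illustration, and the underdiagnosis rate is assumed here to mean the false-negative rate at the chosen threshold, which may differ from the paper's definition. Net benefit uses the standard decision-curve form, TP/n - (FP/n) * pt/(1 - pt).

```python
from collections import defaultdict


def auroc(y_true, y_score):
    """AUROC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def underdiagnosis_rate(y_true, y_score, threshold):
    """Assumed definition: share of true positives scored below the threshold."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    return sum(s < threshold for s in pos) / len(pos)


def net_benefit(y_true, y_score, threshold):
    """Decision-curve net benefit at one threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def metrics_by_group(groups, y_true, y_score, threshold):
    """Split the cases by group label and compute all three metrics per group."""
    buckets = defaultdict(lambda: ([], []))
    for g, y, s in zip(groups, y_true, y_score):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {
        g: {
            "auroc": auroc(ys, ss),
            "underdiagnosis": underdiagnosis_rate(ys, ss, threshold),
            "net_benefit": net_benefit(ys, ss, threshold),
        }
        for g, (ys, ss) in buckets.items()
    }


# Toy data, invented for illustration only (not taken from the study).
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_score = [0.4, 0.3, 0.8, 0.6, 0.9, 0.2, 0.7, 0.1]
report = metrics_by_group(groups, y_true, y_score, threshold=0.5)
```

Repeating `net_benefit` over a grid of thresholds per group would give the per-group decision curves. Testing whether those curves differ significantly needs a resampling step that this sketch leaves out.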

Machine Learning-Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance.

Sun, Hong; Depraetere, Kristof; Meesseman, Laurent; Cabanillas Silva, Patricia; Szymanowsky, Ralph; Fliegenschmidt, Janis; Hulde, Nikolai; von Dossow, Vera; Vanbiervliet, Martijn; De Baerdemaeker, Jos; Roccaro-Waldmeyer, Diana M; Stieg, Jörg; Domínguez Hidalgo, Manuel; Dahlweid, Fried-Michael.

J Med Internet Res ; 24(6): e34295, 2022 06 07.

Article in English | MEDLINE | ID: mdl-35502887

ABSTRACT

BACKGROUND: Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performances in different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. OBJECTIVE: The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these settings with their performance when using retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals. METHODS: We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models that were based on the Transformer model. The models were trained using a calibration tool that is common for all hospitals and use cases. The models had a common design but were calibrated using each hospital's specific data. The models were deployed in these three hospitals and used in daily clinical practice. The predictions made by these models were logged and correlated with the diagnosis at discharge. We compared their performance with evaluations on retrospective data and conducted cross-hospital evaluations. RESULTS: The performance of the prediction models with data from live clinical workflows was similar to the performance with retrospective data. The average value of the area under the receiver operating characteristic curve (AUROC) decreased slightly by 0.6 percentage points (from 94.8% to 94.2% at discharge). 
The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of model calibration with data from the deployment hospital. CONCLUSIONS: Calibrating the prediction model with data from different deployment hospitals led to good performance in live settings. The performance degradation in the cross-hospital evaluation identified limitations in developing a generic model for different hospitals. Designing a generic process for model development to generate specialized prediction models for each hospital guarantees model performance in different hospitals.
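The comparison between in-hospital and cross-hospital performance reported above can be sketched as follows. This is a hedged illustration, not the study's evaluation pipeline. The hospital labels "A", "B", "C" and every value in `auroc_table` are placeholders invented here, and `split_means` is a hypothetical helper name. Only the last two lines use figures quoted in the abstract.

```python
from statistics import mean


def split_means(auroc_table):
    """Average AUROC when train and test hospital match versus when they differ.

    auroc_table[train][test] holds the AUROC of the model calibrated on
    hospital `train` and evaluated on data from hospital `test`.
    """
    same = [v for tr, row in auroc_table.items() for te, v in row.items() if tr == te]
    cross = [v for tr, row in auroc_table.items() for te, v in row.items() if tr != te]
    return mean(same), mean(cross)


def drop_in_points(before_pct, after_pct):
    """Difference between two percentages, in percentage points."""
    return round(before_pct - after_pct, 1)


# Placeholder values for illustration only (not results from the study).
auroc_table = {
    "A": {"A": 0.95, "B": 0.85, "C": 0.87},
    "B": {"A": 0.86, "B": 0.94, "C": 0.88},
    "C": {"A": 0.84, "B": 0.86, "C": 0.93},
}
in_hospital, cross_hospital = split_means(auroc_table)

# Figures quoted in the abstract: retrospective vs live, then live vs cross-hospital.
live_drop = drop_in_points(94.8, 94.2)
cross_drop = drop_in_points(94.2, 86.3)
```

With the quoted figures, `cross_drop` evaluates to 7.9, which the abstract rounds to 8 percentage points.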

Subject(s)

Electronic Health Records , Machine Learning , Hospitals , Humans , ROC Curve , Retrospective Studies

Exploring Novel Funding Strategies for Innovative Medical Research: The HORAO Crowdfunding Campaign.

Schucht, Philippe; Roccaro-Waldmeyer, Diana M; Murek, Michael; Zubak, Irena; Goldberg, Johannes; Falk, Stephanie; Dahlweid, Fried-Michael; Raabe, Andreas.

J Med Internet Res ; 22(11): e19715, 2020 11 11.

Article in English | MEDLINE | ID: mdl-33174857

ABSTRACT

BACKGROUND: The rise of the internet and social media has boosted online crowdfunding as a novel strategy to raise funds for kick-starting projects, but it is rarely used in science. OBJECTIVE: We report on an online crowdfunding campaign launched in the context of the neuroscience project HORAO. The aim of HORAO was to develop a noninvasive real-time method to visualize neuronal fiber tracts during brain surgery in order to better delineate tumors and to identify crucial cerebral landmarks. The revenue from the crowdfunding campaign was to be used to sponsor a crowdsourcing campaign for the HORAO project. METHODS: We ran a 7-week reward-based crowdfunding campaign on a national crowdfunding platform, offering optional material and experiential rewards in return for a contribution toward raising our target of Swiss francs (CHF) 50,000 in financial support (roughly equivalent to US $50,000 at the time of the campaign). We used various owned media (websites and social media), as well as earned media (press releases and news articles) to raise awareness about our project. RESULTS: The production of an explanatory video took 60 hours, and 31 posts were published on social media (Facebook, Instagram, and Twitter). The campaign raised a total of CHF 69,109. Approximately half of all donations came from donors who forwent a reward (CHF 28,786, 48.74%); the other half came from donors who chose experiential and material rewards in similar proportions (CHF 14,958, 25.33% and CHF 15,315.69, 25.93%, respectively). Of those with an identifiable relationship to the crowdfunding team, patients and their relatives contributed the largest sum (CHF 17,820, 30.17%), followed by friends and family (CHF 9288, 15.73%) and work colleagues (CHF 6028, 10.21%), while 43.89% of funds came from donors who were either anonymous or had an unknown relationship to the crowdfunding team. Patients and their relatives made the largest donations, with a median value of CHF 200 (IQR 90). 
CONCLUSIONS: Crowdfunding proved to be a successful strategy to fund a neuroscience project and to raise awareness of a specific clinical problem. Focusing on potential donors with a personal interest in the issue, such as patients and their relatives in our project, is likely to increase funding success. Compared with traditional grant applications, new skills are needed to explain medical challenges to the crowd through video messages and social media.

Subject(s)

Biomedical Research/economics , Biomedical Research/methods , Fund Raising/methods , Crowdsourcing/methods , Humans , Research Design

On the Interpretability of Artificial Intelligence in Radiology: Challenges and Opportunities.

Reyes, Mauricio; Meier, Raphael; Pereira, Sérgio; Silva, Carlos A; Dahlweid, Fried-Michael; von Tengg-Kobligk, Hendrik; Summers, Ronald M; Wiest, Roland.

Radiol Artif Intell ; 2(3): e190043, 2020 May 27.

Article in English | MEDLINE | ID: mdl-32510054

ABSTRACT

As artificial intelligence (AI) systems begin to make their way into clinical radiology practice, it is crucial to assure that they function correctly and that they gain the trust of experts. Toward this goal, approaches to make AI "interpretable" have gained attention to enhance the understanding of a machine learning algorithm, despite its complexity. This article aims to provide insights into the current state of the art of interpretability methods for radiology AI. This review discusses radiologists' opinions on the topic and suggests trends and challenges that need to be addressed to effectively streamline interpretability methods in clinical practice. Supplemental material is available for this article. © RSNA, 2020. See also the commentary by Gastounioti and Kontos in this issue.

Volumetric Food Quantification Using Computer Vision on a Depth-Sensing Smartphone: Preclinical Study.

Herzig, David; Nakas, Christos T; Stalder, Janine; Kosinski, Christophe; Laesser, Céline; Dehais, Joachim; Jaeggi, Raphael; Leichtle, Alexander Benedikt; Dahlweid, Fried-Michael; Stettler, Christoph; Bally, Lia.

JMIR Mhealth Uhealth ; 8(3): e15294, 2020 03 25.

Article in English | MEDLINE | ID: mdl-32209531

ABSTRACT

BACKGROUND: Quantification of dietary intake is key to the prevention and management of numerous metabolic disorders. Conventional approaches are challenging, laborious, and lack accuracy. The recent advent of depth-sensing smartphones in conjunction with computer vision could facilitate reliable quantification of food intake. OBJECTIVE: The objective of this study was to evaluate the accuracy of a novel smartphone app combining depth-sensing hardware with computer vision to quantify meal macronutrient content using volumetry. METHODS: The app ran on a smartphone with a built-in depth sensor applying structured light (iPhone X). The app estimated weight, macronutrient (carbohydrate, protein, fat), and energy content of 48 randomly chosen meals (breakfasts, cooked meals, snacks) encompassing 128 food items. The reference weight was generated by weighing individual food items using a precision scale. The study endpoints were (1) error of estimated meal weight, (2) error of estimated meal macronutrient content and energy content, (3) segmentation performance, and (4) processing time. RESULTS: In both absolute and relative terms, the mean (SD) absolute errors of the app's estimates were 35.1 g (42.8 g; relative absolute error: 14.0% [12.2%]) for weight; 5.5 g (5.1 g; relative absolute error: 14.8% [10.9%]) for carbohydrate content; 1.3 g (1.7 g; relative absolute error: 12.3% [12.8%]) for fat content; 2.4 g (5.6 g; relative absolute error: 13.0% [13.8%]) for protein content; and 41.2 kcal (42.5 kcal; relative absolute error: 12.7% [10.8%]) for energy content. Although estimation accuracy was not affected by the viewing angle, the type of meal mattered, with slightly worse performance for cooked meals than for breakfasts and snacks. Segmentation adjustment was required for 7 of the 128 items. Mean (SD) processing time across all meals was 22.9 seconds (8.6 seconds). 
CONCLUSIONS: This study evaluated the accuracy of a novel smartphone app with an integrated depth-sensing camera and found highly accurate volume estimation across a broad range of food items. In addition, the system demonstrated high segmentation performance and low processing time, highlighting its usability.
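The error measures reported above, mean (SD) absolute error and mean (SD) relative absolute error, can be computed as in this minimal Python sketch. The per-meal definition of relative error (absolute error divided by the reference value) and the use of the sample standard deviation are assumptions, since the abstract does not spell them out. The function name and the weights are invented for illustration.

```python
from statistics import mean, stdev


def error_summary(estimated, reference):
    """Return mean and SD of absolute errors and of relative absolute errors (in %)."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    abs_err = [abs(e - r) for e, r in zip(estimated, reference)]
    # Relative error per item, as a percentage of the reference value (assumed definition).
    rel_err = [100.0 * a / r for a, r in zip(abs_err, reference)]
    return {
        "mae": mean(abs_err),
        "mae_sd": stdev(abs_err),  # sample SD, an assumption
        "rel_mae_pct": mean(rel_err),
        "rel_mae_sd_pct": stdev(rel_err),
    }


# Invented meal weights in grams (not data from the study).
estimated_g = [110.0, 180.0, 330.0]
reference_g = [100.0, 200.0, 300.0]
summary = error_summary(estimated_g, reference_g)
```

Taking absolute values before averaging prevents over- and underestimates from cancelling out, which is why an absolute error can be sizeable even when the signed mean error is close to zero.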

Subject(s)

Smartphone , Computers , Eating , Humans , Nutrients

Dehydroepiandrosterone (DHEA) modulates the activity and the expression of lymphocyte subpopulations induced by cecal ligation and puncture.

van Griensven, Martijn; Dahlweid, Fried Michael; Giannoudis, Peter V; Wittwer, Tobias; Böttcher, Frederic; Breddin, Maike; Pape, Hans-Christoph.

Shock ; 18(5): 445-9, 2002 Nov.

Article in English | MEDLINE | ID: mdl-12412624

ABSTRACT

Dehydroepiandrosterone (DHEA) exerts a variety of positive effects on the immunologic alterations after trauma and sepsis. We therefore measured the therapeutic efficacy of DHEA after cecal ligation and puncture (CLP) on the expression of lymphocyte subpopulations and on the delayed type hypersensitivity (DTH) reaction. Male NMRI mice were randomly assigned to four different treatment groups. Treatment consisted of DHEA or saline (S) administration after CLP or laparotomy only. Flow cytometry was performed (CD4+, CD8+, and CD56+ lymphocytes) after 96 hours. DTH reaction, activity, and mortality rate were documented. The CLP-induced reduction in activity and survival (mortality: 34/40) was significantly (p < 0.03) less sustained in CLP-DHEA (mortality: 22/40). The DTH ratio (before vs. after secondary challenge) was significantly lower in CLP-S (1.01 +/- 0.15) than in CLP-DHEA (1.35 +/- 0.1) after 48 hours (p < 0.01). CLP-DHEA (22.2 +/- 7.9%) was associated with a significantly less sustained increase of CD56+ cells (p < 0.01) compared with CLP-S (49.0 +/- 6.9%). DHEA treatment after CLP was associated with less reduction in the CD8+ T-lymphocyte subsets (p < 0.01 vs. all other groups). DHEA treatment after CLP was associated with fewer alterations in CD8+ and CD56+ cells and in the DTH reaction compared with animals submitted to CLP without any treatment. This difference was associated with improved outcome (reactivity, mortality). These results suggest a modulation of specific immune reactions by DHEA treatment.

Subject(s)

Dehydroepiandrosterone/pharmacology , Lymphocyte Subsets/drug effects , Lymphocyte Subsets/immunology , Sepsis/drug therapy , Sepsis/immunology , Adjuvants, Immunologic/pharmacology , Animals , CD8-Positive T-Lymphocytes/drug effects , CD8-Positive T-Lymphocytes/immunology , Disease Models, Animal , Hypersensitivity, Delayed , Male , Mice
