Search | BVS Regional Portal (test)

Evaluating gender bias in ML-based clinical risk prediction models: A study on multiple use cases at different hospitals.

Cabanillas Silva, Patricia; Sun, Hong; Rodriguez, Pablo; Rezk, Mohamed; Zhang, Xianchao; Fliegenschmidt, Janis; Hulde, Nikolai; von Dossow, Vera; Meesseman, Laurent; Depraetere, Kristof; Szymanowsky, Ralph; Stieg, Jörg; Dahlweid, Fried-Michael.

J Biomed Inform ; : 104692, 2024 Jul 13.

Article in English | MEDLINE | ID: mdl-39009174

ABSTRACT

BACKGROUND: Inherent differences exist between male and female bodies, and the historical under-representation of females in clinical trials has widened this gap in existing healthcare data. The fairness of clinical decision-support tools is at risk when they are developed from biased data. This paper aims to quantitatively assess gender bias in risk prediction models. We aim to generalize our findings by performing this investigation on multiple use cases at different hospitals. METHODS: First, we conduct a thorough analysis of the source data to find gender-based disparities. Second, we assess model performance for different gender groups at different hospitals and on different use cases. Performance is quantified using the area under the receiver-operating characteristic curve (AUROC). Lastly, we investigate the clinical implications of these biases by analyzing underdiagnosis and overdiagnosis rates and by performing decision curve analysis (DCA). We also investigate the influence of model calibration on mitigating gender-related disparities in decision-making processes. RESULTS: Our data analysis reveals notable variations in incidence rates, AUROC, and overdiagnosis rates across genders, hospitals, and clinical use cases. The underdiagnosis rate, however, is consistently higher in the female population. In general, the female population exhibits lower incidence rates, and the models perform worse when applied to this group. Furthermore, the decision curve analysis shows no statistically significant difference in the models' clinical utility across gender groups within the threshold range of interest. CONCLUSION: The presence of gender bias in risk prediction models varies across clinical use cases and healthcare institutions. Although inherent differences are observed between male and female populations at the data source level, this variance does not affect the parity of clinical utility. In conclusion, the evaluations conducted in this study highlight the importance of continuously monitoring gender-based disparities from multiple perspectives in clinical risk prediction models.
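The two clinical-impact measures named in the abstract above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: the underdiagnosis rate is read as the false-negative rate within each group, net benefit follows the standard decision curve analysis formula (true positives per patient minus false positives per patient weighted by the odds of the threshold), and the scores, labels, and the helper names `underdiagnosis_rate`, `net_benefit`, and `by_group` are hypothetical toy values chosen only for illustration.

```python
from collections import defaultdict


def underdiagnosis_rate(y_true, y_pred):
    """Assumed definition: share of true cases the model failed to flag (FNR)."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not flagged:
        return float("nan")  # undefined when the group has no true cases
    return sum(1 for p in flagged if p == 0) / len(flagged)


def net_benefit(y_true, y_score, threshold):
    """Standard DCA net benefit at one risk threshold (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def by_group(groups, y_true, y_score, threshold):
    """Compute both measures separately for every group label."""
    buckets = defaultdict(lambda: ([], []))
    for g, t, s in zip(groups, y_true, y_score):
        buckets[g][0].append(t)
        buckets[g][1].append(s)
    result = {}
    for g, (labels, scores) in buckets.items():
        preds = [1 if s >= threshold else 0 for s in scores]
        result[g] = {
            "underdiagnosis_rate": underdiagnosis_rate(labels, preds),
            "net_benefit": net_benefit(labels, scores, threshold),
        }
    return result


# Hypothetical toy data, not results from the study.
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_score = [0.8, 0.3, 0.6, 0.1, 0.9, 0.7, 0.2, 0.4]

report = by_group(groups, y_true, y_score, threshold=0.5)
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

Repeating `net_benefit` over a grid of thresholds would trace one decision curve per group; the study's statistical comparison of those curves is not reproduced here.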

Machine Learning-Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance.

Sun, Hong; Depraetere, Kristof; Meesseman, Laurent; Cabanillas Silva, Patricia; Szymanowsky, Ralph; Fliegenschmidt, Janis; Hulde, Nikolai; von Dossow, Vera; Vanbiervliet, Martijn; De Baerdemaeker, Jos; Roccaro-Waldmeyer, Diana M; Stieg, Jörg; Domínguez Hidalgo, Manuel; Dahlweid, Fried-Michael.

J Med Internet Res ; 24(6): e34295, 2022 06 07.

Article in English | MEDLINE | ID: mdl-35502887

ABSTRACT

BACKGROUND: Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performance in different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. OBJECTIVE: The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these settings with their performance on retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals. METHODS: We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models that were based on the Transformer model. The models were trained using a calibration tool that is common to all hospitals and use cases. The models had a common design but were calibrated using each hospital's specific data. The models were deployed in these three hospitals and used in daily clinical practice. The predictions made by these models were logged and correlated with the diagnosis at discharge. We compared their performance with evaluations on retrospective data and conducted cross-hospital evaluations. RESULTS: The performance of the prediction models with data from live clinical workflows was similar to the performance with retrospective data. The average value of the area under the receiver operating characteristic curve (AUROC) decreased slightly by 0.6 percentage points (from 94.8% to 94.2% at discharge). The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by about 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of model calibration with data from the deployment hospital. CONCLUSIONS: Calibrating the prediction model with data from different deployment hospitals led to good performance in live settings. The performance degradation in the cross-hospital evaluation identified limitations in developing a generic model for different hospitals. Designing a generic process for model development to generate specialized prediction models for each hospital guarantees model performance in different hospitals.
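The AUROC comparisons reported above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' evaluation code: AUROC is computed with the pairwise (Mann-Whitney) definition, the labels and scores are hypothetical toy values, and the helper names `auroc` and `percentage_point_change` are invented for this example. Only the figures 94.8%, 94.2%, and 86.3% come from the abstract.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def percentage_point_change(before, after):
    """Difference between two proportions, expressed in percentage points."""
    return (after - before) * 100.0


# Hypothetical toy scores: the same patients scored by a locally calibrated
# model and by a model calibrated at another hospital.
y_true = [1, 1, 1, 0, 0, 0]
local_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
cross_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]

local_auc = auroc(y_true, local_scores)
cross_auc = auroc(y_true, cross_scores)
print(f"local AUROC: {local_auc:.3f}, cross-hospital AUROC: {cross_auc:.3f}")

# Averages reported in the abstract, restated as proportions.
live_drop = percentage_point_change(0.948, 0.942)
cross_drop = percentage_point_change(0.942, 0.863)
print(f"live vs retrospective: {live_drop:.1f} points")
print(f"cross-hospital vs live: {cross_drop:.1f} points")
```

The quadratic pairwise loop is chosen for readability; a rank-based implementation would be preferable for datasets of realistic size.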

Subjects

Electronic Health Records , Machine Learning , Hospitals , Humans , ROC Curve , Retrospective Studies

A scalable approach for developing clinical risk prediction applications in different hospitals.

Sun, Hong; Depraetere, Kristof; Meesseman, Laurent; De Roo, Jos; Vanbiervliet, Martijn; De Baerdemaeker, Jos; Muys, Herman; von Dossow, Vera; Hulde, Nikolai; Szymanowsky, Ralph.

J Biomed Inform ; 118: 103783, 2021 06.

Article in English | MEDLINE | ID: mdl-33887456

ABSTRACT

OBJECTIVE: Machine learning (ML) algorithms are now widely used to predict acute events in clinical applications. While most such prediction applications are developed to predict the risk of a particular acute event at one hospital, few efforts have been made to extend the developed solutions to other events or to different hospitals. We provide a scalable solution that extends the process of clinical risk prediction model development to multiple diseases and to deployment in different Electronic Health Records (EHR) systems. MATERIALS AND METHODS: We defined a generic process for clinical risk prediction model development. A calibration tool was created to automate the model generation process. We applied the model calibration process at four hospitals and generated risk prediction models for delirium, sepsis, and acute kidney injury (AKI) at each of these hospitals. RESULTS: The delirium risk prediction models have on average an area under the receiver-operating characteristic curve (AUROC) of 0.82 at admission and 0.95 at discharge on the test datasets of the four hospitals. The sepsis models have on average an AUROC of 0.88 and 0.95, and the AKI models have on average an AUROC of 0.85 and 0.92, on the day of admission and at discharge, respectively. DISCUSSION: The scalability discussed in this paper is based on building common data representations (syntactic interoperability) between EHRs stored in different hospitals. Semantic interoperability, a more challenging requirement that different EHRs share the same meaning of data (e.g., the same lab coding system), is not required by our approach. CONCLUSIONS: Our study describes a method to develop and deploy clinical risk prediction models in a scalable way. We demonstrate its feasibility by developing risk prediction models for three diseases across four hospitals.

Subjects

Electronic Health Records , Machine Learning , Hospitalization , Hospitals , Humans , ROC Curve
