ABSTRACT
Early detection of sepsis is crucial for improving patient outcomes, as sepsis is a significant public health concern that causes substantial morbidity and mortality. Although the Sequential Organ Failure Assessment (SOFA) is widely used in clinical settings to identify sepsis, obtaining sufficient physiological data before onset remains challenging, which limits early detection. To address this challenge, we propose an interpretable machine learning model, ITFG (Interpretable Tree-based Feature Generation), that leverages potential correlations between features based on existing knowledge to identify sepsis within six hours of onset using valuable and continuous physiological measures. Furthermore, we introduce a Semi-supervised Attention-based Conditional Transfer Learning (SAC-TL) framework to improve the model's generalizability and enable early warning of sepsis in the target domain with less information from the source domain. Our proposed approaches effectively address the problem of systematic feature sparsity and missing data, while remaining practical across different degrees of generalizability. We evaluated the proposed approaches on two open datasets, MIMIC and PhysioNet, obtaining AUCs of 97.98% and 86.21%, respectively, demonstrating their effectiveness in different data environments and achieving the best early detection results.
Subjects
Sepsis, Humans, Sepsis/diagnosis, Supervised Machine Learning, Machine Learning, Early Diagnosis, Public Health
ABSTRACT
Background: The outbreak of coronavirus disease 2019 (COVID-19) has become a global public health concern. Many inpatients with COVID-19 have shown clinical symptoms related to sepsis, which aggravates the deterioration of their condition. We aim to diagnose viral sepsis caused by SARS-CoV-2 by analyzing laboratory test data of patients with COVID-19 and to establish an early predictive model for sepsis risk among these patients. Methods: This study retrospectively investigated laboratory test data of 2,453 patients with COVID-19 from electronic health records. Extreme gradient boosting (XGBoost) was employed to build four models with different feature subsets drawn from a total of 69 collected indicators. The SHapley Additive exPlanations (SHAP) method was adopted to interpret the predictive results and to analyze the feature importance of risk factors. Findings: The model for classifying COVID-19 viral sepsis with seven coagulation function indicators achieved an area under the receiver operating characteristic curve (AUC) of 0.9213 (95% CI, 0.8994-0.9431), a sensitivity of 97.17% (95% CI, 94.97-98.46%), and a specificity of 82.05% (95% CI, 77.24-86.06%). The model for identifying COVID-19 coagulation disorders with eight features provided early warning an average of 3.68 ± 4.60 days in advance, with an AUC of 0.9298 (95% CI, 0.8691-0.9904), a sensitivity of 82.22% (95% CI, 67.41-91.49%), and a specificity of 84.00% (95% CI, 63.08-94.75%). Interpretation: We found that abnormal coagulation function was related to the occurrence of sepsis, and that other routine laboratory tests, represented by inflammatory factors, had moderate predictive value for coagulopathy. This indicates that early warning of sepsis in COVID-19 patients could be achieved with our established model, improving patient prognosis and reducing mortality.
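The Findings above are reported as AUC, sensitivity, and specificity. As a reading aid, the following minimal Python sketch shows how these three metrics are commonly computed from binary labels and predicted scores. It is not the study's code: the function names `sensitivity_specificity` and `roc_auc`, the 0.5 threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given score threshold.

    The 0.5 default is an illustrative assumption, not a value from the study.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sensitivity: fraction of true cases flagged; specificity: fraction of
    # non-cases correctly left unflagged.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    positives = [s for label, s in zip(y_true, y_score) if label == 1]
    negatives = [s for label, s in zip(y_true, y_score) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (1 = sepsis, 0 = no sepsis).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a sketch. For a cohort of the size described above, a rank-based implementation from an established library would be the more practical choice.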