Use of feature engineering to predict COVID-19 mortality
American Journal of Respiratory and Critical Care Medicine ; 203(9), 2021.
Article in English | EMBASE | ID: covidwho-1277431
ABSTRACT
Rationale:

Defining a reliable prognostication method for patients with COVID-19 remains a challenge. Various combinations of inflammatory markers, including CRP, LDH, and D-dimer, have been predictive of increased severity in this group of patients. None of these markers, however, has had a significant association with increased mortality. Machine learning has been used for predictions related to COVID-19. Prior COVID-19 machine learning models used the original features as input; we hypothesize that such models can be improved by synthesizing new features through feature engineering. We aim to explore the predictive capacity of the generated features and to evaluate whether they improve COVID-19 mortality prediction.

Methods:

With the approval of the hospital Institutional Review Board, medical records of two hundred sixty-nine patients with a positive COVID-19 PCR study at two 350-bed medical centers were analyzed retrospectively from March 22 through May 10, 2020. In total, one hundred sixty-six variables were collected, including laboratory studies, vital signs, demographics, and comorbidities. Features with more than 50 percent missing values were dropped. Missing data were imputed using multiple imputation in scikit-learn. Feature selection was performed using sequential feature selection from the machine learning extensions library (MLxtend), which led to a final feature space of seven features. Feature engineering was then performed on these seven features, generating four additional features. LightGBM was chosen as our classification model. Results were compared between the feature-engineered and base datasets. Feature ranking was performed using SHapley Additive exPlanations (SHAP). Partial dependence plots were generated to determine feature value cutoffs that predict increased mortality.
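Two of the steps above, dropping sparsely recorded features and generating product features, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not say how the four engineered features were built beyond MaxBUN∗MaxRR being a product, so the helper names (`drop_sparse_features`, `add_product_features`), the `RareLab` column, and all patient values are hypothetical. The imputation, sequential feature selection, and LightGBM steps are omitted.

```python
from itertools import combinations


def drop_sparse_features(rows, max_missing_fraction=0.5):
    """Keep only columns whose share of missing (None) values is at most the threshold."""
    n = len(rows)
    kept = [
        col for col in rows[0]
        if sum(row[col] is None for row in rows) / n <= max_missing_fraction
    ]
    return [{col: row[col] for col in kept} for row in rows]


def add_product_features(rows, columns):
    """Return copies of the rows with the pairwise products of `columns` appended."""
    result = []
    for row in rows:
        new_row = dict(row)
        for a, b in combinations(columns, 2):
            new_row[f"{a}*{b}"] = row[a] * row[b]
        result.append(new_row)
    return result


# Hypothetical records for illustration only; these are not study data.
patients = [
    {"MaxBUN": 18.0, "MaxRR": 20.0, "RareLab": None},
    {"MaxBUN": 45.0, "MaxRR": 28.0, "RareLab": None},
    {"MaxBUN": 60.0, "MaxRR": 32.0, "RareLab": 1.2},
    {"MaxBUN": 12.0, "MaxRR": 16.0, "RareLab": None},
]

# RareLab is missing in 3 of 4 rows (75 percent), so it is dropped.
filtered = drop_sparse_features(patients)

# Assumes the remaining columns are complete (the imputation step is skipped here).
engineered = add_product_features(filtered, ["MaxBUN", "MaxRR"])
print(engineered[1])
```

With all seven selected features passed to `add_product_features`, the same loop would yield 21 candidate products, so some further selection would be needed to arrive at only four new features; the abstract does not describe that choice.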

Results:

LightGBM demonstrated good classification performance, with an Area Under the Curve (AUC) of 0.9 in the base model. With the engineered features, the AUC increased to 0.94. Based on the SHAP plot, the feature most predictive of COVID-19 mortality was the product of Maximum Blood Urea Nitrogen and Maximum Respiratory Rate (MaxBUN∗MaxRR). The partial dependence plot shows that at MaxBUN∗MaxRR values > 1000, SHAP values rise, denoting a rise in predicted mortality.
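To make the AUC figures and the 1000 cutoff concrete, the sketch below computes AUC from its pairwise definition (the probability that a randomly chosen non-survivor receives a higher score than a randomly chosen survivor) and flags patients whose product exceeds 1000. All numbers are invented for illustration and say nothing about the study data; the function name `auc` and the variable names are assumptions, and a raw feature is used as the score in place of a trained LightGBM model.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort (1 = died); not study data.
died = [0, 0, 0, 1, 1, 0]
max_bun = [15, 40, 22, 38, 55, 30]
max_rr = [18, 20, 22, 30, 34, 24]

# Engineered feature: product of the two maxima.
product = [b * r for b, r in zip(max_bun, max_rr)]

auc_bun = auc(max_bun, died)      # raw feature used directly as the score
auc_product = auc(product, died)  # engineered feature used as the score

# Cutoff reported in the abstract: product greater than 1000.
flagged = [p > 1000 for p in product]

print(auc_bun, auc_product, flagged)
```

The toy values were chosen so that the product separates the two groups better than BUN alone, which only shows the mechanism by which an interaction feature can raise AUC.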

Conclusion:

The use of feature engineering improved predictive performance for mortality related to COVID-19. The strongest feature for the prediction of mortality was MaxBUN∗MaxRR. A sharp rise in predicted mortality was observed when the product of these values exceeded 1000. Feature engineering can be used to improve existing mortality prediction models.

Full text: Available Collection: Databases of international organizations Database: EMBASE Type of study: Prognostic study Language: English Journal: American Journal of Respiratory and Critical Care Medicine Year: 2021 Document Type: Article
