Development of a Multivariable Model for COVID-19 Risk Stratification Based on Gradient Boosting Decision Trees
Jahir M Gutierrez; Maksims Volkovs; Tomi Poutanen; Tristan Watson; Laura Rosella.
Affiliation
  • Jahir M Gutierrez; Layer 6 AI
  • Maksims Volkovs; Layer 6 AI
  • Tomi Poutanen; Layer 6 AI
  • Tristan Watson; ICES
  • Laura Rosella; University of Toronto
Preprint in English | medRxiv | ID: ppmedrxiv-20248783
ABSTRACT
Importance: Stratifying the adult population of Ontario, Canada by risk of COVID-19 complications can support rapid pandemic response, resource allocation, and decision making.
Objective: To develop and validate a multivariable model to predict risk of hospitalization due to COVID-19 severity from routinely collected health records of the entire adult population of Ontario, Canada.
Design, Setting, and Participants: This cohort study included 36,323 adult patients (age ≥ 18 years) from the province of Ontario, Canada, who tested positive for SARS-CoV-2 nucleic acid by polymerase chain reaction between February 2 and October 5, 2020, and were followed up through November 5, 2020. Patients living in long-term care facilities were excluded from the analysis.
Main Outcomes and Measures: Risk of hospitalization within 30 days of COVID-19 diagnosis was estimated via Gradient Boosting Decision Trees, and risk factor importance was examined via Shapley values.
Results: The study cohort included 36,323 patients, the majority female (18,895 [52.02%]), with a median (IQR) age of 45 (31-58) years. The cohort had a hospitalization rate of 7.11% (2,583 hospitalizations) with a median (IQR) time to hospitalization of 1 (0-5) days, and a mortality rate of 2.49% (906 deaths) with a median (IQR) time to death of 12 (6-27) days. Compared with patients who were not hospitalized, those who were hospitalized had a higher median age (64 years vs 43 years, p-value < 0.001), were more often male (56.25% vs 47.35%, p-value < 0.001), and had a higher median (IQR) number of comorbidities (3 [2-6] vs 1 [0-3], p-value < 0.001). Patients were randomly split into development (n=29,058, 80%) and held-out validation (n=7,265, 20%) cohorts. The final Gradient Boosting model was built using the XGBoost algorithm and achieved high discrimination (development cohort mean area under the receiver operating characteristic curve across the five folds of 0.852; held-out validation cohort 0.8475) as well as excellent calibration (R² = 0.998, slope = 1.01, intercept = -0.01). Patients scoring in the top 10% of the validation cohort captured 47.41% of the actual hospitalizations, whereas those scoring in the top 30% captured 80.56%. Patients in the held-out validation cohort (n=7,265) with a score of at least 0.5 (n=2,149, 29.58%) had a 20.29% hospitalization rate (positive predictive value 20.29%), compared with a 2.2% hospitalization rate for those with a score less than 0.5 (n=5,116, 70.42%; negative predictive value 97.8%). Aside from age, gender, and number of comorbidities, the features that contributed most to model predictions were history of abnormal blood levels of creatinine, neutrophils, and leukocytes; geography; and chronic kidney disease.
Conclusions: A risk stratification model has been developed and validated using unique, de-identified, and linked routinely collected health administrative data available in Ontario, Canada. The final XGBoost model showed high discrimination, with the potential utility to stratify patients at risk of serious COVID-19 outcomes. This model demonstrates that routinely collected health system data can be successfully leveraged as a proxy for the potential risk of severe COVID-19 complications. Specifically, past laboratory results and demographic factors provide a strong signal for identifying patients who are susceptible to complications. The model can support population risk stratification that informs the protection of patients most at risk for severe COVID-19 complications.
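The capture-rate and predictive-value figures reported in the Results can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `capture_rate` and `ppv_npv` and the toy scores and outcomes are hypothetical, chosen only to show how such metrics are typically computed from a list of risk scores and observed hospitalizations.

```python
from typing import Sequence, Tuple


def capture_rate(scores: Sequence[float], outcomes: Sequence[int],
                 top_fraction: float) -> float:
    """Share of all observed events found among the top-scoring fraction.

    Hypothetical helper; assumes at least one positive outcome.
    """
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    captured = sum(outcome for _, outcome in ranked[:k])
    return captured / sum(outcomes)


def ppv_npv(scores: Sequence[float], outcomes: Sequence[int],
            threshold: float) -> Tuple[float, float]:
    """Positive and negative predictive value at a score threshold.

    Hypothetical helper; assumes both groups are non-empty.
    """
    flagged = [o for s, o in zip(scores, outcomes) if s >= threshold]
    unflagged = [o for s, o in zip(scores, outcomes) if s < threshold]
    ppv = sum(flagged) / len(flagged)            # event rate among flagged
    npv = 1 - sum(unflagged) / len(unflagged)    # non-event rate among unflagged
    return ppv, npv


# Toy data (illustrative only, not taken from the study):
# 1 = hospitalized within the follow-up window, 0 = not hospitalized.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]
toy_outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

top30_capture = capture_rate(toy_scores, toy_outcomes, 0.3)
toy_ppv, toy_npv = ppv_npv(toy_scores, toy_outcomes, 0.5)
```

In this reading, the positive predictive value is simply the hospitalization rate among patients at or above the threshold, which is why the abstract reports both as 20.29%, and the negative predictive value is one minus the hospitalization rate among patients below it (100% - 2.2% = 97.8%).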
License
cc_no
Full text: Available Collection: Preprints Database: medRxiv Type of study: Cohort_studies / Experimental_studies / Observational study / Prognostic study / Rct Language: English Year: 2020 Document type: Preprint
...