The importance of being external. Methodological insights for the external validation of machine learning models in medicine.
Cabitza, Federico; Campagner, Andrea; Soares, Felipe; García de Guadiana-Romualdo, Luis; Challa, Feyissa; Sulejmani, Adela; Seghezzi, Michela; Carobene, Anna.
  • Cabitza F; University of Milano-Bicocca, Viale Sarca 336, Milano, 20126, Italy. Electronic address: federico.cabitza@unimib.it.
  • Campagner A; University of Milano-Bicocca, Viale Sarca 336, Milano, 20126, Italy.
  • Soares F; Department of Industrial Engineering - Universidade Federal do Rio Grande do Sul. Porto Alegre, Brazil.
  • García de Guadiana-Romualdo L; Laboratory Medicine Department, Hospital Universitario Santa Lucia, Cartagena, Spain.
  • Challa F; National Reference Laboratory for Clinical Chemistry, Ethiopian Public Health Institute, Addis Ababa, Ethiopia.
  • Sulejmani A; Clinical Chemistry Laboratory, Ospedale di Desio e Monza, ASST-Monza, Department of Medicine and Surgery, Università di Milano-Bicocca, Monza, Italy.
  • Seghezzi M; Clinical Chemistry Laboratory, Ospedale Papa Giovanni XXIII, Bergamo, Italy.
  • Carobene A; Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy.
Comput Methods Programs Biomed ; 208: 106288, 2021 Sep.
Article in English | MEDLINE | ID: covidwho-1322048
ABSTRACT
BACKGROUND AND OBJECTIVE:

Medical machine learning (ML) models tend to perform better on data from the same cohort than on new data, often because of overfitting or covariate shift. For these reasons, external validation (EV) is a necessary practice in the evaluation of medical ML. However, there is still a gap in the literature on how to interpret EV results and, hence, on how to assess the robustness of ML models.

METHODS:

We fill this gap by proposing a meta-validation method to assess the soundness of EV procedures. In doing so, we complement the usual way of assessing EV by considering both dataset cardinality and the similarity of the EV dataset to the training set. We then investigate how cardinality and similarity can inform judgments about the reliability of a validation procedure by integrating them into two summative data visualizations.
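The abstract does not specify how similarity between an EV dataset and the training set is computed. As a minimal sketch, assuming numeric feature vectors and a hypothetical nearest-neighbour definition (not necessarily the authors' measure), each validation instance can be scored as 1 / (1 + d), where d is its Euclidean distance to the closest training instance after z-scoring with training-set statistics; the dataset-level score is the mean over instances. All function names here are illustrative.

```python
import math
from statistics import mean, pstdev


def standardize(train, other):
    """Z-score both sets using per-feature mean/std estimated on the training set only."""
    n_features = len(train[0])
    mus = [mean(row[j] for row in train) for j in range(n_features)]
    # Guard against zero-variance features by falling back to a unit scale.
    sds = [pstdev([row[j] for row in train]) or 1.0 for j in range(n_features)]

    def z(rows):
        return [[(row[j] - mus[j]) / sds[j] for j in range(n_features)] for row in rows]

    return z(train), z(other)


def instance_similarity(x, train_z):
    """Hypothetical similarity in (0, 1]: 1 / (1 + distance to the nearest training instance)."""
    d_min = min(math.dist(x, t) for t in train_z)
    return 1.0 / (1.0 + d_min)


def dataset_similarity(train, validation):
    """Mean instance similarity of a validation set with respect to the training set."""
    train_z, val_z = standardize(train, validation)
    return mean(instance_similarity(x, train_z) for x in val_z)


if __name__ == "__main__":
    train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(dataset_similarity(train, [[2.0, 20.0]]))    # identical to a training row
    print(dataset_similarity(train, [[30.0, 300.0]]))  # far from the training data
```

A validation set that overlaps the training distribution scores close to 1, while an out-of-distribution set scores closer to 0. A distribution-level measure (for example, a two-sample distance) would be an equally plausible alternative design.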

RESULTS:

We illustrate our methodology by applying it to the validation of a state-of-the-art COVID-19 diagnostic model on 8 EV sets collected across 3 continents. Model performance was moderately correlated with data similarity (Pearson ρ = 0.38, p < 0.001). In the EV, the validated model achieved a good AUC (average 0.84), acceptable calibration (average 0.17) and acceptable utility (average 0.50). The validation datasets were adequate in terms of both cardinality and similarity, which suggests that the results are sound. We also provide a qualitative guideline for evaluating the reliability of validation procedures, and we discuss the importance of proper external validation in light of the obtained results.
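A Pearson coefficient such as the one reported above can be computed from paired (similarity, performance) values. The sketch below assumes, for illustration only, that such pairs are already available; the abstract does not state the unit of analysis. The accompanying t statistic tests H0: ρ = 0 with n - 2 degrees of freedom; turning it into a p-value requires the t distribution (for example from a statistics library), which the Python standard library does not provide.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Hypothetical toy pairs: similarity scores and a per-unit performance measure.
    similarity = [0.2, 0.4, 0.5, 0.7, 0.9]
    performance = [0.70, 0.78, 0.74, 0.85, 0.88]
    r = pearson_r(similarity, performance)
    print(r, t_statistic(r, len(similarity)))
```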

CONCLUSIONS:

In this paper, we propose a novel, lean methodology to 1) study how the similarity between training and validation sets affects the generalizability of an ML model; 2) assess the soundness of EV evaluations along three complementary performance dimensions: discrimination, utility and calibration; and 3) draw conclusions about the robustness of the model under validation. We applied this methodology to a state-of-the-art model for the diagnosis of COVID-19 from routine blood tests and showed how to interpret the results in light of the presented framework.
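The three performance dimensions can be illustrated with common, generic metrics. This is a sketch under stated assumptions: discrimination as AUC (computed with the pairwise Mann–Whitney formulation), calibration as the Brier score, and utility as net benefit at a chosen decision threshold, as in decision curve analysis. The abstract does not say which calibration and utility metrics produced the reported averages, so these choices are illustrative rather than the authors' own.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Net benefit of treating cases with prob >= threshold: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    n = len(y_true)
    tp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, p), brier_score(y, p), net_benefit(y, p, 0.5))
```

Reporting all three side by side, per EV set, makes it visible when a model discriminates well but is poorly calibrated or of little practical use at the relevant threshold.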

Full text: Available Collection: International databases Database: MEDLINE Main subject: COVID-19 Type of study: Cohort study / Diagnostic study / Experimental Studies / Observational study / Prognostic study / Qualitative research / Reviews Limits: Humans Language: English Journal: Comput Methods Programs Biomed Journal subject: Medical Informatics Year: 2021 Document Type: Article
