An objective framework for evaluating unrecognized bias in medical AI models predicting COVID-19 outcomes.
Estiri, Hossein; Strasser, Zachary H; Rashidian, Sina; Klann, Jeffrey G; Wagholikar, Kavishwar B; McCoy, Thomas H; Murphy, Shawn N.
  • Estiri H; Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA.
  • Strasser ZH; Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA.
  • Rashidian S; Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA.
  • Klann JG; Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA.
  • Wagholikar KB; Verily Life Sciences, Boston, Massachusetts, USA.
  • McCoy TH; Massachusetts General Hospital, Boston, Massachusetts, USA.
  • Murphy SN; Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA.
J Am Med Inform Assoc; 29(8): 1334-1341, 2022 Jul 12.
Article in English | MEDLINE | ID: covidwho-1831208
ABSTRACT

OBJECTIVE:

The increasing translation of artificial intelligence (AI)/machine learning (ML) models into clinical practice brings an increased risk of direct harm from modeling bias; however, bias remains incompletely measured in many medical AI applications. This article aims to provide a framework for the objective evaluation of medical AI across multiple aspects, focusing on binary classification models.

MATERIALS AND METHODS:

Using data from over 56 000 Mass General Brigham (MGB) patients with confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), we evaluated unrecognized bias in 4 AI models, developed during the early months of the pandemic in Boston, Massachusetts, that predict the risks of hospital admission, ICU admission, mechanical ventilation, and death after SARS-CoV-2 infection based solely on patients' pre-infection longitudinal medical records. Models were evaluated both retrospectively and prospectively using model-level metrics of discrimination, accuracy, and reliability, as well as a novel individual-level metric for error.
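
The abstract names the evaluation ingredients, model-level discrimination, accuracy, and reliability plus an individual-level error metric, but not their exact definitions. As a rough Python sketch, assuming hypothetical column names and a simple absolute-gap definition of individual-level error (the paper's own metric is not given here), a per-subgroup evaluation could look like:

import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, accuracy_score, brier_score_loss

def evaluate_by_group(df: pd.DataFrame, group_col: str, threshold: float = 0.5) -> pd.DataFrame:
    # Model-level metrics plus a mean individual-level error, computed per subgroup.
    rows = []
    for group, sub in df.groupby(group_col):
        y = sub["y_true"].to_numpy()
        p = sub["y_prob"].to_numpy()
        rows.append({
            group_col: group,
            "n": len(sub),
            "auroc": roc_auc_score(y, p),                             # discrimination
            "accuracy": accuracy_score(y, (p >= threshold).astype(int)),
            "brier": brier_score_loss(y, p),                          # reliability (calibration)
            "mean_abs_error": np.abs(y - p).mean(),                   # individual-level error, averaged
        })
    return pd.DataFrame(rows)

# Synthetic example; real use would substitute model outputs and patient attributes.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age_group": rng.choice(["<50", "50-70", ">70"], size=1000),
    "y_true": rng.integers(0, 2, size=1000),
    "y_prob": rng.uniform(0, 1, size=1000),
})
print(evaluate_by_group(demo, "age_group"))

Computing every metric within each subgroup, rather than only overall, is what allows both model-level inconsistencies and individual-level error gaps (such as the age effect reported below) to surface.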

RESULTS:

We found inconsistent instances of model-level bias across the prediction models. At the individual level, however, we found that almost all models performed with slightly higher error rates for older patients.

DISCUSSION:

While a model can be biased against certain protected groups (ie, perform worse) in certain tasks, it can at the same time be biased towards another protected group (ie, perform better). As such, current bias evaluation studies may lack a full depiction of a model's variable effects on its subpopulations.
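
A single-task, single-group audit cannot reveal this mixed picture. As an illustrative sketch (the group labels, task names, and AUROC-gap summary are assumptions, not the study's method), one can tabulate each group's performance gap on every task:

import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def auroc_gap_matrix(df: pd.DataFrame, tasks: list[str], group_col: str) -> pd.DataFrame:
    # For each task, each group's AUROC minus the overall AUROC;
    # negative values mean the model performs worse for that group.
    gaps = {}
    for task in tasks:
        overall = roc_auc_score(df[f"{task}_true"], df[f"{task}_prob"])
        gaps[task] = {
            group: roc_auc_score(sub[f"{task}_true"], sub[f"{task}_prob"]) - overall
            for group, sub in df.groupby(group_col)
        }
    return pd.DataFrame(gaps)  # rows: groups, columns: tasks

# Synthetic example with two tasks and one protected attribute.
rng = np.random.default_rng(1)
n = 800
demo = pd.DataFrame({"sex": rng.choice(["F", "M"], size=n)})
for task in ["admission", "ventilation"]:
    demo[f"{task}_true"] = rng.integers(0, 2, size=n)
    demo[f"{task}_prob"] = rng.uniform(0, 1, size=n)
print(auroc_gap_matrix(demo, ["admission", "ventilation"], "sex"))

A row that is positive in one column and negative in another is exactly the situation described above: the same group is favored in one task and disfavored in another.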

CONCLUSION:

Only a holistic evaluation, a diligent search for unrecognized bias, can provide enough information for an unbiased judgment of AI bias, one that can invigorate follow-up investigations into the underlying roots of bias and, ultimately, drive change.

Full text: Available Collection: International databases Database: MEDLINE Main subject: COVID-19 Type of study: Cohort study / Experimental Studies / Observational study / Prognostic study / Randomized controlled trials Limits: Humans Language: English Journal: J Am Med Inform Assoc Journal subject: Medical Informatics Year: 2022 Document Type: Article Affiliation country: United States
