Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.
Schaekermann, Mike; Spitz, Terry; Pyles, Malcolm; Cole-Lewis, Heather; Wulczyn, Ellery; Pfohl, Stephen R; Martin, Donald; Jaroensri, Ronnachai; Keeling, Geoff; Liu, Yuan; Farquhar, Stephanie; Xue, Qinghan; Lester, Jenna; Hughes, Cían; Strachan, Patricia; Tan, Fraser; Bui, Peggy; Mermel, Craig H; Peng, Lily H; Matias, Yossi; Corrado, Greg S; Webster, Dale R; Virmani, Sunny; Semturs, Christopher; Liu, Yun; Horn, Ivor; Cameron Chen, Po-Hsuan.
Affiliation
  • Schaekermann M; Google Health, Mountain View, CA, USA.
  • Spitz T; Google Health, Mountain View, CA, USA.
  • Pyles M; Advanced Clinical, Deerfield, IL, USA.
  • Cole-Lewis H; Department of Dermatology, Cleveland Clinic, Cleveland, OH, USA.
  • Wulczyn E; Google Health, Mountain View, CA, USA.
  • Pfohl SR; Google Health, Mountain View, CA, USA.
  • Martin D; Google Health, Mountain View, CA, USA.
  • Jaroensri R; Google Health, Mountain View, CA, USA.
  • Keeling G; Google Health, Mountain View, CA, USA.
  • Liu Y; Google Health, Mountain View, CA, USA.
  • Farquhar S; Google Health, Mountain View, CA, USA.
  • Xue Q; Google Health, Mountain View, CA, USA.
  • Lester J; Google Health, Mountain View, CA, USA.
  • Hughes C; Advanced Clinical, Deerfield, IL, USA.
  • Strachan P; Department of Dermatology, University of California, San Francisco, CA, USA.
  • Tan F; Google Health, Mountain View, CA, USA.
  • Bui P; Google Health, Mountain View, CA, USA.
  • Mermel CH; Google Health, Mountain View, CA, USA.
  • Peng LH; Google Health, Mountain View, CA, USA.
  • Matias Y; Google Health, Mountain View, CA, USA.
  • Corrado GS; Google Health, Mountain View, CA, USA.
  • Webster DR; Google Health, Mountain View, CA, USA.
  • Virmani S; Google Health, Mountain View, CA, USA.
  • Semturs C; Google Health, Mountain View, CA, USA.
  • Liu Y; Google Health, Mountain View, CA, USA.
  • Horn I; Google Health, Mountain View, CA, USA.
  • Cameron Chen PH; Google Health, Mountain View, CA, USA.
EClinicalMedicine; 70: 102479, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38685924
ABSTRACT

Background:

Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.

Methods:

We propose a methodology, complementary to existing fairness metrics, for assessing whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, which quantitatively assesses the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and which yields the HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted by primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes than for others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (denoted "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and as years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.
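The HEAL metric described above, p(R > 0) estimated by bootstrap, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the choice to resample cases with replacement within each subpopulation, the average-rank handling of ties, and treating an undefined correlation as 0 are all assumptions made for illustration. Health outcomes are passed in as a burden (for example DALYs, where a higher value means a worse outcome), so correlating performance with burden directly corresponds to the negated correlation with outcome quality.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat an undefined correlation as 0
    return cov / (var_x * var_y) ** 0.5


def heal_metric(burden, hits, n_boot=2000, seed=0):
    """Estimate p(R > 0) by resampling cases within each subpopulation.

    burden: dict subgroup -> health burden (e.g. DALYs; higher = worse outcome)
    hits:   dict subgroup -> list of 0/1 flags (1 = top-3 agreement on a case)
    """
    rng = random.Random(seed)
    groups = sorted(burden)
    burdens = [burden[g] for g in groups]
    positive = 0
    for _ in range(n_boot):
        perf = []
        for g in groups:
            # Bootstrap replicate of this subgroup's cases (hypothetical scheme).
            sample = rng.choices(hits[g], k=len(hits[g]))
            perf.append(sum(sample) / len(sample))
        # Correlating performance with burden (higher = worse) is equivalent to
        # negating the correlation with outcome quality, so this value is R.
        if spearman(burdens, perf) > 0:
            positive += 1
    return positive / n_boot
```

Calling `heal_metric(burden, hits)` with per-subgroup burden values and per-case agreement flags returns a fraction between 0 and 1, which corresponds to the percentages reported in the Findings once multiplied by 100.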

Findings:

Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance for racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0%, respectively, for prioritising AI performance for sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance for age subpopulations based on DALYs.

Interpretation:

Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.

Funding:

Google LLC.

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: EClinicalMedicine Year: 2024 Document type: Article Country of affiliation: United States