Quantifying representativeness in randomized clinical trials using machine learning fairness metrics.

Qi, Miao; Cahan, Owen; Foreman, Morgan A; Gruen, Daniel M; Das, Amar K; Bennett, Kristin P

Qi M; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, USA.
Cahan O; Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA.
Foreman MA; Center for Computational Health, IBM Research, Cambridge, Massachusetts, USA.
Gruen DM; Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA.
Das AK; Center for Computational Health, IBM Research, Cambridge, Massachusetts, USA.
Bennett KP; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, USA.

JAMIA Open ; 4(3): ooab077, 2021 Jul.

Article in English | MEDLINE | ID: covidwho-1584263

ABSTRACT

OBJECTIVE:

We help identify subpopulations underrepresented in randomized clinical trial (RCT) cohorts with respect to national, community-based, or health system target populations by formulating population representativeness of RCTs as a machine learning (ML) fairness problem, deriving new representation metrics, and deploying them in easy-to-understand interactive visualization tools.

MATERIALS AND METHODS:

We represent RCT cohort enrollment as random binary classification fairness problems, and then show how ML fairness metrics based on enrollment fraction can be efficiently calculated using easily computed rates of subpopulations in RCT cohorts and target populations. We propose standardized versions of these metrics and deploy them in an interactive tool to analyze 3 RCTs with respect to type 2 diabetes and hypertension target populations in the National Health and Nutrition Examination Survey.
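
The link between enrollment fraction and subgroup rates can be illustrated with a minimal sketch. It assumes the RCT cohort is drawn from the target population, so that by Bayes' rule P(enrolled | g) / P(enrolled) = P(g | enrolled) / P(g): the relative enrollment fraction of subgroup g equals its rate in the cohort divided by its rate in the target population. The function names and the log odds-ratio comparison below are illustrative assumptions, not the specific metrics defined in the article.

```python
import math


def representation_ratio(cohort_rate: float, target_rate: float) -> float:
    """Relative enrollment fraction of a subgroup g.

    By Bayes' rule, P(enrolled | g) / P(enrolled) = P(g | enrolled) / P(g),
    so only the subgroup's rate in the cohort and in the target is needed.
    A value of 1.0 means the subgroup is enrolled at the population-wide rate.
    """
    if not 0.0 < target_rate <= 1.0:
        raise ValueError("target_rate must be in (0, 1]")
    if not 0.0 <= cohort_rate <= 1.0:
        raise ValueError("cohort_rate must be in [0, 1]")
    return cohort_rate / target_rate


def log_odds_disparity(cohort_rate: float, target_rate: float) -> float:
    """Log odds ratio of subgroup membership, cohort versus target.

    Illustrative choice (an assumption, not taken from the abstract):
    0 means equal representation, negative means underrepresented.
    """
    for p in (cohort_rate, target_rate):
        if not 0.0 < p < 1.0:
            raise ValueError("rates must be strictly between 0 and 1")
    cohort_odds = cohort_rate / (1.0 - cohort_rate)
    target_odds = target_rate / (1.0 - target_rate)
    return math.log(cohort_odds / target_odds)


if __name__ == "__main__":
    # Hypothetical rates, made up for illustration only.
    print(representation_ratio(cohort_rate=0.25, target_rate=0.5))
    print(log_odds_disparity(cohort_rate=0.25, target_rate=0.5))
```

Because both functions take only two rates, the comparison never requires individual-level enrollment data, which is the efficiency argument made above.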

RESULTS:

We demonstrate how the proposed metrics and associated statistics enable users to rapidly examine representativeness of all subpopulations in the RCT defined by a set of categorical traits (eg, gender, race, ethnicity, smoking status, and blood pressure) with respect to target populations.
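
As a rough illustration of what "all subpopulations defined by a set of categorical traits" can mean computationally, the sketch below enumerates every subgroup formed by fixing values for one or more traits and computes its rate in a set of records. The record layout, trait names, and the `subgroup_rates` helper are hypothetical. The sketch also treats every record equally, whereas a survey such as NHANES would normally require sampling weights.

```python
from itertools import combinations


def subgroup_rates(records, traits):
    """Return {subgroup: rate} for every subgroup observed in `records`.

    A subgroup is a tuple of (trait, value) pairs over any non-empty
    subset of `traits`, e.g. (("gender", "F"), ("smoker", "no")).
    Rates are unweighted fractions of the records.
    """
    if not records:
        raise ValueError("records must be non-empty")
    counts = {}
    for rec in records:
        # Each record belongs to one subgroup per non-empty trait subset.
        for k in range(1, len(traits) + 1):
            for combo in combinations(traits, k):
                key = tuple((t, rec[t]) for t in combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(records)
    return {key: c / n for key, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy cohort, made up for illustration only.
    cohort = [
        {"gender": "F", "smoker": "no"},
        {"gender": "F", "smoker": "yes"},
        {"gender": "M", "smoker": "no"},
        {"gender": "M", "smoker": "no"},
    ]
    for subgroup, rate in sorted(subgroup_rates(cohort, ["gender", "smoker"]).items()):
        print(subgroup, rate)
```

Running the same function on the cohort and on the target population gives the pair of rates per subgroup that a representation metric would compare.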

DISCUSSION:

The normalized metrics provide an intuitive standardized scale for evaluating representation across subgroups, which may have vastly different enrollment fractions and rates in RCT study cohorts. The metrics are beneficial complements to other approaches (eg, enrollment fractions) used to assess the generalizability and health equity of RCTs.

CONCLUSION:

By quantifying the gaps between RCT and target populations, the proposed methods can support generalizability evaluation of existing RCT cohorts. The interactive visualization tool can be readily applied to identify underrepresented subgroups with respect to any desired source or target populations.

Keywords

health equity; machine learning; population representativeness; randomized clinical trials; subgroup

Full text: Available Collection: International databases Database: MEDLINE Type of study: Cohort study / Experimental Studies / Observational study / Prognostic study / Randomized controlled trials Language: English Journal: JAMIA Open Year: 2021 Document Type: Article Affiliation country: Jamiaopen
