Search | VHL Regional Portal

1.

Intrapartum electronic fetal heart rate monitoring to predict acidemia at birth with the use of deep learning.

McCoy, Jennifer A; Levine, Lisa D; Wan, Guangya; Chivers, Corey; Teel, Joseph; La Cava, William G.

Am J Obstet Gynecol ; 2024 Apr 24.

Article in English | MEDLINE | ID: mdl-38663662

ABSTRACT

BACKGROUND: Electronic fetal monitoring is used in most US hospital births but has significant limitations in achieving its intended goal of preventing intrapartum hypoxic-ischemic injury. Novel deep learning techniques can improve complex data processing and pattern recognition in medicine. OBJECTIVE: This study aimed to apply deep learning approaches to develop and validate a model to predict fetal acidemia from electronic fetal monitoring data. STUDY DESIGN: The database was created using intrapartum electronic fetal monitoring data from 2006 to 2020 from a large, multisite academic health system. Data were divided into training and testing sets with equal distribution of acidemic cases. Several different deep learning architectures were explored. The primary outcome was umbilical artery acidemia, which was investigated at 4 clinically meaningful thresholds: 7.20, 7.15, 7.10, and 7.05, along with base excess. The receiver operating characteristic curves were generated with the area under the receiver operating characteristic assessed to determine the performance of the models. External validation was performed using a publicly available Czech database of electronic fetal monitoring data. RESULTS: A total of 124,777 electronic fetal monitoring files were available, of which 77,132 had <30% missingness in the last 60 minutes of the electronic fetal monitoring tracing. Of these, 21,041 were matched to a corresponding umbilical cord gas result, of which 10,182 were time-stamped within 30 minutes of the last electronic fetal monitoring reading and composed the final dataset. The prevalence rates of the outcomes in the data were 20.9% with a pH of <7.2, 9.1% with a pH of <7.15, 3.3% with a pH of <7.10, and 1.3% with a pH of <7.05. The best performing model achieved an area under the receiver operating characteristic of 0.85 at a pH threshold of <7.05. 
When predicting the joint outcome of both pH of <7.05 and base excess of less than -10 meq/L, an area under the receiver operating characteristic of 0.89 was achieved. When predicting both pH of <7.20 and base excess of less than -10 meq/L, an area under the receiver operating characteristic of 0.87 was achieved. At a pH of <7.15 and a positive predictive value of 30%, the model achieved a sensitivity of 90% and a specificity of 48%. CONCLUSION: The application of deep learning methods to intrapartum electronic fetal monitoring analysis achieves promising performance in predicting fetal acidemia. This technology could help improve the accuracy and consistency of electronic fetal monitoring interpretation.
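The inclusion rule described in the RESULTS (less than 30% missingness in the last 60 minutes of the tracing) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names, the 4 Hz default sampling rate, the use of `None` for a missing sample, and the choice to count the absent part of a short tracing as missing are all assumptions.

```python
from typing import Optional, Sequence


def missing_fraction_last_window(
    samples: Sequence[Optional[float]],
    sample_rate_hz: float = 4.0,      # assumed rate, not stated in the abstract
    window_minutes: float = 60.0,
) -> float:
    """Fraction of missing samples in the final `window_minutes` of a tracing."""
    n_window = int(round(sample_rate_hz * 60.0 * window_minutes))
    if n_window <= 0:
        raise ValueError("window must contain at least one sample")
    window = samples[-n_window:]
    # Missing values are encoded as None (an assumption of this sketch).
    n_missing = sum(1 for s in window if s is None)
    # A tracing shorter than the window is treated as missing the remainder.
    n_missing += n_window - len(window)
    return n_missing / n_window


def is_eligible(
    samples: Sequence[Optional[float]],
    sample_rate_hz: float = 4.0,
    window_minutes: float = 60.0,
    max_missing: float = 0.30,
) -> bool:
    """True when missingness in the final window is strictly below the cutoff."""
    frac = missing_fraction_last_window(samples, sample_rate_hz, window_minutes)
    return frac < max_missing


if __name__ == "__main__":
    # Toy tracing: 1 sample per second, 1-minute window, 17 of 60 samples missing.
    toy = [140.0] * 43 + [None] * 17
    print(is_eligible(toy, sample_rate_hz=1.0, window_minutes=1.0))
```

The strict comparison mirrors the "<30%" wording of the abstract; how the study handled tracings shorter than 60 minutes is not stated there.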

2.

Pediatric ECG-Based Deep Learning to Predict Left Ventricular Dysfunction and Remodeling.

Mayourian, Joshua; La Cava, William G; Vaid, Akhil; Nadkarni, Girish N; Ghelani, Sunil J; Mannix, Rebekah; Geva, Tal; Dionne, Audrey; Alexander, Mark E; Duong, Son Q; Triedman, John K.

Circulation ; 149(12): 917-931, 2024 03 19.

Article in English | MEDLINE | ID: mdl-38314583

ABSTRACT

BACKGROUND: Artificial intelligence-enhanced ECG analysis shows promise to detect ventricular dysfunction and remodeling in adult populations. However, its application to pediatric populations remains underexplored. METHODS: A convolutional neural network was trained on paired ECG-echocardiograms (≤2 days apart) from patients ≤18 years of age without major congenital heart disease to detect human expert-classified greater than mild left ventricular (LV) dysfunction, hypertrophy, and dilation (individually and as a composite outcome). Model performance was evaluated on single ECG-echocardiogram pairs per patient at Boston Children's Hospital and externally at Mount Sinai Hospital using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). RESULTS: The training cohort comprised 92 377 ECG-echocardiogram pairs (46 261 patients; median age, 8.2 years). Test groups included internal testing (12 631 patients; median age, 8.8 years; 4.6% composite outcomes), emergency department (2830 patients; median age, 7.7 years; 10.0% composite outcomes), and external validation (5088 patients; median age, 4.3 years; 6.1% composite outcomes) cohorts. Model performance was similar on internal test and emergency department cohorts, with model predictions of LV hypertrophy outperforming the pediatric cardiologist expert benchmark. Adding age and sex to the model added no benefit to model performance. 
When using quantitative outcome cutoffs, model performance was similar between internal testing (composite outcome: AUROC, 0.88, AUPRC, 0.43; LV dysfunction: AUROC, 0.92, AUPRC, 0.23; LV hypertrophy: AUROC, 0.88, AUPRC, 0.28; LV dilation: AUROC, 0.91, AUPRC, 0.47) and external validation (composite outcome: AUROC, 0.86, AUPRC, 0.39; LV dysfunction: AUROC, 0.94, AUPRC, 0.32; LV hypertrophy: AUROC, 0.84, AUPRC, 0.25; LV dilation: AUROC, 0.87, AUPRC, 0.33), with composite outcome negative predictive values of 99.0% and 99.2%, respectively. Saliency mapping highlighted ECG components that influenced model predictions (precordial QRS complexes for all outcomes; T waves for LV dysfunction). High-risk ECG features include lateral T-wave inversion (LV dysfunction), deep S waves in V1 and V2 and tall R waves in V6 (LV hypertrophy), and tall R waves in V4 through V6 (LV dilation). CONCLUSIONS: This externally validated algorithm shows promise to inexpensively screen for LV dysfunction and remodeling in children, which may facilitate improved access to care by democratizing the expertise of pediatric cardiologists.
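To see why a negative predictive value near 99% is attainable when the composite outcome is uncommon, the standard relationship between sensitivity, specificity, prevalence, and predictive values can be sketched as follows. The 4.6% prevalence comes from the internal test cohort above; the sensitivity and specificity of 0.80 are hypothetical placeholders, since the abstract does not report the operating point.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence                  # true-positive mass
    fn = (1.0 - sensitivity) * prevalence          # false-negative mass
    tn = specificity * (1.0 - prevalence)          # true-negative mass
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive mass
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical operating point; only the prevalence is taken from the abstract.
    ppv, npv = predictive_values(sensitivity=0.80, specificity=0.80, prevalence=0.046)
    print(f"PPV={ppv:.3f}  NPV={npv:.3f}")
```

At low prevalence most negatives are true negatives, so the NPV stays high even for a moderate operating point, while the PPV remains comparatively low.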

Subject(s)

Deep Learning; Ventricular Dysfunction, Left; Adult; Humans; Child; Child, Preschool; Electrocardiography; Artificial Intelligence; Ventricular Dysfunction, Left/diagnostic imaging; Hypertrophy, Left Ventricular/diagnostic imaging

3.

Effects of Race and Gender Classifications on Atherosclerotic Cardiovascular Disease Risk Estimates for Clinical Decision-Making in a Cohort of Black Transgender Women.

Poteat, Tonia; Lett, Elle; Rich, Ashleigh J; Jiang, Huijun; Wirtz, Andrea L; Radix, Asa; Reisner, Sari L; Harris, Alexander B; Malone, Jowanna; La Cava, William G; Lesko, Catherine R; Mayer, Kenneth H; Streed, Carl G.

Health Equity ; 7(1): 803-808, 2023.

Article in English | MEDLINE | ID: mdl-38076214

ABSTRACT

Introduction: Despite their dynamic, socially constructed, and imprecise nature, both race and gender are included in common risk calculators used for clinical decision-making about statin therapy for atherosclerotic cardiovascular disease (ASCVD) prevention. Methods and Materials: We assessed the effect of manipulating six different race-gender categories on ASCVD risk scores among 90 Black transgender women. Results: Risk scores varied by operationalization of race and gender and affected the proportion for whom statins were recommended. Discussion: Race and gender are social constructs underpinning racialized and gendered health inequities. Their rote use in ASCVD risk calculators may reinforce and perpetuate existing inequities.

4.

Translating Intersectionality to Fair Machine Learning in Health Sciences.

Lett, Elle; La Cava, William G.

Nat Mach Intell ; 5(5): 476-479, 2023 May.

Article in English | MEDLINE | ID: mdl-37600144

ABSTRACT

Fairness approaches in machine learning should involve more than assessment of performance metrics across groups. Shifting the focus away from model metrics, we reframe fairness through the lens of intersectionality, a Black feminist theoretical framework that contextualizes individuals in interacting systems of power and oppression.

5.

Fair admission risk prediction with proportional multicalibration.

La Cava, William G; Lett, Elle; Wan, Guangya.

Proc Mach Learn Res ; 209: 350-378, 2023.

Article in English | MEDLINE | ID: mdl-37576024

ABSTRACT

Fair calibration is a widely desirable fairness criterion in risk prediction contexts. One way to measure and achieve fair calibration is with multicalibration. Multicalibration constrains calibration error among flexibly defined subpopulations while maintaining overall calibration. However, multicalibrated models can exhibit a higher percent calibration error among groups with lower base rates than among groups with higher base rates. As a result, it is possible for a decision maker to learn to trust or distrust model predictions for specific groups. To alleviate this, we propose proportional multicalibration (PMC), a criterion that constrains the percent calibration error among groups and within prediction bins. We prove that satisfying proportional multicalibration bounds a model's multicalibration as well as its differential calibration, a fairness criterion that directly measures how closely a model approximates sufficiency. Therefore, proportionally calibrated models limit the ability of decision makers to distinguish between model performance on different patient groups, which may make the models more trustworthy in practice. We provide an efficient algorithm for post-processing risk prediction models for proportional multicalibration and evaluate it empirically. We conduct simulation studies and investigate a real-world application of PMC post-processing to prediction of emergency department patient admissions. We observe that proportional multicalibration is a promising criterion for controlling simultaneous measures of calibration fairness of a model over intersectional groups with virtually no cost in terms of classification performance.
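The contrast between absolute and percent calibration error can be illustrated with a small sketch. It is one reading of the abstract, not the paper's algorithm: within each (group, prediction bin) cell, the absolute error is taken as |mean outcome − mean predicted risk|, and the percent error divides that by the mean outcome. The function name, the equal-width binning, and the toy data are assumptions; the paper's formal definition may differ in details.

```python
from collections import defaultdict


def calibration_errors(y_true, risk, groups, n_bins=10):
    """Per (group, bin) absolute and percent calibration error.

    Returns {(group, bin_index): (abs_error, pct_error)}.
    """
    # Accumulate [sum of outcomes, sum of predicted risks, count] per cell.
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for y, r, g in zip(y_true, risk, groups):
        b = min(int(r * n_bins), n_bins - 1)  # equal-width bins on [0, 1]
        cell = cells[(g, b)]
        cell[0] += y
        cell[1] += r
        cell[2] += 1

    errors = {}
    for key, (sum_y, sum_r, n) in cells.items():
        mean_y = sum_y / n
        mean_r = sum_r / n
        abs_err = abs(mean_y - mean_r)
        # Percent error is undefined when no positives are observed in the cell.
        pct_err = abs_err / mean_y if mean_y > 0 else float("inf")
        errors[key] = (abs_err, pct_err)
    return errors


if __name__ == "__main__":
    # Toy data: both groups are over-predicted by 0.05, but base rates differ.
    y = [1] * 5 + [0] * 5 + [1] * 1 + [0] * 9
    r = [0.55] * 10 + [0.15] * 10
    g = ["A"] * 10 + ["B"] * 10
    for cell, (abs_err, pct_err) in sorted(calibration_errors(y, r, g).items()):
        print(cell, round(abs_err, 3), round(pct_err, 3))
```

In the toy data both groups have the same absolute error, yet the group with the lower base rate has a percent error five times larger, which is the situation the abstract describes for multicalibrated models.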

6.

A flexible symbolic regression method for constructing interpretable clinical prediction models.

La Cava, William G; Lee, Paul C; Ajmal, Imran; Ding, Xiruo; Solanki, Priyanka; Cohen, Jordana B; Moore, Jason H; Herman, Daniel S.

NPJ Digit Med ; 6(1): 107, 2023 Jun 05.

Article in English | MEDLINE | ID: mdl-37277550

ABSTRACT

Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require many ML models be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression method, coined the feature engineering automation tool (FEAT), to train concise and accurate models from high-dimensional electronic health record (EHR) data. We first present an in-depth application of FEAT to classify hypertension, hypertension with unexplained hypokalemia, and apparent treatment-resistant hypertension (aTRH) using EHR data for 1200 subjects receiving longitudinal care in a large healthcare system. FEAT models trained to predict phenotypes adjudicated by chart review had equivalent or higher discriminative performance (p < 0.001) and were at least three times smaller (p < 1 × 10⁻⁶) than other potentially interpretable models. For aTRH, FEAT generated a six-feature, highly discriminative (positive predictive value = 0.70, sensitivity = 0.62), and clinically intuitive model. To assess the generalizability of the approach, we tested FEAT on 25 benchmark clinical phenotyping tasks using the MIMIC-III critical care database. Under comparable dimensionality constraints, FEAT's models exhibited higher area under the receiver-operating curve scores than penalized linear models across tasks (p < 6 × 10⁻⁶). In summary, FEAT can train EHR prediction models that are both intuitively interpretable and accurate, which should facilitate safe and effective scaling of ML-triggered CDS to the panoply of potential clinical use cases and healthcare practices.

7.

Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record?

Tan, Amelia L M; Getzen, Emily J; Hutch, Meghan R; Strasser, Zachary H; Gutiérrez-Sacristán, Alba; Le, Trang T; Dagliati, Arianna; Morris, Michele; Hanauer, David A; Moal, Bertrand; Bonzel, Clara-Lea; Yuan, William; Chiudinelli, Lorenzo; Das, Priam; Zhang, Harrison G; Aronow, Bruce J; Avillach, Paul; Brat, Gabriel A; Cai, Tianxi; Hong, Chuan; La Cava, William G; Hooi Will Loh, He; Luo, Yuan; Murphy, Shawn N; Yuan Hgiam, Kee; Omenn, Gilbert S; Patel, Lav P; Jebathilagam Samayamuthu, Malarkodi; Shriver, Emily R; Shakeri Hossein Abad, Zahra; Tan, Byorn W L; Visweswaran, Shyam; Wang, Xuan; Weber, Griffin M; Xia, Zongqi; Verdy, Bertrand; Long, Qi; Mowery, Danielle L; Holmes, John H.

J Biomed Inform ; 139: 104306, 2023 03.

Article in English | MEDLINE | ID: mdl-36738870

ABSTRACT

BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns or worries for possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS: With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and help researchers identify which sites are better poised to study particular questions.
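The idea of correlating labs through their missingness indicators can be sketched as follows. This is only an illustration of the general technique, not the study's code: the lab names (`crp`, `ddimer`, `wbc`), the record layout, and the use of a plain Pearson correlation on 0/1 indicators are assumptions.

```python
from math import sqrt


def missingness_indicators(records, labs):
    """Map each lab to a 0/1 list: 1 where the lab is absent or None in a record."""
    return {lab: [1 if rec.get(lab) is None else 0 for rec in records] for lab in labs}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical patient-day records; a lab that was not ordered is simply absent.
    records = [
        {"crp": 5.0, "ddimer": 0.4, "wbc": 7.0},
        {"wbc": 6.0},
        {"crp": 12.0, "ddimer": 1.1, "wbc": None},
        {"wbc": 9.0},
    ]
    ind = missingness_indicators(records, ["crp", "ddimer", "wbc"])
    print(pearson(ind["crp"], ind["ddimer"]))
    print(pearson(ind["crp"], ind["wbc"]))
```

Labs whose indicators are strongly positively correlated tend to be ordered (or skipped) together, which is the kind of relationship the abstract links to clinical behavior.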

Subject(s)

COVID-19; Electronic Health Records; Humans; Data Collection; Records; Cluster Analysis
