Search | VHL Regional Portal

1.

Excess burden of respiratory and abdominal conditions following COVID-19 infections during the ancestral and Delta variant periods in the United States: An EHR-based cohort study from the RECOVER program.

Varma, Jay K; Zang, Chengxi; Carton, Thomas W; Block, Jason P; Khullar, Dhruv J; Zhang, Yongkang; Weiner, Mark G; Rothman, Russell L; Schenck, Edward J; Xu, Zhenxing; Lyman, Kristin; Bian, Jiang; Xu, Jie; Shenkman, Elizabeth A; Maughan, Christine; Castro-Baucom, Leah; O'Brien, Lisa; Wang, Fei; Kaushal, Rainu.

PLoS One ; 19(6): e0282451, 2024.

Article in English | MEDLINE | ID: mdl-38843159

ABSTRACT

IMPORTANCE: The frequency and characteristics of post-acute sequelae of SARS-CoV-2 infection (PASC) may vary by SARS-CoV-2 variant. OBJECTIVE: To characterize PASC-related conditions among individuals likely infected by the ancestral strain in 2020 and individuals likely infected by the Delta variant in 2021. DESIGN: Retrospective cohort study of electronic medical record data for approximately 27 million patients from March 1, 2020, to November 30, 2021. SETTING: Healthcare facilities in New York and Florida. PARTICIPANTS: Patients who were at least 20 years old and had diagnosis codes that included at least one SARS-CoV-2 viral test during the study period. EXPOSURE: Laboratory-confirmed COVID-19 infection, classified by the most common variant prevalent in those regions at the time. MAIN OUTCOME(S) AND MEASURE(S): Relative risk (estimated by adjusted hazard ratio [aHR]) and absolute risk difference (estimated by adjusted excess burden) of new conditions, defined as new documentation of symptoms or diagnoses, in persons 31-180 days after a positive COVID-19 test compared to persons without a COVID-19 test or diagnosis during the 31-180 days after the last negative test. RESULTS: We analyzed data from 560,752 patients. The median age was 57 years; 60.3% were female, 20.0% non-Hispanic Black, and 19.6% Hispanic. During the study period, 57,616 patients had a positive SARS-CoV-2 test; 503,136 did not. For infections during the ancestral strain period, pulmonary fibrosis, edema (excess fluid), and inflammation had the largest aHR, comparing those with a positive test to those without a COVID-19 test or diagnosis (aHR 2.32 [95% CI 2.09, 2.57]), and dyspnea (shortness of breath) carried the largest excess burden (47.6 more cases per 1,000 persons).
For infections during the Delta period, pulmonary embolism had the largest aHR comparing those with a positive test to a negative test (aHR 2.18 [95% CI 1.57, 3.01]), and abdominal pain carried the largest excess burden (85.3 more cases per 1,000 persons). CONCLUSIONS AND RELEVANCE: We documented a substantial relative risk of pulmonary embolism and a large absolute risk difference of abdomen-related symptoms after SARS-CoV-2 infection during the Delta variant period. As new SARS-CoV-2 variants emerge, researchers and clinicians should monitor patients for changing symptoms and conditions that develop after infection.

Subject(s)

COVID-19 , Electronic Health Records , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/diagnosis , Female , Male , Middle Aged , SARS-CoV-2/isolation & purification , Retrospective Studies , Adult , Aged , United States/epidemiology , Post-Acute COVID-19 Syndrome , Florida/epidemiology , Cohort Studies

2.

Association between acquiring SARS-CoV-2 during pregnancy and post-acute sequelae of SARS-CoV-2 infection: RECOVER electronic health record cohort analysis.

Bruno, Ann M; Zang, Chengxi; Xu, Zhengxing; Wang, Fei; Weiner, Mark G; Guthe, Nick; Fitzgerald, Megan; Kaushal, Rainu; Carton, Thomas W; Metz, Torri D.

EClinicalMedicine ; 73: 102654, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38828129

ABSTRACT

Background: Little is known about post-acute sequelae of SARS-CoV-2 infection (PASC) after acquiring SARS-CoV-2 infection during pregnancy. We aimed to evaluate the association between acquiring SARS-CoV-2 during pregnancy compared with acquiring SARS-CoV-2 outside of pregnancy and the development of PASC. Methods: This retrospective cohort study from the Researching COVID to Enhance Recovery (RECOVER) Initiative Patient-Centred Clinical Research Network (PCORnet) used electronic health record (EHR) data from 19 U.S. health systems. Females aged 18-49 years with lab-confirmed SARS-CoV-2 infection from March 2020 through June 2022 were included. Validated algorithms were used to identify pregnancies with a delivery at >20 weeks' gestation. The primary outcome was PASC, as previously defined by computable phenotype in the adult non-pregnant PCORnet EHR dataset, identified 30-180 days post-SARS-CoV-2 infection. Secondary outcomes were the 24 component diagnoses contributing to the PASC phenotype definition. Univariable comparisons were made for baseline characteristics between individuals with SARS-CoV-2 infection acquired during pregnancy compared with outside of pregnancy. Using inverse probability of treatment weighting to adjust for baseline differences, the association between SARS-CoV-2 infection acquired during pregnancy and the selected outcomes was modelled. The incident risk is reported as the adjusted hazard ratio (aHR) with 95% confidence intervals. Findings: In total, 83,915 females with SARS-CoV-2 infection acquired outside of pregnancy and 5397 females with SARS-CoV-2 infection acquired during pregnancy were included in analysis. Non-pregnant females with SARS-CoV-2 infection were more likely to be older and have comorbid health conditions. SARS-CoV-2 infection acquired in pregnancy as compared with acquired outside of pregnancy was associated with a lower incidence of PASC (25.5% vs 33.9%; aHR 0.85, 95% CI 0.80-0.91). 
SARS-CoV-2 infection acquired in pregnant females was associated with increased risk for some PASC component diagnoses including abnormal heartbeat (aHR 1.67, 95% CI 1.43-1.94), abdominal pain (aHR 1.34, 95% CI 1.16-1.55), and thromboembolism (aHR 1.88, 95% CI 1.17-3.04), but decreased risk for other diagnoses including malaise (aHR 0.35, 95% CI 0.27-0.47), pharyngitis (aHR 0.36, 95% CI 0.26-0.48) and cognitive problems (aHR 0.39, 95% CI 0.27-0.56). Interpretation: SARS-CoV-2 infection acquired during pregnancy was associated with lower risk of development of PASC at 30-180 days after incident SARS-CoV-2 infection in this nationally representative sample. These findings may be used to counsel pregnant and pregnancy-capable individuals, and to direct future prospective study. Funding: National Institutes of Health (NIH) Other Transaction Agreement (OTA) OT2HL16184.

3.

Long COVID incidence in adults and children between 2020 and 2023: a real-world data study from the RECOVER Initiative.

Mandel, Hannah; Yoo, Yun; Allen, Andrea; Abedian, Sajjad; Verzani, Zoe; Karlson, Elizabeth; Kleinman, Lawrence; Mudumbi, Praveen; Oliveira, Carlos; Muszynski, Jennifer; Gross, Rachel; Carton, Thomas; Kim, C; Taylor, Emily; Park, Heekyong; Divers, Jasmin; Kelly, J; Arnold, Jonathan; Geary, Carol; Zang, Chengxi; Tantisira, Kelan; Rhee, Kyung; Koropsak, Michael; Mohandas, Sindhu; Vasey, Andrew; Weiner, Mark; Mosa, Abu; Haendel, Melissa; Chute, Christopher; Murphy, Shawn; O'Brien, Lisa; Szmuszkovicz, Jacqueline; Güthe, Nicholas; Santana, Jorge; De, Aliva; Bogie, Amanda; Halabi, Katia; Mohanraj, Lathika; Kinser, Patricia; Packard, Samuel; Tuttle, Katherine; Thorpe, Lorna; Moffitt, Richard.

Res Sq ; 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38746290

ABSTRACT

Estimates of post-acute sequelae of SARS-CoV-2 infection (PASC) incidence, also known as Long COVID, have varied across studies and changed over time. We estimated PASC incidence among adult and pediatric populations in three nationwide research networks of electronic health records (EHR) participating in the RECOVER Initiative using different classification algorithms (computable phenotypes). Overall, 7% of children and 8.5%-26.4% of adults developed PASC, depending on computable phenotype used. Excess incidence among SARS-CoV-2 patients was 4% in children and ranged from 4-7% among adults, representing a lower-bound incidence estimation based on two control groups - contemporary COVID-19 negative and historical patients (2019). Temporal patterns were consistent across networks, with peaks associated with introduction of new viral variants. Our findings indicate that preventing and mitigating Long COVID remains a public health priority. Examining temporal patterns and risk factors of PASC incidence informs our understanding of etiology and can improve prevention and management.

4.

Emerging opportunities of using large language models for translation between drug molecules and indications.

Oniani, David; Hilsman, Jordan; Zang, Chengxi; Wang, Junmei; Cai, Lianjin; Zawala, Jan; Wang, Yanshan.

Sci Rep ; 14(1): 10738, 2024 05 10.

Article in English | MEDLINE | ID: mdl-38730226

ABSTRACT

A drug molecule is a substance that changes an organism's mental or physical state. Every approved drug has an indication, which refers to the therapeutic use of that drug for treating a particular medical condition. While the Large Language Model (LLM), a generative Artificial Intelligence (AI) technique, has recently demonstrated effectiveness in translating between molecules and their textual descriptions, there remains a gap in research regarding their application in facilitating the translation between drug molecules and indications (which describes the disease, condition or symptoms for which the drug is used), or vice versa. Addressing this challenge could greatly benefit the drug discovery process. The capability of generating a drug from a given indication would allow for the discovery of drugs targeting specific diseases or targets and ultimately provide patients with better treatments. In this paper, we first propose a new task, the translation between drug molecules and corresponding indications, and then test existing LLMs on this new task. Specifically, we consider nine variations of the T5 LLM and evaluate them on two public datasets obtained from ChEMBL and DrugBank. Our experiments show the early results of using LLMs for this task and provide a perspective on the state-of-the-art. We also emphasize the current limitations and discuss future work that has the potential to improve the performance on this task. The creation of molecules from indications, or vice versa, will allow for more efficient targeting of diseases and significantly reduce the cost of drug discovery, with the potential to revolutionize the field of drug discovery in the era of generative AI.

Subject(s)

Artificial Intelligence , Drug Discovery , Humans , Drug Discovery/methods , Pharmaceutical Preparations/chemistry

5.

Corticosteroids for infectious critical illness: A multicenter target trial emulation stratified by predicted organ dysfunction trajectory.

Rajendran, Suraj; Xu, Zhenxing; Pan, Weishen; Zang, Chengxi; Siempos, Ilias; Torres, Lisa; Xu, Jie; Bian, Jiang; Schenck, Edward J; Wang, Fei.

medRxiv ; 2024 Mar 08.

Article in English | MEDLINE | ID: mdl-38496630

ABSTRACT

Corticosteroids decrease the duration of organ dysfunction in a range of infectious critical illnesses, but their risk and benefit are not fully defined using this construct. This retrospective multicenter study aimed to evaluate the association between usage of corticosteroids and mortality of patients with infectious critical illness by emulating a target trial framework. The study employed a novel stratification method with predictive machine learning (ML) subphenotyping based on organ dysfunction trajectory. Our analysis revealed that corticosteroids' effectiveness varied depending on the stratification method. The ML-based approach identified four distinct subphenotypes, two of which had a large enough sample size in our patient cohorts for further evaluation: "Rapidly Improving" (RI) and "Rapidly Worsening" (RW), which showed divergent responses to corticosteroid treatment. Specifically, the RW group either benefited from or was not harmed by corticosteroids, whereas the RI group appeared to be harmed. In the development cohort, which comprised a combination of patients from the eICU and MIMIC-IV datasets, the hazard ratio estimate for the primary outcome, 28-day mortality, in the RW group was 1.05 (95% CI: 0.96 - 1.04), whereas for the RI group it was 1.40 (95% CI: 1.28 - 1.54). For the validation cohort, which comprised patients from the Critical carE Database for Advanced Research, estimates for 28-day mortality for the RW and RI groups were 1.24 (95% CI: 1.05 - 1.46) and 1.34 (95% CI: 1.14 - 1.59), respectively. For secondary outcomes, the RW group had a shorter time to ICU discharge and time to cessation of mechanical ventilation with corticosteroid treatment, whereas the RI group again demonstrated harm. The findings support matching treatment strategies to empirically observed pathobiology and offer a more nuanced understanding of corticosteroid utility.
Our results have implications for the design and interpretation of both observational studies and randomized controlled trials (RCTs), suggesting the need for stratification methods that account for the differential response to standard of care.

6.

High-throughput target trial emulation for Alzheimer's disease drug repurposing with real-world data.

Zang, Chengxi; Zhang, Hao; Xu, Jie; Zhang, Hansi; Fouladvand, Sajjad; Havaldar, Shreyas; Cheng, Feixiong; Chen, Kun; Chen, Yong; Glicksberg, Benjamin S; Chen, Jin; Bian, Jiang; Wang, Fei.

Nat Commun ; 14(1): 8180, 2023 Dec 11.

Article in English | MEDLINE | ID: mdl-38081829

ABSTRACT

Target trial emulation is the process of mimicking target randomized trials using real-world data, where effective confounding control for unbiased treatment effect estimation remains a main challenge. Although various approaches have been proposed for this challenge, a systematic evaluation is still lacking. Here we emulated trials for thousands of medications from two large-scale real-world data warehouses, covering over 10 years of clinical records for over 170 million patients, aiming to identify new indications of approved drugs for Alzheimer's disease. We assessed different propensity score models under the inverse probability of treatment weighting framework and suggested a model selection strategy for improved baseline covariate balancing. We also found that the deep learning-based propensity score model did not necessarily outperform logistic regression-based methods in covariate balancing. Finally, we highlighted five top-ranked drugs (pantoprazole, gabapentin, atorvastatin, fluticasone, and omeprazole) originally intended for other indications with potential benefits for Alzheimer's patients.

Subject(s)

Alzheimer Disease , Humans , Alzheimer Disease/drug therapy , Drug Repositioning , Propensity Score , Atorvastatin/therapeutic use
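
Entry 6 above judges propensity score models by how well inverse probability of treatment weighting (IPTW) balances baseline covariates. The following is a minimal sketch of that kind of check, not the authors' pipeline: it assumes propensity scores have already been estimated, and the function names and the eight-patient toy cohort are hypothetical. A common rule of thumb treats a standardized mean difference (SMD) below 0.1 as adequately balanced.

```python
import math


def iptw_weights(treated, propensity):
    """Unstabilized IPTW weights: 1/e for treated, 1/(1 - e) for controls."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p)
            for t, p in zip(treated, propensity)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_var(values, weights):
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(covariate, treated, weights):
    """Absolute SMD of one covariate between the weighted treatment arms."""
    arm1 = [(x, w) for x, t, w in zip(covariate, treated, weights) if t == 1]
    arm0 = [(x, w) for x, t, w in zip(covariate, treated, weights) if t == 0]
    x1, w1 = zip(*arm1)
    x0, w0 = zip(*arm0)
    pooled_sd = math.sqrt((weighted_var(x1, w1) + weighted_var(x0, w0)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0
    return abs(weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


# Hypothetical toy cohort: one binary covariate that drives treatment.
covariate = [1, 1, 1, 1, 0, 0, 0, 0]
treated = [1, 1, 1, 0, 1, 0, 0, 0]
# Assumed known propensity: 0.75 when covariate == 1, otherwise 0.25.
propensity = [0.75 if x == 1 else 0.25 for x in covariate]

smd_before = standardized_mean_difference(covariate, treated, [1.0] * len(treated))
smd_after = standardized_mean_difference(
    covariate, treated, iptw_weights(treated, propensity))
print(f"SMD before weighting: {smd_before:.3f}, after: {smd_after:.3f}")
```

In this toy cohort the weights come from the true treatment probabilities, so the weighted SMD collapses to essentially zero; with an estimated propensity model the residual SMD is what a model selection strategy would compare.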

7.

Comparing the effects of four common drug classes on the progression of mild cognitive impairment to dementia using electronic health records.

Xu, Jie; Wang, Fei; Zang, Chengxi; Zhang, Hao; Niotis, Kellyann; Liberman, Ava L; Stonnington, Cynthia M; Ishii, Makoto; Adekkanattu, Prakash; Luo, Yuan; Mao, Chengsheng; Rasmussen, Luke V; Xu, Zhenxing; Brandt, Pascal; Pacheco, Jennifer A; Peng, Yifan; Jiang, Guoqian; Isaacson, Richard; Pathak, Jyotishman.

Sci Rep ; 13(1): 8102, 2023 05 19.

Article in English | MEDLINE | ID: mdl-37208478

ABSTRACT

The objective of this study was to investigate the potential association between the use of four frequently prescribed drug classes, namely antihypertensive drugs, statins, selective serotonin reuptake inhibitors, and proton-pump inhibitors, and the likelihood of disease progression from mild cognitive impairment (MCI) to dementia using electronic health records (EHRs). We conducted a retrospective cohort study using observational EHRs from a cohort of approximately 2 million patients seen at a large, multi-specialty urban academic medical center in New York City, USA between 2008 and 2020 to automatically emulate randomized controlled trials. For each drug class, two exposure groups were identified based on the prescription orders documented in the EHRs following their MCI diagnosis. During follow-up, we measured drug efficacy based on the incidence of dementia and estimated the average treatment effect (ATE) of various drugs. To ensure the robustness of our findings, we confirmed the ATE estimates via bootstrapping and presented associated 95% confidence intervals (CIs). Our analysis identified 14,269 MCI patients, among whom 2501 (17.5%) progressed to dementia. Using average treatment effect estimation and bootstrapping confirmation, we observed that drugs including rosuvastatin (ATE = -0.0140 [-0.0191, -0.0088], p value < 0.001), citalopram (ATE = -0.1128 [-0.125, -0.1005], p value < 0.001), escitalopram (ATE = -0.0560 [-0.0615, -0.0506], p value < 0.001), and omeprazole (ATE = -0.0201 [-0.0299, -0.0103], p value < 0.001) were significantly associated with slower progression from MCI to dementia. These findings suggest that commonly prescribed drugs may alter the progression from MCI to dementia and warrant further investigation.

Subject(s)

Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnosis , Retrospective Studies , Electronic Health Records , Disease Progression , Cognitive Dysfunction/drug therapy , Cognitive Dysfunction/epidemiology , Cognitive Dysfunction/diagnosis , Randomized Controlled Trials as Topic
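
Entry 7 above reports ATE estimates with bootstrap 95% confidence intervals. As a rough illustration only, the sketch below computes an unadjusted risk difference and a percentile bootstrap interval. It assumes a binary outcome and no confounding adjustment (the study itself emulates trials with adjustment), and the function names and the 200-patient toy data are hypothetical.

```python
import random


def risk_difference(outcomes, treated):
    """Event proportion in the treated arm minus that in the control arm."""
    y1 = [y for y, t in zip(outcomes, treated) if t == 1]
    y0 = [y for y, t in zip(outcomes, treated) if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def bootstrap_ci(outcomes, treated, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        t = [treated[i] for i in idx]
        if 0 < sum(t) < n:  # skip resamples that lose an entire arm
            y = [outcomes[i] for i in idx]
            estimates.append(risk_difference(y, t))
    estimates.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical toy data: 10/100 events on drug, 20/100 events off drug.
treated = [1] * 100 + [0] * 100
outcomes = [1] * 10 + [0] * 90 + [1] * 20 + [0] * 80

ate = risk_difference(outcomes, treated)
ci_low, ci_high = bootstrap_ci(outcomes, treated)
print(f"ATE = {ate:.4f} [{ci_low:.4f}, {ci_high:.4f}]")
```

A negative ATE here means fewer events in the treated arm, which is the sign convention the abstract's estimates appear to follow.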

8.

Data-driven analysis to understand long COVID using electronic health records from the RECOVER initiative.

Zang, Chengxi; Zhang, Yongkang; Xu, Jie; Bian, Jiang; Morozyuk, Dmitry; Schenck, Edward J; Khullar, Dhruv; Nordvig, Anna S; Shenkman, Elizabeth A; Rothman, Russell L; Block, Jason P; Lyman, Kristin; Weiner, Mark G; Carton, Thomas W; Wang, Fei; Kaushal, Rainu.

Nat Commun ; 14(1): 1948, 2023 04 07.

Article in English | MEDLINE | ID: mdl-37029117

ABSTRACT

Recent studies have investigated post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) using real-world patient data such as electronic health records (EHR). Prior studies have typically been conducted on patient cohorts with specific patient populations, which makes their generalizability unclear. This study aims to characterize PASC using the EHR data warehouses from two large Patient-Centered Clinical Research Networks (PCORnet), INSIGHT and OneFlorida+, which include 11 million patients in the New York City (NYC) area and 16.8 million patients in Florida, respectively. With a high-throughput screening pipeline based on propensity score and inverse probability of treatment weighting, we identified a broad list of diagnoses and medications which exhibited significantly higher incidence risk for patients 30-180 days after laboratory-confirmed SARS-CoV-2 infection compared to non-infected patients. We identified more PASC diagnoses in NYC than in Florida under our screening criteria, and conditions including dementia, hair loss, pressure ulcers, pulmonary fibrosis, dyspnea, pulmonary embolism, chest pain, abnormal heartbeat, malaise, and fatigue were replicated across both cohorts. Our analyses highlight potentially heterogeneous risks of PASC in different populations.

Subject(s)

COVID-19 , Post-Acute COVID-19 Syndrome , Humans , COVID-19/epidemiology , Electronic Health Records , SARS-CoV-2 , Propensity Score

9.

Excess burden of respiratory and abdominal conditions following COVID-19 infections during the ancestral and Delta variant periods in the United States: An EHR-based cohort study from the RECOVER Program.

Varma, Jay K; Zang, Chengxi; Carton, Thomas W; Block, Jason P; Khullar, Dhruv J; Zhang, Yongkang; Weiner, Mark G; Rothman, Russell L; Schenck, Edward J; Xu, Zhenxing; Lyman, Kristin; Bian, Jiang; Xu, Jie; Shenkman, Elizabeth A; Maughan, Christine; Castro-Baucom, Leah; O'Brien, Lisa; Wang, Fei; Kaushal, Rainu.

medRxiv ; 2023 Feb 23.

Article in English | MEDLINE | ID: mdl-36865304

ABSTRACT

Importance: The frequency and characteristics of post-acute sequelae of SARS-CoV-2 infection (PASC) may vary by SARS-CoV-2 variant. Objective: To characterize PASC-related conditions among individuals likely infected by the ancestral strain in 2020 and individuals likely infected by the Delta variant in 2021. Design: Retrospective cohort study of electronic medical record data for approximately 27 million patients from March 1, 2020, to November 30, 2021. Setting: Healthcare facilities in New York and Florida. Participants: Patients who were at least 20 years old and had diagnosis codes that included at least one SARS-CoV-2 viral test during the study period. Exposure: Laboratory-confirmed COVID-19 infection, classified by the most common variant prevalent in those regions at the time. Main Outcomes and Measures: Relative risk (estimated by adjusted hazard ratio [aHR]) and absolute risk difference (estimated by adjusted excess burden) of new conditions, defined as new documentation of symptoms or diagnoses, in persons 31-180 days after a positive COVID-19 test compared to persons with only negative tests during the 31-180 days after the last negative test. Results: We analyzed data from 560,752 patients. The median age was 57 years; 60.3% were female, 20.0% non-Hispanic Black, and 19.6% Hispanic. During the study period, 57,616 patients had a positive SARS-CoV-2 test; 503,136 did not. For infections during the ancestral strain period, pulmonary fibrosis, edema (excess fluid), and inflammation had the largest aHR, comparing those with a positive test to those with a negative test (aHR 2.32 [95% CI 2.09, 2.57]), and dyspnea (shortness of breath) carried the largest excess burden (47.6 more cases per 1,000 persons).
For infections during the Delta period, pulmonary embolism had the largest aHR comparing those with a positive test to a negative test (aHR 2.18 [95% CI 1.57, 3.01]), and abdominal pain carried the largest excess burden (85.3 more cases per 1,000 persons). Conclusions and Relevance: We documented a substantial relative risk of pulmonary embolism and large absolute risk difference of abdomen-related symptoms after SARS-CoV-2 infection during the Delta variant period. As new SARS-CoV-2 variants emerge, researchers and clinicians should monitor patients for changing symptoms and conditions that develop after infection.

10.

Risk Factors and Predictive Modeling for Post-Acute Sequelae of SARS-CoV-2 Infection: Findings from EHR Cohorts of the RECOVER Initiative.

Zang, Chengxi; Hou, Yu; Schenck, Edward; Xu, Zhenxing; Zhang, Yongkang; Xu, Jie; Bian, Jiang; Morozyuk, Dmitry; Khullar, Dhruv; Nordvig, Anna; Shenkman, Elizabeth; Rothman, Russel; Block, Jason; Lyman, Kristin; Zhang, Yiye; Varma, Jay; Weiner, Mark; Carton, Thomas; Wang, Fei; Kaushal, Rainu.

Res Sq ; 2023 Mar 08.

Article in English | MEDLINE | ID: mdl-36945608

ABSTRACT

Background: Patients infected with SARS-CoV-2 may develop newly incident conditions in their post-acute infection period. These conditions, denoted as the post-acute sequelae of SARS-CoV-2 infection (PASC), are highly heterogeneous and involve a diverse set of organ systems. Limited studies have investigated the predictability of these conditions and their associated risk factors. Method: In this retrospective cohort study, we investigated two large-scale PCORnet clinical research networks, INSIGHT and OneFlorida+, including 11 million patients in the New York City area and 16.8 million patients from Florida, to develop machine learning prediction models for those who are at risk for newly incident PASC and to identify factors associated with newly incident PASC conditions. Adult patients aged ≥20 years with SARS-CoV-2 infection and without recorded infection between March 1st, 2020, and November 30th, 2021, were used for identifying factors associated with incident PASC after removing background associations. The predictive models were developed on infected adults. Results: We found that several incident PASC conditions, e.g., malnutrition, COPD, dementia, and acute kidney failure, were associated with severe acute SARS-CoV-2 infection, defined by hospitalization and ICU stay. Older age and extremes of weight were also associated with these incident conditions. These conditions were better predicted (C-index >0.8). Moderately predictable conditions included diabetes and thromboembolic disease (C-index 0.7-0.8). These were associated with a wider variety of baseline conditions. Less predictable conditions included fatigue, anxiety, sleep disorders, and depression (C-index around 0.6).
Conclusions: This observational study suggests that a set of likely risk factors for different PASC conditions were identifiable from EHRs, predictability of different PASC conditions was heterogeneous, and using machine learning-based predictive models might help in identifying patients who were at risk of developing incident PASC.
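
Entry 10 above grades predictability by C-index (above 0.8, 0.7-0.8, around 0.6). For readers unfamiliar with the metric, here is a minimal sketch of Harrell's concordance index for right-censored follow-up data. The abstract does not say which C-index variant or software was used, so this definition, the function name, and the five-patient toy data are assumptions.

```python
def concordance_index(times, events, scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time. It is
    concordant when i, who failed earlier, also has the higher score.
    Tied scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: days to diagnosis (or censoring) and model scores.
times = [30, 60, 90, 120, 180]
events = [1, 1, 0, 1, 0]           # 1 = condition observed, 0 = censored
scores = [0.9, 0.4, 0.7, 0.2, 0.1]  # higher = predicted higher risk

c_index = concordance_index(times, events, scores)
print(f"C-index = {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why conditions near 0.6 are described as less predictable.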

11.

Identifying environmental risk factors for post-acute sequelae of SARS-CoV-2 infection: An EHR-based cohort study from the RECOVER program.

Zhang, Yongkang; Hu, Hui; Fokaidis, Vasilios; V, Colby Lewis; Xu, Jie; Zang, Chengxi; Xu, Zhenxing; Wang, Fei; Koropsak, Michael; Bian, Jiang; Hall, Jaclyn; Rothman, Russell L; Shenkman, Elizabeth A; Wei, Wei-Qi; Weiner, Mark G; Carton, Thomas W; Kaushal, Rainu.

Environ Adv ; 11: 100352, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36785842

ABSTRACT

Post-acute sequelae of SARS-CoV-2 infection (PASC) affects a wide range of organ systems among a large proportion of patients with SARS-CoV-2 infection. Although studies have identified a broad set of patient-level risk factors for PASC, little is known about the association between the "exposome" (the totality of environmental exposures) and the risk of PASC. Using electronic health data of patients with COVID-19 from two large clinical research networks in New York City and Florida, we identified environmental risk factors for 23 PASC symptoms and conditions from nearly 200 exposome factors. The three domains of exposome include natural environment, built environment, and social environment. We conducted a two-phase environment-wide association study. In Phase 1, we ran a mixed effects logistic regression with 5-digit ZIP Code tabulation area (ZCTA5) random intercepts for each PASC outcome and each exposome factor, adjusting for a comprehensive set of patient-level confounders. In Phase 2, we ran a mixed effects logistic regression for each PASC outcome including all significant (false discovery rate-adjusted p-value < 0.05) exposome characteristics identified from Phase 1 and adjusting for confounders. We identified air toxicants (e.g., methyl methacrylate), particulate matter (PM2.5) compositions (e.g., ammonium), neighborhood deprivation, and built environment (e.g., food access) that were associated with increased risk of PASC conditions related to nervous, blood, circulatory, endocrine, and other organ systems. Specific environmental risk factors for each PASC condition and symptom were different across the New York City area and Florida. Future research is warranted to extend the analyses to other regions and examine more granular exposome characteristics to inform public health efforts to help patients recover from SARS-CoV-2 infection.
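
Entry 11 above carries exposome factors from the first phase to the second only if their multiplicity-adjusted p-value is below 0.05. The abstract does not name the adjustment procedure, so the sketch below assumes the Benjamini-Hochberg step-up method purely for illustration; the function name and the five p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five exposome factors and one outcome.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw)
selected = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, selected)
```

With these toy values, the third and fourth factors have raw p-values under 0.05 but adjusted values just above it, so only the first two would move forward.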

12.

Racial/Ethnic Disparities in Post-acute Sequelae of SARS-CoV-2 Infection in New York: an EHR-Based Cohort Study from the RECOVER Program.

Khullar, Dhruv; Zhang, Yongkang; Zang, Chengxi; Xu, Zhenxing; Wang, Fei; Weiner, Mark G; Carton, Thomas W; Rothman, Russell L; Block, Jason P; Kaushal, Rainu.

J Gen Intern Med ; 38(5): 1127-1136, 2023 04.

Article in English | MEDLINE | ID: mdl-36795327

ABSTRACT

BACKGROUND: Compared to white individuals, Black and Hispanic individuals have higher rates of COVID-19 hospitalization and death. Less is known about racial/ethnic differences in post-acute sequelae of SARS-CoV-2 infection (PASC). OBJECTIVE: Examine racial/ethnic differences in potential PASC symptoms and conditions among hospitalized and non-hospitalized COVID-19 patients. DESIGN: Retrospective cohort study using data from electronic health records. PARTICIPANTS: 62,339 patients with COVID-19 and 247,881 patients without COVID-19 in New York City between March 2020 and October 2021. MAIN MEASURES: New symptoms and conditions 31-180 days after COVID-19 diagnosis. KEY RESULTS: The final study population included 29,331 white patients (47.1%), 12,638 Black patients (20.3%), and 20,370 Hispanic patients (32.7%) diagnosed with COVID-19. After adjusting for confounders, significant racial/ethnic differences in incident symptoms and conditions existed among both hospitalized and non-hospitalized patients. For example, 31-180 days after a positive SARS-CoV-2 test, hospitalized Black patients had higher odds of being diagnosed with diabetes (adjusted odds ratio [OR]: 1.96, 95% confidence interval [CI]: 1.50-2.56, q<0.001) and headaches (OR: 1.52, 95% CI: 1.11-2.08, q=0.02), compared to hospitalized white patients. Hospitalized Hispanic patients had higher odds of headaches (OR: 1.62, 95% CI: 1.21-2.17, q=0.003) and dyspnea (OR: 1.22, 95% CI: 1.05-1.42, q=0.02), compared to hospitalized white patients. Among non-hospitalized patients, Black patients had higher odds of being diagnosed with pulmonary embolism (OR: 1.68, 95% CI: 1.20-2.36, q=0.009) and diabetes (OR: 2.13, 95% CI: 1.75-2.58, q<0.001), but lower odds of encephalopathy (OR: 0.58, 95% CI: 0.45-0.75, q<0.001), compared to white patients. 
Hispanic patients had higher odds of being diagnosed with headaches (OR: 1.41, 95% CI: 1.24-1.60, q<0.001) and chest pain (OR: 1.50, 95% CI: 1.35-1.67, q < 0.001), but lower odds of encephalopathy (OR: 0.64, 95% CI: 0.51-0.80, q<0.001). CONCLUSIONS: Compared to white patients, patients from racial/ethnic minority groups had significantly different odds of developing potential PASC symptoms and conditions. Future research should examine the reasons for these differences.

Subject(s)

Brain Diseases , COVID-19 , Humans , COVID-19/complications , Ethnicity , Cohort Studies , Post-Acute COVID-19 Syndrome , SARS-CoV-2 , Retrospective Studies , COVID-19 Testing , Minority Groups , New York City/epidemiology , Headache/diagnosis , Headache/epidemiology

13.

Data-driven identification of post-acute SARS-CoV-2 infection subphenotypes.

Zhang, Hao; Zang, Chengxi; Xu, Zhenxing; Zhang, Yongkang; Xu, Jie; Bian, Jiang; Morozyuk, Dmitry; Khullar, Dhruv; Zhang, Yiye; Nordvig, Anna S; Schenck, Edward J; Shenkman, Elizabeth A; Rothman, Russell L; Block, Jason P; Lyman, Kristin; Weiner, Mark G; Carton, Thomas W; Wang, Fei; Kaushal, Rainu.

Nat Med ; 29(1): 226-235, 2023 01.

Article in English | MEDLINE | ID: mdl-36456834

ABSTRACT

The post-acute sequelae of SARS-CoV-2 infection (PASC) refers to a broad spectrum of symptoms and signs that are persistent, exacerbated or newly incident in the period after acute SARS-CoV-2 infection. Most studies have examined these conditions individually without providing evidence on co-occurring conditions. In this study, we leveraged the electronic health record data of two large cohorts, INSIGHT and OneFlorida+, from the national Patient-Centered Clinical Research Network. We created a development cohort from INSIGHT and a validation cohort from OneFlorida+ including 20,881 and 13,724 patients, respectively, who were SARS-CoV-2 infected, and we investigated their newly incident diagnoses 30-180 days after a documented SARS-CoV-2 infection. Through machine learning analysis of over 137 symptoms and conditions, we identified four reproducible PASC subphenotypes, dominated by cardiac and renal (including 33.75% and 25.43% of the patients in the development and validation cohorts); respiratory, sleep and anxiety (32.75% and 38.48%); musculoskeletal and nervous system (23.37% and 23.35%); and digestive and respiratory system (10.14% and 12.74%) sequelae. These subphenotypes were associated with distinct patient demographics, underlying conditions before SARS-CoV-2 infection and acute infection phase severity. Our study provides insights into the heterogeneity of PASC and may inform stratified decision-making in the management of PASC conditions.

Subject(s)

COVID-19 , Humans , COVID-19/epidemiology , SARS-CoV-2 , Post-Acute COVID-19 Syndrome , Anxiety , Anxiety Disorders , Disease Progression

14.

Building the Model.

Yang, He S; Rhoads, Daniel D; Sepulveda, Jorge; Zang, Chengxi; Chadburn, Amy; Wang, Fei.

Arch Pathol Lab Med ; 147(7): 826-836, 2023 Jul 01.

Article in English | MEDLINE | ID: mdl-36223208

ABSTRACT

CONTEXT: Machine learning (ML) allows for the analysis of massive quantities of high-dimensional clinical laboratory data, thereby revealing complex patterns and trends. Thus, ML can potentially improve the efficiency of clinical data interpretation and the practice of laboratory medicine. However, the risks of generating biased or unrepresentative models, which can lead to misleading clinical conclusions or overestimation of the model performance, should be recognized. OBJECTIVES: To discuss the major components for creating ML models, including data collection, data preprocessing, model development, and model evaluation. We also highlight many of the challenges and pitfalls in developing ML models, which could result in misleading clinical impressions or inaccurate model performance, and provide suggestions and guidance on how to circumvent these challenges. DATA SOURCES: The references for this review were identified through searches of the PubMed database, US Food and Drug Administration white papers and guidelines, conference abstracts, and online preprints. CONCLUSIONS: With the growing interest in developing and implementing ML models in clinical practice, laboratorians and clinicians need to be educated in order to collect sufficiently large and high-quality data, properly report the data set characteristics, and combine data from multiple institutions with proper normalization. They will also need to assess the reasons for missing values, determine the inclusion or exclusion of outliers, and evaluate the completeness of a data set. In addition, they require the necessary knowledge to select a suitable ML model for a specific clinical question and accurately evaluate the performance of the ML model, based on objective criteria. Domain-specific knowledge is critical in the entire workflow of developing ML models.

Subject(s)

Computer Simulation , Machine Learning , Humans

15.

Identifying Contextual and Spatial Risk Factors for Post-Acute Sequelae of SARS-CoV-2 Infection: An EHR-based Cohort Study from the RECOVER Program.

Zhang, Yongkang; Hu, Hui; Fokaidis, Vasilios; Lewis, Colby; Xu, Jie; Zang, Chengxi; Xu, Zhenxing; Wang, Fei; Koropsak, Michael; Bian, Jiang; Hall, Jaclyn; Rothman, Russell L; Shenkman, Elizabeth A; Wei, Wei-Qi; Weiner, Mark G; Carton, Thomas W; Kaushal, Rainu.

medRxiv ; 2022 Oct 13.

Article in English | MEDLINE | ID: mdl-36263067

ABSTRACT

Post-acute sequelae of SARS-CoV-2 infection (PASC) affects a wide range of organ systems among a large proportion of patients with SARS-CoV-2 infection. Although studies have identified a broad set of patient-level risk factors for PASC, little is known about the contextual and spatial risk factors for PASC. Using electronic health data of patients with COVID-19 from two large clinical research networks in New York City and Florida, we identified contextual and spatial risk factors from nearly 200 environmental characteristics for 23 PASC symptoms and conditions of eight organ systems. We conducted a two-phase environment-wide association study. In Phase 1, we ran a mixed effects logistic regression with 5-digit ZIP Code tabulation area (ZCTA5) random intercepts for each PASC outcome and each contextual and spatial factor, adjusting for a comprehensive set of patient-level confounders. In Phase 2, we ran a mixed effects logistic regression for each PASC outcome including all significant (false positive discovery adjusted p-value < 0.05) contextual and spatial characteristics identified from Phase 1 and adjusting for confounders. We identified air toxicants (e.g., methyl methacrylate), criteria air pollutants (e.g., sulfur dioxide), particulate matter (PM2.5) compositions (e.g., ammonium), neighborhood deprivation, and built environment (e.g., food access) that were associated with increased risk of PASC conditions related to nervous, respiratory, blood, circulatory, endocrine, and other organ systems. Specific contextual and spatial risk factors for each PASC condition and symptom differed between the New York City area and Florida. Future research is warranted to extend the analyses to other regions and examine more granular contextual and spatial characteristics to inform public health efforts to help patients recover from SARS-CoV-2 infection.
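The screening step between the two phases in the abstract above (keep only factors whose adjusted p-value is below 0.05) can be illustrated with a short sketch. The abstract does not name the adjustment procedure, so the Benjamini-Hochberg method used here is an assumption, and the function names and factor labels are hypothetical placeholders rather than anything taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the abstract only says "adjusted p-value"; BH is one
    common choice and may not be the procedure the authors used.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def phase_two_candidates(factor_p_values, alpha=0.05):
    """Select factors whose adjusted p-value is below alpha.

    factor_p_values maps a (hypothetical) factor name to its Phase 1 p-value.
    """
    names = list(factor_p_values)
    adjusted = benjamini_hochberg([factor_p_values[n] for n in names])
    return [name for name, q in zip(names, adjusted) if q < alpha]
```

With four made-up p-values of 0.01, 0.04, 0.03, and 0.5, only the first factor survives the adjustment, even though three of the raw values are below 0.05.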

16.

Development of a screening algorithm for borderline personality disorder using electronic health records.

Zang, Chengxi; Goodman, Marianne; Zhu, Zheng; Yang, Lulu; Yin, Ziwei; Tamas, Zsuzsanna; Sharma, Vikas Mohan; Wang, Fei; Shao, Nan.

Sci Rep ; 12(1): 11976, 2022 07 13.

Article in English | MEDLINE | ID: mdl-35831356

ABSTRACT

Borderline personality disorder (BoPD or BPD) is highly prevalent and characterized by reactive moods, impulsivity, behavioral dysregulation, and distorted self-image. Yet the BoPD diagnosis is underutilized and patients with BoPD are frequently misdiagnosed, resulting in lost opportunities for appropriate treatment. Automated screening of electronic health records (EHRs) is one potential strategy to help identify possible BoPD patients who are otherwise undiagnosed. We present the development and analytical validation of a BoPD screening algorithm based on routinely collected and structured EHRs. This algorithm integrates rule-based selection and machine learning (ML) in a two-step framework by first selecting potential patients based on the presence of comorbidities and characteristics commonly associated with BoPD, and then predicting whether the patients most likely have BoPD. Leveraging a large-scale US-based de-identified EHR database and our clinical expert's rating of two random samples of patient EHRs, results show that our screening algorithm has a high consistency with our clinical expert's ratings, with area under the receiver operating characteristic (AUROC) 0.837 [95% confidence interval (CI) 0.778-0.892], positive predictive value 0.717 (95% CI 0.583-0.836), accuracy 0.820 (95% CI 0.768-0.873), sensitivity 0.541 (95% CI 0.417-0.667) and specificity 0.922 (95% CI 0.880-0.960). Our aim is to provide an additional resource to facilitate clinical decision making and promote the development of digital medicine.

Subject(s)

Borderline Personality Disorder , Electronic Health Records , Algorithms , Borderline Personality Disorder/diagnosis , Borderline Personality Disorder/epidemiology , Databases, Factual , Humans , Machine Learning
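For readers less familiar with the screening metrics quoted in the abstract of record 16 (sensitivity, specificity, positive predictive value, accuracy), the sketch below shows how each is derived from a 2x2 confusion matrix. The function name and the counts in the usage note are illustrative assumptions, not data from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives relative to a reference rating.
    """
    return {
        # Share of truly positive patients the algorithm flags.
        "sensitivity": tp / (tp + fn),
        # Share of truly negative patients the algorithm clears.
        "specificity": tn / (tn + fp),
        # Share of flagged patients who are truly positive.
        "ppv": tp / (tp + fp),
        # Share of all patients classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

With made-up counts of 40 true positives, 10 false positives, 90 true negatives, and 60 false negatives, `screening_metrics(40, 10, 90, 60)` gives sensitivity 0.4, specificity 0.9, PPV 0.8, and accuracy 0.65. A screener can therefore be highly specific while still missing many true cases, which is the general pattern the reported figures describe.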

17.

Machine Learning for Identifying Data-Driven Subphenotypes of Incident Post-Acute SARS-CoV-2 Infection Conditions with Large Scale Electronic Health Records: Findings from the RECOVER Initiative.

Zhang, Hao; Zang, Chengxi; Xu, Zhenxing; Zhang, Yongkang; Xu, Jie; Bian, Jiang; Morozyuk, Dmitry; Khullar, Dhruv; Zhang, Yiye; Nordvig, Anna Starikovsky; Schenck, Edward J; Shenkman, Elizabeth Ann; Rothman, Russell L; Block, Jason P; Lyman, Kristin; Weiner, Mark; Carton, Thomas W; Wang, Fei; Kaushal, Rainu.

medRxiv ; 2022 Jun 08.

Article in English | MEDLINE | ID: mdl-35665007

ABSTRACT

The post-acute sequelae of SARS-CoV-2 infection (PASC) refers to a broad spectrum of symptoms and signs that are persistent, exacerbated, or newly incident in the post-acute SARS-CoV-2 infection period of COVID-19 patients. Most studies have examined these conditions individually without providing conclusive evidence on co-occurring conditions. To address this gap, this study leveraged electronic health records (EHRs) from two large clinical research networks from the national Patient-Centered Clinical Research Network (PCORnet) and investigated patients' newly incident diagnoses that appeared within 30 to 180 days after a documented SARS-CoV-2 infection. Through machine learning, we identified four reproducible subphenotypes of PASC dominated by blood and circulatory system, respiratory, musculoskeletal and nervous system, and digestive system problems, respectively. We also demonstrated that these subphenotypes were associated with distinct patterns of patient demographics, underlying conditions present prior to SARS-CoV-2 infection, acute infection phase severity, and use of new medications in the post-acute period. Our study provides novel insights into the heterogeneity of PASC and can inform stratified decision-making in the treatment of COVID-19 patients with PASC conditions.

18.

Contrastive learning improves critical event prediction in COVID-19 patients.

Wanyan, Tingyi; Honarvar, Hossein; Jaladanki, Suraj K; Zang, Chengxi; Naik, Nidhi; Somani, Sulaiman; De Freitas, Jessica K; Paranjpe, Ishan; Vaid, Akhil; Zhang, Jing; Miotto, Riccardo; Wang, Zhangyang; Nadkarni, Girish N; Zitnik, Marinka; Azad, Ariful; Wang, Fei; Ding, Ying; Glicksberg, Benjamin S.

Patterns (N Y) ; 2(12): 100389, 2021 Dec 10.

Article in English | MEDLINE | ID: mdl-34723227

ABSTRACT

Deep learning (DL) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing DL models for the coronavirus disease 2019 (COVID-19) pandemic, where data are highly class imbalanced. Conventional approaches in DL use cross-entropy loss (CEL), which often suffers from poor margin classification. We show that contrastive loss (CL) improves the performance of CEL, especially in imbalanced electronic health records (EHR) data for COVID-19 analyses. We use a diverse EHR dataset to predict three outcomes: mortality, intubation, and intensive care unit (ICU) transfer in hospitalized COVID-19 patients over multiple time windows. To compare the performance of CEL and CL, models are tested on the full dataset and a restricted dataset. CL models consistently outperform CEL models, with differences ranging from 0.04 to 0.15 for area under the precision and recall curve (AUPRC) and 0.05 to 0.1 for area under the receiver-operating characteristic curve (AUROC).

19.

Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients.

Wanyan, Tingyi; Honarvar, Hossein; Jaladanki, Suraj K; Zang, Chengxi; Naik, Nidhi; Somani, Sulaiman; Freitas, Jessica K De; Paranjpe, Ishan; Vaid, Akhil; Miotto, Riccardo; Nadkarni, Girish N; Zitnik, Marinka; Wang, Fei; Ding, Ying; Glicksberg, Benjamin S.

ArXiv ; 2021 Jan 11.

Article in English | MEDLINE | ID: mdl-33442560

ABSTRACT

Machine learning (ML) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing ML models for the coronavirus disease 2019 (COVID-19) pandemic, where data are highly imbalanced, particularly within electronic health records (EHR) research. Conventional approaches in ML use cross-entropy loss (CEL), which often suffers from poor margin classification. For the first time, we show that contrastive loss (CL) improves the performance of CEL, especially for imbalanced EHR data and the related COVID-19 analyses. This study has been approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai. We use EHR data from five hospitals within the Mount Sinai Health System (MSHS) to predict mortality, intubation, and intensive care unit (ICU) transfer in hospitalized COVID-19 patients over 24- and 48-hour time windows. We train two sequential architectures (RNN and RETAIN) using two loss functions (CEL and CL). Models are tested on a full-sample data set, which contains all available data, and a restricted data set that emulates higher class imbalance. CL models consistently outperform CEL models with the restricted data set on these tasks, with differences ranging from 0.04 to 0.15 for AUPRC and 0.05 to 0.1 for AUROC. For the restricted sample, only the CL model maintains proper clustering and is able to identify important features, such as pulse oximetry. CL outperforms CEL in instances of severe class imbalance, on three EHR outcomes with respect to three performance metrics: predictive power, clustering, and feature importance. We believe that the developed CL framework can be expanded and used for EHR ML work in general.

20.

SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records.

Zang, Chengxi; Wang, Fei.

Proc IEEE Int Conf Data Min ; 2021: 857-866, 2021 Dec.

Article in English | MEDLINE | ID: mdl-36438203

ABSTRACT

Contrastive learning has demonstrated promising performance in image and text domains either in a self-supervised or a supervised manner. In this work, we extend the supervised contrastive learning framework to clinical risk prediction problems based on longitudinal electronic health records (EHR). We propose a general supervised contrastive loss $\mathcal{L}_{\text{Contrastive Cross Entropy}} + \lambda \mathcal{L}_{\text{Supervised Contrastive Regularizer}}$ for learning both binary classification (e.g. in-hospital mortality prediction) and multi-label classification (e.g. phenotyping) in a unified framework. Our supervised contrastive loss practices the key idea of contrastive learning, namely, pulling similar samples closer and pushing dissimilar ones apart from each other, simultaneously by its two components: $\mathcal{L}_{\text{Contrastive Cross Entropy}}$ tries to contrast samples with learned anchors which represent positive and negative clusters, and $\mathcal{L}_{\text{Supervised Contrastive Regularizer}}$ tries to contrast samples with each other according to their supervised labels. We propose two versions of the above supervised contrastive loss and our experiments on real-world EHR data demonstrate that our proposed loss functions show benefits in improving the performance of strong baselines and even state-of-the-art models on benchmarking tasks for clinical risk predictions. Our loss functions work well with extremely imbalanced data which are common for clinical risk prediction problems. Our loss functions can be easily used to replace (binary or multi-label) cross-entropy loss adopted in existing clinical predictive models. The PyTorch code is released at https://github.com/calvin-zcx/SCEHR.
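The two-term structure of the loss described in the abstract above (a classification term plus a weighted regularizer that pulls same-label samples together) can be sketched in plain Python. This is not the authors' implementation, which is available at the linked repository. It assumes a generic supervised contrastive formulation over L2-normalized embeddings, takes the first term as an already-computed number instead of modeling learned anchors, and uses hypothetical function names, temperature, and weight.

```python
import math


def supervised_contrastive_regularizer(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive term (an assumed formulation).

    For each anchor with at least one same-label partner, average the
    negative log-softmax similarity to its same-label samples.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no same-label partner to pull toward
        # Scaled cosine similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(classification_term, embeddings, labels, lam=0.5):
    """classification_term + lam * regularizer; lam is a hypothetical weight."""
    return classification_term + lam * supervised_contrastive_regularizer(
        embeddings, labels
    )
```

In this sketch, embeddings that cluster by label yield a regularizer close to zero, while embeddings whose nearest neighbors carry the opposite label yield a large value, so the second term penalizes representations that mix classes.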
