Search | VHL Regional Portal

1.

How data science and AI-based technologies impact genomics

Jing LIN; Kee-Yuan NGIAM.

Singapore medical journal ; : 59-66, 2023.

Article in English | WPRIM (Western Pacific) | ID: wpr-969666

ABSTRACT

Advancements in high-throughput sequencing have yielded vast amounts of genomic data, which are studied using genome-wide association study (GWAS)/phenome-wide association study (PheWAS) methods to identify associations between the genotype and phenotype. The associated findings have contributed to pharmacogenomics and improved clinical decision support at the point of care in many healthcare systems. However, the accumulation of genomic data from sequencing and clinical data from electronic health records (EHRs) poses significant challenges for data scientists. Following the rise of artificial intelligence (AI) technology such as machine learning and deep learning, an increasing number of GWAS/PheWAS studies have successfully leveraged this technology to overcome the aforementioned challenges. In this review, we focus on the application of data science and AI technology in three areas, including risk prediction and identification of causal single-nucleotide polymorphisms, EHR-based phenotyping and CRISPR guide RNA design. Additionally, we highlight a few emerging AI technologies, such as transfer learning and multi-view learning, which will or have started to benefit genomic studies.

Subject(s)

Artificial Intelligence, Data Science, Genome-Wide Association Study, Genomics, Technology

2.

SurvMaximin: Robust Federated Approach to Transporting Survival Risk Prediction Models

Xuan Wang; Harrison G Zhang; Xin Xiong; Chuan Hong; Griffin M Weber; Gabriel A Brat; Clara-Lea Bonzel; Yuan Luo; Rui Duan; Nathan P Palmer; Meghan R Hutch; Alba Gutiérrez-Sacristán; Riccardo Bellazzi; Luca Chiovato; Kelly Cho; Arianna Dagliati; Hossein Estiri; Noelia García-Barrio; Romain Griffier; David A Hanauer; Yuk-Lam Ho; John H Holmes; Mark S Keller; Jeffrey G Klann; Sehi L'Yi; Sara Lozano-Zahonero; Sarah E Maidlow; Adeline Makoudjou; Alberto Malovini; Bertrand Moal; Jason H Moore; Michele Morris; Danielle L Mowery; Shawn N Murphy; Antoine Neuraz; Kee Yuan Ngiam; Gilbert S Omenn; Lav P Patel; Miguel Pedrera-Jiménez; Andrea Prunotto; Malarkodi Jebathilagam Samayamuthu; Fernando J Sanz Vidorreta; Emily R Schriver; Petra Schubert; Pablo Serrano-Balazote; Andrew M South; Amelia LM Tan; Byorn W.L. Tan; Valentina Tibollo; Patric Tippmann; Shyam Visweswaran; Zongqi Xia; William Yuan; Daniela Zöller; Isaac S Kohane; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Paul Avillach; Zijian Guo; Tianxi Cai.

Preprint in English | medRxiv | ID: ppmedrxiv-22270410

ABSTRACT

Objective: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. Materials and Methods: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or can be a single center, corresponding to transfer learning. Results: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. Conclusions: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.
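
The aggregation step described in this abstract (combining per-center coefficient vectors using the target population's covariance matrix) can be illustrated with a generic maximin-style aggregation. The sketch below is an assumption about the general shape of such an estimator, not the authors' implementation: the function name `maximin_aggregate`, the optional `ridge` term, and the Frank-Wolfe solver are all hypothetical choices, and the per-center penalized Cox fits are taken as given inputs. It looks for convex weights over the center-level coefficient vectors that minimize the quadratic form defined by the target covariance.

```python
def maximin_aggregate(betas, sigma, ridge=0.0, max_iter=5000, tol=1e-12):
    """Hypothetical sketch of maximin-style aggregation.

    betas: list of per-center coefficient vectors (each of length p),
           assumed to come from penalized Cox fits done elsewhere.
    sigma: p x p covariance matrix of the target population (list of lists).
    Returns (weights, aggregated_coefficients).
    """
    n_sites, p = len(betas), len(sigma)
    # sigma_b[k] = Sigma @ beta_k
    sigma_b = [[sum(sigma[i][j] * b[j] for j in range(p)) for i in range(p)]
               for b in betas]
    # gram[l][k] = beta_l^T Sigma beta_k, plus an optional ridge on the diagonal
    gram = [[sum(betas[l][i] * sigma_b[k][i] for i in range(p))
             + (ridge if l == k else 0.0)
             for k in range(n_sites)] for l in range(n_sites)]

    w = [1.0 / n_sites] * n_sites  # start from uniform weights
    for _ in range(max_iter):
        grad = [2.0 * sum(gram[l][k] * w[k] for k in range(n_sites))
                for l in range(n_sites)]
        s = min(range(n_sites), key=grad.__getitem__)  # best simplex vertex
        d = [(1.0 if l == s else 0.0) - w[l] for l in range(n_sites)]
        slope = sum(g * di for g, di in zip(grad, d))
        if slope >= -tol:  # Frank-Wolfe gap is (numerically) zero: converged
            break
        curv = sum(d[l] * gram[l][k] * d[k]
                   for l in range(n_sites) for k in range(n_sites))
        # exact line search for a quadratic objective, clipped to [0, 1]
        step = 1.0 if curv <= 0.0 else min(1.0, -slope / (2.0 * curv))
        w = [wl + step * dl for wl, dl in zip(w, d)]

    beta = [sum(w[l] * betas[l][i] for l in range(n_sites)) for i in range(p)]
    return w, beta


if __name__ == "__main__":
    # Two hypothetical centers whose coefficient vectors point the same way
    # but differ in magnitude; identity covariance for the target population.
    weights, beta = maximin_aggregate([[1.0, 0.0], [2.0, 0.0]],
                                      [[1.0, 0.0], [0.0, 1.0]])
    print(weights, beta)
```

Only coefficient vectors and one covariance matrix enter the computation, which is consistent with the one-time exchange of summary information that the abstract describes.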

3.

Multinational Prevalence of Neurological Phenotypes in Patients Hospitalized with COVID-19

Trang T Le; Alba Gutiérrez-Sacristán; Jiyeon Son; Chuan Hong; Andrew M South; Brett K Beaulieu-Jones; Ne Hooi Will Loh; Yuan Luo; Michele Morris; Kee Yuan Ngiam; Lav P Patel; Malarkodi J Samayamuthu; Emily Schriver; Amelia LM Tan; Jason Moore; Tianxi Cai; Gilbert S. Omenn; Paul Avillach; Isaac S Kohane; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Shyam Visweswaran; Danielle L Mowery; Zongqi Xia.

Preprint in English | medRxiv | ID: ppmedrxiv-21249817

ABSTRACT

OBJECTIVE: Neurological complications can worsen outcomes in COVID-19. We defined the prevalence of a wide range of neurological conditions among patients hospitalized with COVID-19 in geographically diverse multinational populations. METHODS: Using electronic health record (EHR) data from 348 participating hospitals across 6 countries and 3 continents between January and September 2020, we performed a cross-sectional study of hospitalized adult and pediatric patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test, both with and without severe COVID-19. We assessed the frequency of each disease category and 3-character International Classification of Disease (ICD) code of neurological diseases by countries, sites, time before and after admission for COVID-19, and COVID-19 severity. RESULTS: Among the 35,177 hospitalized patients with SARS-CoV-2 infection, there was increased prevalence of disorders of consciousness (5.8%, 95% confidence interval [CI]: 3.7%-7.8%, pFDR<.001) and unspecified disorders of the brain (8.1%, 95%CI: 5.7%-10.5%, pFDR<.001), compared to pre-admission prevalence. During hospitalization, patients who experienced severe COVID-19 status had 22% (95%CI: 19%-25%) increase in the relative risk (RR) of disorders of consciousness, 24% (95%CI: 13%-35%) increase in other cerebrovascular diseases, 34% (95%CI: 20%-50%) increase in nontraumatic intracranial hemorrhage, 37% (95%CI: 17%-60%) increase in encephalitis and/or myelitis, and 72% (95%CI: 67%-77%) increase in myopathy compared to those who never experienced severe disease. INTERPRETATION: Using an international network and common EHR data elements, we highlight an increase in the prevalence of central and peripheral neurological phenotypes in patients hospitalized with SARS-CoV-2 infection, particularly among those with severe disease.
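
The results above are expressed as percent increases in relative risk (for example, an RR of 1.22 corresponds to a 22% increase). As a reading aid, the following minimal sketch shows how a relative risk and an approximate 95% confidence interval can be computed from a single 2x2 table using the standard log-scale approximation. The counts are hypothetical, the function name `relative_risk` is an assumption, and the abstract does not state how estimates were pooled across sites, so this is not the study's analysis.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with an approximate CI on the log scale.

    "Exposed" here stands for the ever-severe group and "unexposed" for the
    never-severe group; all counts passed in are hypothetical.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions
    se_log = math.sqrt(1 / events_exposed - 1 / n_exposed
                       + 1 / events_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 severe patients vs 20 of 100 non-severe
    rr, lower, upper = relative_risk(30, 100, 20, 100)
    print(f"RR = {rr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```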

4.

International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study

Griffin M Weber; Chuan Hong; Nathan P Palmer; Paul Avillach; Shawn N Murphy; Alba Gutiérrez-Sacristán; Zongqi Xia; Arnaud Serret-Larmande; Antoine Neuraz; Gilbert S. Omenn; Shyam Visweswaran; Jeffrey G Klann; Andrew M South; Ne Hooi Will Loh; Mario Cannataro; Brett K Beaulieu-Jones; Riccardo Bellazzi; Giuseppe Agapito; Mario Alessiani; Bruce J Aronow; Douglas S Bell; Antonio Bellasi; Vincent Benoit; Michele Beraghi; Martin Boeker; John Booth; Silvano Bosari; Florence T Bourgeois; Nicholas W Brown; Mauro Bucalo; Luca Chiovato; Lorenzo Chiudinelli; Arianna Dagliati; Batsal Devkota; Scott L DuVall; Robert W Follett; Thomas Ganslandt; Noelia García Barrio; Tobias Gradinger; Romain Griffier; David A Hanauer; John H Holmes; Petar Horki; Kenneth M Huling; Richard W Issitt; Vianney Jouhet; Mark S Keller; Detlef Kraska; Molei Liu; Yuan Luo; Kristine E Lynch; Alberto Malovini; Kenneth D Mandl; Chengsheng Mao; Anupama Maram; Michael E Matheny; Thomas Maulhardt; Maria Mazzitelli; Marianna Milano; Jason H Moore; Jeffrey S Morris; Michele Morris; Danielle L Mowery; Thomas P Naughton; Kee Yuan Ngiam; James B Norman; Lav P Patel; Miguel Pedrera Jimenez; Rachel B Ramoni; Emily R Schriver; Luigia Scudeller; Neil J Sebire; Pablo Serrano Balazote; Anastasia Spiridou; Amelia LM Tan; Byorn W.L. Tan; Valentina Tibollo; Carlo Torti; Enrico M Trecarichi; Michele Vitacca; Alberto Zambelli; Chiara Zucco; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Isaac S Kohane; Tianxi Cai; Gabriel A Brat.

Preprint in English | medRxiv | ID: ppmedrxiv-20247684

ABSTRACT

Objectives: To perform an international comparison of the trajectory of laboratory values among hospitalized patients with COVID-19 who develop severe disease and identify optimal timing of laboratory value collection to predict severity across hospitals and regions. Design: Retrospective cohort study. Setting: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE), an international multi-site data-sharing collaborative of 342 hospitals in the US and in Europe. Participants: Patients hospitalized with COVID-19, admitted before or after PCR-confirmed result for SARS-CoV-2. Primary and secondary outcome measures: Patients were categorized as "ever-severe" or "never-severe" using the validated 4CE severity criteria. Eighteen laboratory tests associated with poor COVID-19-related outcomes were evaluated for predictive accuracy by area under the curve (AUC), compared between the severity categories. Subgroup analysis was performed to validate a subset of laboratory values as predictive of severity against a published algorithm. A subset of laboratory values (CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin) was compared between North American and European sites for severity prediction. Results: Of 36,447 patients with COVID-19, 19,953 (43.7%) were categorized as ever-severe. Most patients (78.7%) were 50 years of age or older and male (60.5%). Longitudinal trajectories of CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin showed association with disease severity. Significant differences of laboratory values at admission were found between the two groups. With the exception of D-dimer, predictive discrimination of laboratory values did not improve after admission. Sub-group analysis using age, D-dimer, CRP, and lymphocyte count as predictive of severity at admission showed similar discrimination to a published algorithm (AUC=0.88 and 0.91, respectively). Both models deteriorated in predictive accuracy as the disease progressed. On average, no difference in severity prediction was found between North American and European sites. Conclusions: Laboratory test values at admission can be used to predict severity in patients with COVID-19. Prediction models show consistency across international sites highlighting the potential generalizability of these models.
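
Predictive accuracy in this abstract is reported as area under the curve (AUC). For readers unfamiliar with the measure, the sketch below computes AUC directly from its pairwise definition: the proportion of (ever-severe, never-severe) patient pairs in which the ever-severe patient has the higher score, with ties counted as one half. The function name `auc_from_scores` and the laboratory values are hypothetical; the study's own modelling pipeline is not described in enough detail here to reproduce.

```python
def auc_from_scores(scores_severe, scores_not_severe):
    """AUC as the probability that a randomly chosen ever-severe patient
    scores higher than a randomly chosen never-severe patient."""
    if not scores_severe or not scores_not_severe:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for s in scores_severe:
        for t in scores_not_severe:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(scores_severe) * len(scores_not_severe))


if __name__ == "__main__":
    # Hypothetical admission values of a single laboratory marker
    ever_severe = [0.9, 0.8, 0.4]
    never_severe = [0.5, 0.3, 0.2]
    print(round(auc_from_scores(ever_severe, never_severe), 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two severity groups.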

5.

Validation of a Derived International Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data

Jeffrey G. Klann; Griffin M Weber; Hossein Estiri; Bertrand Moal; Paul Avillach; Chuan Hong; Victor M Castro; Thomas Maulhardt; Amelia LM Tan; Alon Geva; Brett K Beaulieu-Jones; Alberto Malovini; Andrew M South; Shyam Visweswaran; Gilbert S Omenn; Kee Yuan Ngiam; Kenneth D Mandl; Martin Boeker; Karen L Olson; Danielle L Mowery; Michele Morris; Robert W Follett; David A Hanauer; Riccardo Bellazzi; Jason H Moore; Ne Hooi Will Loh; Douglas S Bell; Kavishwar Wagholikar; Luca Chiovato; Valentina Tibollo; Siegbert Rieg; Anthony LLJ Li; Vianney Jouhet; Emilly Schriver; Malarkodi J Samayamuthu; Zongqi Xia; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Isaac S Kohane; Gabriel A Brat; Shawn N Murphy.

Preprint in English | medRxiv | ID: ppmedrxiv-20201855

ABSTRACT

Introduction: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) includes hundreds of hospitals internationally using a federated computational approach to COVID-19 research using the EHR. Objective: We sought to develop and validate a standard definition of COVID-19 severity from readily accessible EHR data across the Consortium. Methods: We developed an EHR-based severity algorithm and validated it on patient hospitalization data from 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also used a machine learning approach to compare selected predictors of severity to the 4CE algorithm at one site. Results: The 4CE severity algorithm performed with pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of single code categories for acuity were unacceptably inaccurate - varying by up to 0.65 across sites. A multivariate machine learning approach identified codes resulting in mean AUC 0.956 (95% CI: 0.952, 0.959) compared to 0.903 (95% CI: 0.886, 0.921) using expert-derived codes. Billing codes were poor proxies of ICU admission, with 49% precision and recall compared against chart review at one partner institution. Discussion: We developed a proxy measure of severity that proved resilient to coding variability internationally by using a set of 6 code classes. In contrast, machine-learning approaches may tend to overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold standard outcomes, possibly due to pandemic conditions. Conclusion: We developed an EHR-based algorithm for COVID-19 severity and validated it at 12 international sites.
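
The validation in this abstract compares a severity flag against the outcome of ICU admission and/or death and reports sensitivity and specificity. The minimal sketch below shows how those two quantities are computed for one site from paired per-patient indicators. The function name `sensitivity_specificity` and the example data are hypothetical, and the pooling across the 12 sites is not shown.

```python
def sensitivity_specificity(flagged, outcome):
    """flagged[i]: True if the severity algorithm flags patient i as severe.
    outcome[i]: True if patient i had the reference outcome
                (here, ICU admission and/or death)."""
    pairs = list(zip(flagged, outcome))
    tp = sum(1 for f, o in pairs if f and o)          # true positives
    fn = sum(1 for f, o in pairs if not f and o)      # false negatives
    tn = sum(1 for f, o in pairs if not f and not o)  # true negatives
    fp = sum(1 for f, o in pairs if f and not o)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and without the outcome")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical data for eight patients at a single site
    flagged = [True, True, True, False, False, False, False, True]
    outcome = [True, True, False, True, False, False, False, False]
    sens, spec = sensitivity_specificity(flagged, outcome)
    print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```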

6.

International Electronic Health Record-Derived COVID-19 Clinical Course Profile: The 4CE Consortium

Gabriel A Brat; Griffin M Weber; Nils Gehlenborg; Paul Avillach; Nathan P Palmer; Luca Chiovato; James Cimino; Lemuel R Waitman; Gilbert S Omenn; Alberto Malovini; Jason H Moore; Brett K Beaulieu-Jones; Valentina Tibollo; Shawn N Murphy; Sehi L'Yi; Mark S Keller; Riccardo Bellazzi; David A Hanauer; Arnaud Serret-Larmande; Alba Gutierrez-Sacristan; John H Holmes; Douglas S Bell; Kenneth D Mandl; Robert W Follett; Jeffrey G Klann; Douglas A Murad; Luigia Scudeller; Mauro Bucalo; Katie Kirchoff; Jean Craig; Jihad Obeid; Vianney Jouhet; Romain Griffier; Sebastien Cossin; Bertrand Moal; Lav P Patel; Antonio Bellasi; Hans U Prokosch; Detlef Kraska; Piotr Sliz; Amelia LM Tan; Kee Yuan Ngiam; Alberto Zambelli; Danielle L Mowery; Emily Schiver; Batsal Devkota; Robert L Bradford; Mohamad Daniar; - APHP/Universities/INSERM COVID-19 research collaboration; Christel Daniel; Vincent Benoit; Romain Bey; Nicolas Paris; Anne Sophie Jannot; Patricia Serre; Nina Orlova; Julien Dubiel; Martin Hilka; Anne Sophie Jannot; Stephane Breant; Judith Leblanc; Nicolas Griffon; Anita Burgun; Melodie Bernaux; Arnaud Sandrin; Elisa Salamanca; Thomas Ganslandt; Tobias Gradinger; Julien Champ; Martin Boeker; Patricia Martel; Alexandre Gramfort; Olivier Grisel; Damien Leprovost; Thomas Moreau; Gael Varoquaux; Jill-Jenn Vie; Demian Wassermann; Arthur Mensch; Charlotte Caucheteux; Christian Haverkamp; Guillaume Lemaitre; Ian D Krantz; Sylvie Cormont; Andrew South; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Tianxi Cai; Isaac S Kohane.

Preprint in English | medRxiv | ID: ppmedrxiv-20059691

ABSTRACT

We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across 5 countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on comorbidities and temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.
