1.
Front Cell Neurosci ; 18: 1369242, 2024.
Article in English | MEDLINE | ID: mdl-38846640

ABSTRACT

Recently, large-scale scRNA-seq datasets have been generated to understand the complex signaling mechanisms within the microenvironment of Alzheimer's Disease (AD), which are critical for identifying novel therapeutic targets and precision medicine. However, the background signaling networks are highly complex and interactive. It remains challenging to infer the core intra- and inter-multi-cell signaling communication networks using scRNA-seq data. In this study, we introduced a novel graph transformer model, PathFinder, to infer multi-cell intra- and inter-cellular signaling pathways and communications among multi-cell types. Compared with existing models, the novel and unique design of PathFinder is based on the divide-and-conquer strategy. This model divides complex signaling networks into signaling paths, which are then scored and ranked using a novel graph transformer architecture to infer intra- and inter-cell signaling communications. We evaluated the performance of PathFinder using two scRNA-seq data cohorts. The first cohort is an APOE4 genotype-specific AD, and the second is a human cirrhosis cohort. The evaluation confirms the promising potential of using PathFinder as a general signaling network inference model.

2.
bioRxiv ; 2024 May 18.
Article in English | MEDLINE | ID: mdl-38798349

ABSTRACT

Multi-omic data, i.e., genomics, epigenomics, transcriptomics, and proteomics, characterize complex cellular signaling systems at multiple levels and from multiple views, and provide a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data. Graph neural network (GNN) AI models have been widely used to analyze graph-structured datasets and are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret multi-omics data by node and edge ranking analysis for signaling flow/cascade inference. However, it is non-trivial for graph-AI model developers to pre-analyze multi-omics data and convert them into graph-structured data for individual samples, which can be directly fed into graph-AI models. To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), a novel computational tool that generates multi-omics signaling graphs of individual samples by mapping the multi-omics data onto a biologically meaningful multi-level background signaling network. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. We evaluated mosGraphGen using multi-omics datasets of both cancer and Alzheimer's disease (AD) samples. The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/Multi-OmicGraphBuilder/mosGraphGen.

3.
J Neurodev Disord ; 16(1): 17, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632549

ABSTRACT

Monogenic disorders account for a large proportion of population-attributable risk for neurodevelopmental disabilities. However, the data necessary to infer a causal relationship between a given genetic variant and a particular neurodevelopmental disorder is often lacking. Recognizing this scientific roadblock, 13 Intellectual and Developmental Disabilities Research Centers (IDDRCs) formed a consortium to create the Brain Gene Registry (BGR), a repository pairing clinical genetic data with phenotypic data from participants with variants in putative brain genes. Phenotypic profiles are assembled from the electronic health record (EHR) and a battery of remotely administered standardized assessments collectively referred to as the Rapid Neurobehavioral Assessment Protocol (RNAP), which include cognitive, neurologic, and neuropsychiatric assessments, as well as assessments for attention deficit hyperactivity disorder (ADHD) and autism spectrum disorder (ASD). Co-enrollment of BGR participants in the Clinical Genome Resource's (ClinGen's) GenomeConnect enables display of variant information in ClinVar. The BGR currently contains data on 479 participants who are 55% male, 6% Asian, 6% Black or African American, 76% white, and 12% Hispanic/Latine. Over 200 genes are represented in the BGR, with 12 or more participants harboring variants in each of these genes: CACNA1A, DNMT3A, SLC6A1, SETD5, and MYT1L. More than 30% of variants are de novo and 43% are classified as variants of uncertain significance (VUSs). Mean standard scores on cognitive or developmental screens are below average for the BGR cohort. EHR data reveal developmental delay as the earliest and most common diagnosis in this sample, followed by speech and language disorders, ASD, and ADHD. BGR data has already been used to accelerate gene-disease validity curation of 36 genes evaluated by ClinGen's BGR Intellectual Disability (ID)-Autism (ASD) Gene Curation Expert Panel. 
In summary, the BGR is a resource for use by stakeholders interested in advancing translational research for brain genes and continues to recruit participants with clinically reported variants to establish a rich and well-characterized national resource to promote research on neurodevelopmental disorders.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Intellectual Disability , Neurodevelopmental Disorders , Humans , Male , Female , Autism Spectrum Disorder/genetics , Brain , Registries , Methyltransferases
4.
J Am Med Inform Assoc ; 31(5): 1144-1150, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38447593

ABSTRACT

OBJECTIVE: To evaluate the real-world performance of the SMART/HL7 Bulk Fast Healthcare Interoperability Resources (FHIR) Access Application Programming Interface (API), developed to enable push-button access to electronic health record data on large populations, and required under the 21st Century Cures Act Rule. MATERIALS AND METHODS: We used an open-source Bulk FHIR Testing Suite at 5 healthcare sites from April to September 2023, including 4 hospitals using electronic health records (EHRs) certified for interoperability, and 1 Health Information Exchange (HIE) using a custom, standards-compliant API build. We measured export speeds, data sizes, and completeness across 6 types of FHIR resources. RESULTS: Among the certified platforms, Oracle Cerner led in speed, managing 5-16 million resources at over 8000 resources/min. Three Epic sites exported a FHIR data subset, achieving 1-12 million resources at 1555-2500 resources/min. Notably, the HIE's custom API outperformed, generating over 141 million resources at 12 000 resources/min. DISCUSSION: The HIE's custom API showcased superior performance, endorsing the effectiveness of SMART/HL7 Bulk FHIR in enabling large-scale data exchange while underlining the need for optimization in existing EHR platforms. Agility and scalability are essential for diverse health, research, and public health use cases. CONCLUSION: To fully realize the interoperability goals of the 21st Century Cures Act, addressing the performance limitations of Bulk FHIR API is critical. It would be beneficial to include performance metrics in both certification and reporting processes.
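
To put the reported export rates in perspective, the following minimal sketch converts a resource count and a constant export rate into an approximate wall-clock duration. The function name `export_duration_days` is hypothetical, and a constant rate is an assumption; the abstract reports only ranges ("over 141 million" at 12 000 resources/min, "5-16 million" at "over 8000" resources/min), so the results are rough illustrations and not figures from the study.

```python
def export_duration_days(resources: int, rate_per_min: float) -> float:
    """Approximate export time in days, assuming a constant export rate."""
    if rate_per_min <= 0:
        raise ValueError("rate_per_min must be positive")
    minutes = resources / rate_per_min
    return minutes / (60 * 24)


# HIE custom API: 141 million resources at 12 000 resources/min.
hie_days = export_duration_days(141_000_000, 12_000)

# Oracle Cerner: upper end of the reported range (16 million) at the
# reported minimum rate (8000 resources/min), i.e. a rough upper bound.
cerner_days = export_duration_days(16_000_000, 8_000)

print(f"HIE: about {hie_days:.1f} days; Cerner: at most about {cerner_days:.1f} days")
```

Even at the fastest reported rate, an export of this size would take on the order of days under these assumptions, which is consistent with the abstract's call for performance metrics.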


Subject(s)
Health Information Exchange , Health Level Seven , Software , Electronic Health Records , Delivery of Health Care
5.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370642

ABSTRACT

Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app 'listener' that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods: We advance a model for scalable, federated, data sharing and learning. Cumulus software is designed to address key technology and policy desiderata including local utility, control, and administrative simplicity as well as privacy preservation during robust data sharing, and AI for processing unstructured text. Results: Cumulus relies on containerized, cloud-hosted software, installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across five healthcare systems each partnered with public health. Cumulus output is patient counts, which were aggregated into a table stratifying variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. Discussion and Conclusion: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.
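
The idea that only aggregate counts leave the institution can be illustrated with a short sketch. This is not the Cumulus implementation; the function `aggregate_counts`, the field names, and the toy records below are all hypothetical and chosen only to show how patient-level rows reduce to a stratified count table.

```python
from collections import Counter


def aggregate_counts(patients, strata):
    """Count patients per combination of the given stratifying variables."""
    return Counter(tuple(p[key] for key in strata) for p in patients)


# Toy patient-level records (hypothetical fields); these stay inside the institution.
patients = [
    {"age_group": "18-44", "sex": "F"},
    {"age_group": "18-44", "sex": "F"},
    {"age_group": "18-44", "sex": "M"},
    {"age_group": "65+", "sex": "F"},
]

# Only this stratified table of counts would be shared.
table = aggregate_counts(patients, ("age_group", "sex"))
for stratum, n in sorted(table.items()):
    print(stratum, n)
```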

6.
PLoS Comput Biol ; 20(1): e1011785, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38181047

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a powerful technology to investigate the transcriptional programs in stromal, immune, and disease cells, like tumor cells or neurons within the Alzheimer's Disease (AD) brain or tumor microenvironment (ME) or niche. Cell-cell communications within ME play important roles in disease progression and immunotherapy response and are novel and critical therapeutic targets. Though many tools of scRNA-seq analysis have been developed to investigate the heterogeneity and sub-populations of cells, few were designed for uncovering cell-cell communications of ME and predicting the potentially effective drugs to inhibit the communications. Moreover, the data analysis processes of discovering signaling communication networks and effective drugs using scRNA-seq data are complex and involve a set of critical analysis processes and external supportive data resources, which are difficult for researchers who have no strong computational background and training in scRNA-seq data analysis. To address these challenges, in this study, we developed a novel open-source computational tool, sc2MeNetDrug (https://fuhaililab.github.io/sc2MeNetDrug/). It was specifically designed using scRNA-seq data to identify cell types within disease MEs, uncover the dysfunctional signaling pathways within individual cell types and interactions among different cell types, and predict effective drugs that can potentially disrupt cell-cell signaling communications. sc2MeNetDrug provided a user-friendly graphical user interface to encapsulate the data analysis modules, which can facilitate the scRNA-seq data-based discovery of novel inter-cell signaling communications and novel therapeutic regimens.


Subject(s)
Single-Cell Analysis , Software , RNA-Seq , Sequence Analysis, RNA , Gene Expression Profiling , Signal Transduction/genetics
7.
bioRxiv ; 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38293243

ABSTRACT

Recently, large-scale scRNA-seq datasets have been generated to understand the complex and poorly understood signaling mechanisms within the microenvironment of Alzheimer's Disease (AD), which are critical for identifying novel therapeutic targets and precision medicine. Though a set of targets has been identified, it remains challenging to infer the core intra- and inter-multi-cell signaling communication networks using the scRNA-seq data, considering the complex and highly interactive background signaling network. Herein, we introduced a novel graph transformer model, PathFinder, to infer multi-cell intra- and inter-cellular signaling pathways and signaling communications among multi-cell types. Compared with existing models, the novel and unique design of PathFinder is based on the divide-and-conquer strategy, which divides the complex signaling networks into signaling paths, and then scores and ranks them using a novel graph transformer architecture to infer the intra- and inter-cell signaling communications. We evaluated PathFinder using scRNA-seq data of APOE4-genotype specific AD mouse models and identified novel APOE4 altered intra- and inter-cell interaction networks among neurons, astrocytes, and microglia. PathFinder is a general signaling network inference model and can be applied to other omics data-driven signaling network inference.

8.
Am J Transplant ; 24(3): 458-467, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37468109

ABSTRACT

Primary graft dysfunction (PGD) is the leading cause of morbidity and mortality in the first 30 days after lung transplantation. Risk factors for the development of PGD include donor and recipient characteristics, but how multiple variables interact to impact the development of PGD and how clinicians should consider these in making decisions about donor acceptance remain unclear. This was a single-center retrospective cohort study to develop and evaluate machine learning pipelines to predict the development of PGD grade 3 within the first 72 hours of transplantation using donor and recipient variables that are known at the time of donor offer acceptance. Among 576 bilateral lung recipients, 173 (30%) developed PGD grade 3. The cohort underwent a 75% to 25% train-test split, and lasso regression was used to identify 11 variables for model development. A K-nearest neighbors model showing the best calibration and performance with relatively small confidence intervals was selected as the final predictive model with an area under the receiver operating characteristics curve of 0.65. Machine learning models can predict the risk for development of PGD grade 3 based on data available at the time of donor offer acceptance. This may improve donor-recipient matching and donor utilization in the future.
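
The reported area under the receiver operating characteristic curve (AUROC) of 0.65 can be read as the probability that a randomly chosen recipient who developed PGD grade 3 receives a higher predicted risk than a randomly chosen recipient who did not. The sketch below computes AUROC directly from that pairwise definition. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = developed PGD grade 3, 0 = did not (made-up values).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_auc = auroc(labels, scores)
print(round(toy_auc, 3))
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of a few hundred recipients; a rank-based formula would be preferable for much larger datasets.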


Subject(s)
Lung Transplantation , Primary Graft Dysfunction , Humans , Retrospective Studies , Primary Graft Dysfunction/diagnosis , Primary Graft Dysfunction/etiology , Lung Transplantation/adverse effects , Risk Factors , Lung
9.
bioRxiv ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-37808763

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to the Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy models. GPT-3.5-turbo performed similarly to GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition. SpaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
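
Because the models are compared on precision, recall, and micro-F1, a brief sketch of micro-averaging may help: true positives, false positives, and false negatives are pooled across all phenotype categories before the metrics are computed. The function `micro_prf`, the category names, and the counts below are hypothetical and do not come from the study.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall, and F1 from per-class tp/fp/fn counts."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts for three phenotype categories.
counts = {
    "initial_stage": {"tp": 8, "fp": 2, "fn": 1},
    "initial_treatment": {"tp": 6, "fp": 1, "fn": 3},
    "recurrence": {"tp": 4, "fp": 1, "fn": 0},
}
precision, recall, f1 = micro_prf(counts)
print(f"precision={precision:.3f} recall={recall:.3f} micro-F1={f1:.3f}")
```

Micro-averaging weights every extracted item equally, so categories with many instances dominate the score; macro-averaging would instead weight each category equally.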

10.
bioRxiv ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-37662280

ABSTRACT

Background and Objectives: Previous approaches pursuing normative modelling for analyzing heterogeneity in Alzheimer's Disease (AD) have relied on a single neuroimaging modality. However, AD is a multi-faceted disorder, with each modality providing unique and complementary information about AD. In this study, we used a deep-learning based multimodal normative model to assess the heterogeneity in regional brain patterns for ATN (amyloid-tau-neurodegeneration) biomarkers. Methods: We selected discovery (n = 665) and replication (n = 430) cohorts with simultaneous availability of ATN biomarkers: Florbetapir amyloid, Flortaucipir tau and T1-weighted magnetic resonance imaging (MRI). A multimodal variational autoencoder (conditioned on age and sex) was used as a normative model to learn the multimodal regional brain patterns of a cognitively unimpaired (CU) control group. The trained model was applied to individuals on the ADS (AD Spectrum) to estimate their deviations (Z-scores) from the normative distribution, resulting in a Z-score regional deviation map per ADS individual per modality. Regions with Z-scores < -1.96 for MRI and Z-scores > 1.96 for amyloid and tau were labelled as outliers. Hamming distance was used to quantify the dissimilarity between individuals based on their outlier deviations across each modality. We also calculated a disease severity index (DSI) for each ADS individual, which was estimated by averaging the deviations across all outlier regions corresponding to each modality. Results: ADS individuals with moderate or severe dementia showed a higher proportion of regional outliers for each modality as well as more dissimilarity in modality-specific regional outlier patterns compared to ADS individuals with early or mild dementia. DSI (i) was associated with the progressive stages of dementia, (ii) showed significant associations with neuropsychological composite scores, and (iii) was related to the longitudinal risk of CDR progression. 
Findings were reproducible in both discovery and replication cohorts. Discussion: Ours is the first study to examine the heterogeneity in AD through the lens of multiple neuroimaging modalities (ATN), based on distinct or overlapping patterns of regional outlier deviations. Regional MRI and tau outliers were more heterogeneous than regional amyloid outliers. DSI has the potential to be an individual patient metric of neurodegeneration that can help in clinical decision making and monitoring patient response to anti-amyloid treatments.
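
The outlier rule, Hamming distance, and disease severity index described in the Methods can be sketched in a few lines. The thresholds (Z < -1.96 for MRI, Z > 1.96 for amyloid and tau) are from the abstract; the function names, the toy Z-scores, and the choice to return 0.0 when an individual has no outlier regions are assumptions made for illustration, and the averaging here is done within a single modality.

```python
def outlier_mask(z_scores, modality):
    """Flag outlier regions: Z < -1.96 for MRI, Z > 1.96 for amyloid and tau."""
    if modality == "mri":
        return [z < -1.96 for z in z_scores]
    if modality in ("amyloid", "tau"):
        return [z > 1.96 for z in z_scores]
    raise ValueError(f"unknown modality: {modality}")


def hamming(mask_a, mask_b):
    """Number of regions where two individuals' outlier flags differ."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same regions")
    return sum(a != b for a, b in zip(mask_a, mask_b))


def severity_index(z_scores, mask):
    """Mean deviation over outlier regions (0.0 if none; an assumption)."""
    values = [z for z, flagged in zip(z_scores, mask) if flagged]
    return sum(values) / len(values) if values else 0.0


# Toy tau Z-scores for two individuals over four regions (made-up values).
tau_a = [2.5, 0.3, 2.0, 1.0]
tau_b = [0.1, 2.2, 3.0, 1.0]
mask_a = outlier_mask(tau_a, "tau")
mask_b = outlier_mask(tau_b, "tau")

distance = hamming(mask_a, mask_b)
dsi_a = severity_index(tau_a, mask_a)
print(distance, dsi_a)
```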

11.
Genet Med ; 26(3): 101035, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38059438

ABSTRACT

PURPOSE: Clinically ascertained variants are under-utilized in neurodevelopmental disorder research. We established the Brain Gene Registry (BGR) to coregister clinically identified variants in putative brain genes with participant phenotypes. Here, we report 179 genetic variants in the first 179 BGR registrants and analyze the proportion that were novel to ClinVar at the time of entry and those that were absent in other disease databases. METHODS: From 10 academically affiliated institutions, 179 individuals with 179 variants were enrolled into the BGR. Variants were cross-referenced for previous presence in ClinVar and for presence in 6 other genetic databases. RESULTS: Of 179 variants in 76 genes, 76 (42.5%) were novel to ClinVar, and 62 (34.6%) were absent from all databases analyzed. Of the 103 variants present in ClinVar, 37 (35.9%) were uncertain (ClinVar aggregate classification of variant of uncertain significance or conflicting classifications). For 5 variants, the aggregate ClinVar classification was inconsistent with the interpretation from the BGR site-provided classification. CONCLUSION: A significant proportion of clinical variants that are novel or uncertain are not shared, limiting the evidence base for new gene-disease relationships. Registration of paired clinical genetic test results with phenotype has the potential to advance knowledge of the relationships between genes and neurodevelopmental disorders.


Subject(s)
Databases, Genetic , Genetic Variation , Humans , Genetic Variation/genetics , Genetic Testing/methods , Phenotype , Brain
12.
Article in English | MEDLINE | ID: mdl-38130873

ABSTRACT

Normative modelling is a method for understanding the underlying heterogeneity within brain disorders like Alzheimer Disease (AD), by quantifying how each patient deviates from the expected normative pattern that has been learned from a healthy control distribution. Existing deep learning based normative models have been applied on only single modality Magnetic Resonance Imaging (MRI) neuroimaging data. However, these do not take into account the complementary information offered by multimodal MRI, which is essential for understanding a multifactorial disease like AD. To address this limitation, we propose a multi-modal variational autoencoder (mmVAE) based normative modelling framework that can capture the joint distribution between different modalities to identify abnormal brain volume deviations due to AD. Our multi-modal framework takes as input Freesurfer processed brain region volumes from T1-weighted (cortical and subcortical) and T2-weighted (hippocampal) scans of cognitively normal participants to learn the morphological characteristics of the healthy brain. The estimated normative model is then applied on AD patients to quantify the deviation in brain volumes and identify abnormal brain pattern deviations due to the progressive stages of AD. We compared our proposed mmVAE with a baseline unimodal VAE having a single encoder and decoder and the two modalities concatenated as unimodal input. Our experimental results show that deviation maps generated by mmVAE are more sensitive to disease staging within AD, have a better correlation with patient cognition and result in a higher number of brain regions with statistically significant deviations compared to the unimodal baseline model.

13.
Res Sq ; 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-38014034

ABSTRACT

Biomarker identification, for example by fold change and regression analysis, is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis, i.e., limited prediction/diagnosis accuracy and limited capacity for reproducible biomarker identification across multiple datasets. The root of the challenges is the unique graph structure of biological signaling pathways, which consists of a large number of targets and intensive and complex signaling interactions among these targets. To resolve these two challenges, in this study, we presented a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, a priori knowledge, and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer outperformed existing GNN models significantly in terms of highly accurate prediction capability (~30% accuracy improvement in disease diagnosis compared with existing GNN models) and high reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.

14.
medRxiv ; 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37873390

ABSTRACT

Objective: To evaluate the real-world performance in delivering patient data on populations, of the SMART/HL7 Bulk FHIR Access API, required in Electronic Health Records (EHRs) under the 21st Century Cures Act Rule. Materials and Methods: We used an open-source Bulk FHIR Testing Suite at five healthcare sites from April to September 2023, including four hospitals using EHRs certified for interoperability, and one Health Information Exchange (HIE) using a custom, standards-compliant API build. We measured export speeds, data sizes, and completeness across six types of FHIR resources. Results: Among the certified platforms, Oracle Cerner led in speed, managing 5-16 million resources at over 8,000 resources/min. Three Epic sites exported a FHIR data subset, achieving 1-12 million resources at 1,555-2,500 resources/min. Notably, the HIE's custom API outperformed, generating over 141 million resources at 12,000 resources/min. Discussion: The HIE's custom API showcased superior performance, endorsing the effectiveness of SMART/HL7 Bulk FHIR in enabling large-scale data exchange while underlining the need for optimization in existing EHR platforms. Agility and scalability are essential for diverse health, research, and public health use cases. Conclusion: To fully realize the interoperability goals of the 21st Century Cures Act, addressing the performance limitations of Bulk FHIR API is critical. It would be beneficial to include performance metrics in both certification and reporting processes.

15.
Cancers (Basel) ; 15(17)2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37686486

ABSTRACT

Synergistic drug combinations offer great potential to enhance therapeutic efficacy and to reduce adverse reactions. However, effective and synergistic drug combination prediction remains an open question because of the unknown causal disease signaling pathways. Though various deep learning (AI) models have been proposed to quantitatively predict the synergism of drug combinations, the major limitation of existing deep learning methods is that they are inherently not interpretable, which makes the conclusions of AI models opaque to human experts, thereby limiting the robustness of the model conclusions and the applicability of these models in real-world human-AI healthcare. In this paper, we develop an interpretable graph neural network (GNN) that reveals the underlying essential therapeutic targets and the mechanism of the synergy (MoS) by mining the sub-molecular network of great importance. The key point of the interpretable GNN prediction model is a novel graph pooling layer, a self-attention-based node and edge pool (henceforth SANEpool), that can compute the attention score (importance) of genes and connections based on the genomic features and topology. As such, the proposed GNN model provides a systematic way to predict and interpret the drug combination synergism based on the detected crucial sub-molecular network. Experiments on various well-adopted drug-synergy-prediction datasets demonstrate that (1) the SANEpool model has superior predictive ability to generate accurate synergy score prediction, and (2) the sub-molecular networks detected by the SANEpool are self-explainable and salient for identifying synergistic drug combinations.

16.
Neurology ; 101(14): e1424-e1433, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37532510

ABSTRACT

BACKGROUND AND OBJECTIVES: The capacity of specialty memory clinics in the United States is very limited. If lower socioeconomic status or minoritized racial group is associated with reduced use of memory clinics, this could exacerbate health care disparities, especially if more effective treatments of Alzheimer disease become available. We aimed to understand how use of a memory clinic is associated with neighborhood-level measures of socioeconomic factors and the intersectionality of race. METHODS: We conducted an observational cross-sectional study using electronic health record data to compare the neighborhood advantage of patients seen at the Washington University Memory Diagnostic Center with the catchment area using a geographical information system. Furthermore, we compared the severity of dementia at the initial visit between patients who self-identified as Black or White. We used a multinomial logistic regression model to assess the Clinical Dementia Rating at the initial visit and t tests to compare neighborhood characteristics, including Area Deprivation Index, with those of the catchment area. RESULTS: A total of 4,824 patients seen at the memory clinic between 2008 and 2018 were included in this study (mean age 72.7 [SD 11.0] years, 2,712 [56%] female, 543 [11%] Black). Most of the memory clinic patients lived in more advantaged neighborhoods within the overall catchment area. The percentage of patients self-identifying as Black (11%) was lower than the average percentage of Black individuals by census tract in the catchment area (16%) (p < 0.001). Black patients lived in less advantaged neighborhoods, and Black patients were more likely than White patients to have moderate or severe dementia at their initial visit (odds ratio 1.59, 95% CI 1.11-2.25). DISCUSSION: This study demonstrates that patients living in less affluent neighborhoods were less likely to be seen in one large memory clinic. 
Black patients were under-represented in the clinic, and Black patients had more severe dementia at their initial visit. These findings suggest that patients with a lower socioeconomic status and who identify as Black are less likely to be seen in memory clinics, which are likely to be a major point of access for any new Alzheimer disease treatments that may become available.


Subject(s)
Alzheimer Disease , Aged , Female , Humans , Male , Alzheimer Disease/complications , Alzheimer Disease/diagnosis , Alzheimer Disease/epidemiology , Alzheimer Disease/ethnology , Alzheimer Disease/therapy , Black People , Cross-Sectional Studies , Racial Groups , Socioeconomic Factors , United States , Memory Disorders/epidemiology , Memory Disorders/ethnology , Memory Disorders/etiology , White People , Neighborhood Characteristics , Middle Aged , Aged, 80 and over
17.
J Am Med Inform Assoc ; 30(10): 1730-1740, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37390812

ABSTRACT

OBJECTIVE: We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies. MATERIALS AND METHODS: We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript. We categorized papers as data quality outcomes of interest, tools, or opinion pieces. We abstracted and defined additional themes and methods through an iterative review process. RESULTS: We included 103 papers in the review, of which 73 were data quality outcomes of interest papers, 22 were tools, and 8 were opinion pieces. The most common dimension of data quality assessed was completeness, followed by correctness, concordance, plausibility, and currency. We abstracted conformance and bias as 2 additional dimensions of data quality and structural agreement as an additional methodology. DISCUSSION: There has been an increase in EHR data quality assessment publications since the original 2013 review. Consistent dimensions of EHR data quality continue to be assessed across applications. Despite consistent patterns of assessment, there still does not exist a standard approach for assessing EHR data quality. CONCLUSION: Guidelines are needed for EHR data quality assessment to improve the efficiency, transparency, comparability, and interoperability of data quality assessment. These guidelines must be both scalable and flexible. Automation could be helpful in generalizing this process.


Subject(s)
Data Accuracy , Electronic Health Records
18.
Spine (Phila Pa 1976) ; 48(16): 1138-1147, 2023 Aug 15.
Article in English | MEDLINE | ID: mdl-37249385

ABSTRACT

STUDY DESIGN: Retrospective cohort. OBJECTIVE: The aim of this study was to design a risk-stratified benchmarking tool for adolescent idiopathic scoliosis (AIS) surgeries. SUMMARY OF BACKGROUND DATA: Machine learning (ML) is an emerging method for prediction modeling in orthopedic surgery. Benchmarking is an established method of process improvement and is an area of opportunity for ML methods. Current surgical benchmark tools often use ranks, and no "gold standards" for comparisons exist. MATERIALS AND METHODS: Data from 6076 AIS surgeries were collected from a multicenter registry and divided into three datasets encompassing surgeries performed (1) during the entire registry, (2) during the past 10 years, and (3) during the last 5 years of the registry. We trained three ML regression models (baseline linear regression, gradient boosting, and eXtreme gradient boosting) on each data subset to predict each of the five outcome variables: length of stay (LOS), estimated blood loss (EBL), operative time, Scoliosis Research Society (SRS)-Pain, and SRS-Self-Image. Performance was categorized as "below expected" if worse than one standard deviation (SD) from the mean, "as expected" if within 1 SD, and "better than expected" if better than 1 SD from the mean. RESULTS: Ensemble ML methods classified performance better than traditional regression techniques for LOS, EBL, and operative time. The best performing models for predicting LOS and EBL were trained on data collected in the last 5 years, while operative time used the entire 10-year dataset. No models were able to predict SRS-Pain or SRS-Self-Image in any useful manner. Point-precise estimates for continuous variables were subject to high average errors. CONCLUSIONS: Classification of benchmark outcomes is improved with ensemble ML techniques and may provide much needed case-adjustment for a surgeon performance program. Precise estimates of health-related quality of life scores and continuous variables were not possible, suggesting that performance classification is a better method of performance evaluation.
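
The three-way performance categorization described in the abstract can be illustrated with a minimal sketch. The abstract does not say which quantity the mean and SD are computed over, so this example assumes they are taken over residuals (observed minus predicted outcome) and that lower values are better, as for LOS or EBL. The function name `classify_performance` and the `lower_is_better` parameter are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def classify_performance(residuals, lower_is_better=True):
    """Label each residual relative to 1 SD around the mean residual.

    Assumption (not stated in the abstract): residual = observed - predicted.
    Requires at least two values, because the sample SD is undefined otherwise.
    """
    mu = mean(residuals)
    sd = stdev(residuals)
    if sd == 0:
        # All cases performed identically, so none deviates from expectation.
        return ["as expected"] * len(residuals)

    labels = []
    for r in residuals:
        z = (r - mu) / sd
        if not lower_is_better:
            z = -z  # flip so that a positive z always means "worse"
        if z > 1:
            labels.append("below expected")
        elif z < -1:
            labels.append("better than expected")
        else:
            labels.append("as expected")
    return labels


# Hypothetical length-of-stay residuals in days (illustrative values only).
example_labels = classify_performance([-3.0, -0.5, 0.0, 0.5, 3.0])
```

Classifying into bands rather than reporting the raw residual matches the abstract's conclusion that point-precise estimates carried high average errors.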


Subject(s)
Kyphosis , Scoliosis , Humans , Adolescent , Scoliosis/surgery , Benchmarking , Retrospective Studies , Quality of Life , Pain
19.
JAMA Intern Med ; 183(6): 611-612, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37010858

ABSTRACT

This cohort study uses data from electronic health records to assess variability in a sepsis prediction model across 9 hospitals.


Subject(s)
Models, Statistical , Sepsis , Humans , Prognosis , Sepsis/diagnosis , Hospitals , Patient Care
20.
Artif Organs ; 47(9): 1490-1502, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37032544

ABSTRACT

BACKGROUND: Veno-venous extracorporeal membrane oxygenation (V-V ECMO) is a lifesaving support modality for severe respiratory failure, but its resource-intensive nature led to significant controversy surrounding its use during the COVID-19 pandemic. We report the performance of several ECMO mortality prediction and severity of illness scores at discriminating survival in a large COVID-19 V-V ECMO cohort. METHODS: We validated ECMOnet, PRESET (PREdiction of Survival on ECMO Therapy-Score), Roch, SOFA (Sequential Organ Failure Assessment), APACHE II (acute physiology and chronic health evaluation), 4C (Coronavirus Clinical Characterisation Consortium), and CURB-65 (Confusion, Urea nitrogen, Respiratory Rate, Blood Pressure, age >65 years) scores on the ISARIC (International Severe Acute Respiratory and emerging Infection Consortium) database. We report discrimination via Area Under the Receiver Operating Characteristic Curve (AUROC) and Area Under the Precision-Recall Curve (AUPRC) and calibration via Brier score. RESULTS: We included 1147 patients and scores were calculated on patients with sufficient variables. ECMO mortality scores had AUROC (0.58-0.62), AUPRC (0.62-0.74), and Brier score (0.286-0.303). Roch score had the highest accuracy (AUROC 0.62), precision (AUPRC 0.74) yet worst calibration (Brier score of 0.3) despite being calculated on the fewest patients (144). Severity of illness scores had AUROC (0.52-0.57), AUPRC (0.59-0.64), and Brier score (0.265-0.471). APACHE II had the highest accuracy (AUROC 0.58), precision (AUPRC 0.64), and best calibration (Brier score 0.26). CONCLUSION: Within a large international multicenter COVID-19 cohort, the evaluated ECMO mortality prediction and severity of illness scores demonstrated inconsistent discrimination and calibration, highlighting the need for better clinically applicable decision support tools.
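
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how a Brier score and an AUROC can be computed from binary outcomes and predicted probabilities. It is not the authors' code. The function names and the toy data are hypothetical, and AUROC is computed here through its pairwise-comparison interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half).

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_score):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = died, 0 = survived.
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.4, 0.6]
toy_brier = brier_score(outcomes, predicted)
toy_auroc = auroc(outcomes, predicted)
```

As a point of reference, a forecast that assigns probability 0.5 to every patient has a Brier score of 0.25, and a score that ranks patients at random has an AUROC of 0.5.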


Subject(s)
COVID-19 , Extracorporeal Membrane Oxygenation , Humans , Aged , Pandemics , Retrospective Studies , COVID-19/diagnosis , COVID-19/therapy , APACHE