Results 1 - 5 of 5
1.
Front Pharmacol ; 14: 1180962, 2023.
Article in English | MEDLINE | ID: mdl-37781703

ABSTRACT

Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI's ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHRs) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, and lab reports) to output a set of structured variables required for RWD analysis. This research used a nationwide EHR-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely based on an ML model (i.e., not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated by manual abstraction. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusion: NLP and ML enable the extraction of retrospective clinical data in EHRs with speed and scalability to help researchers learn from the experience of every person with cancer.

2.
Cancers (Basel) ; 15(6), 2023 Mar 20.
Article in English | MEDLINE | ID: mdl-36980739

ABSTRACT

Meaningful real-world evidence (RWE) generation requires unstructured data found in electronic health records (EHRs), which are often missing from administrative claims; however, obtaining relevant data from unstructured EHR sources is resource-intensive. In response, researchers are using natural language processing (NLP) with machine learning (ML) techniques (i.e., ML extraction) to extract real-world data (RWD) at scale. This study assessed the quality and fitness-for-use of EHR-derived oncology data curated using NLP with ML as compared to the reference standard of expert abstraction. Using a sample of 186,313 patients with lung cancer from a nationwide EHR-derived de-identified database, we performed a series of replication analyses demonstrating some common analyses conducted in retrospective observational research with complex EHR-derived data to generate evidence. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. We utilized the biomarker- and treatment-defined cohorts to perform analyses related to biomarker-associated survival and treatment comparative effectiveness, respectively. Across all analyses, the results differed by less than 8% between the data curation methods, and similar conclusions were reached. These results highlight that variables extracted by high-performance ML models trained on expert-abstracted data can yield results similar to those obtained with abstracted data, unlocking the ability to perform oncology research at scale.

3.
Cancers (Basel) ; 14(13), 2022 Jun 22.
Article in English | MEDLINE | ID: mdl-35804834

ABSTRACT

A vast amount of real-world data, such as pathology reports and clinical notes, are captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when scaling to large datasets is needed. Fortunately, Natural Language Processing (NLP) and Machine Learning (ML) techniques provide promising solutions for a variety of information extraction tasks such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest. In order to enable effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, ML-extracted data users and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.

4.
JCO Clin Cancer Inform ; 5: 719-727, 2021 06.
Article in English | MEDLINE | ID: mdl-34197178

ABSTRACT

PURPOSE: To facilitate identification of clinical trial participation candidates, we developed a machine learning tool that automates the determination of a patient's metastatic status, on the basis of unstructured electronic health record (EHR) data. METHODS: This tool scans EHR documents, extracting text snippet features surrounding key words (such as metastatic, progression, and local). A regularized logistic regression model was trained and used to classify patients across five metastatic categories: highly likely and likely positive, highly likely and likely negative, and unknown. Using a real-world oncology database of patients with solid tumors with manually abstracted information as reference, we calculated sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). We validated the performance in a real-world data set, evaluating accuracy gains upon additional user review of the tool's outputs after integration into clinic workflows. RESULTS: In the training data set (N = 66,532), the model sensitivity and specificity (% [95% CI]) were 82.4 [81.9 to 83.0] and 95.5 [95.3 to 96.7], respectively; the PPV was 89.3 [88.8 to 90.0], and the NPV was 94.0 [93.8 to 94.2]. In the validation sample (n = 200 from five distinct care sites), after user review of model outputs, values increased to 97.1 [85.1 to 99.9] for sensitivity, 98.2 [94.8 to 99.6] for specificity, 91.9 [78.1 to 98.3] for PPV, and 99.4 [96.6 to 100.0] for NPV. The model assigned 163 of 200 patients to the highly likely categories. The error prevalence was 4% before and 2% after user review. CONCLUSION: This tool infers metastatic status from unstructured EHR data with high accuracy and high confidence in more than 75% of cases, without requiring additional manual review. By enabling efficient characterization of metastatic status, this tool could mitigate a key barrier for patient ascertainment and clinical trial participation in community clinics.
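
The four performance measures named in this abstract all derive from a 2x2 confusion matrix of model output against the abstracted reference. The sketch below shows only the standard definitions; the counts are hypothetical illustration values, not data from the study, and the names `ConfusionCounts` and `diagnostic_metrics` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of model output versus the abstracted reference label."""
    tp: int  # model positive, reference positive
    fp: int  # model positive, reference negative
    tn: int  # model negative, reference negative
    fn: int  # model negative, reference positive


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true positives found
        "specificity": c.tn / (c.tn + c.fp),  # share of true negatives found
        "ppv": c.tp / (c.tp + c.fp),          # share of positive calls that are right
        "npv": c.tn / (c.tn + c.fn),          # share of negative calls that are right
    }


# Hypothetical counts chosen for easy arithmetic; NOT taken from the study.
example = ConfusionCounts(tp=80, fp=10, tn=190, fn=20)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common metastatic disease is in the sample, so values from one cohort may not carry over to a cohort with a different prevalence.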


Subject(s)
Electronic Health Records, Neoplasms, Databases, Factual, Humans, Machine Learning, Neoplasms/therapy, Sensitivity and Specificity
5.
Am J Manag Care ; 27(7): 274-281, 2021 07.
Article in English | MEDLINE | ID: mdl-34314116

ABSTRACT

OBJECTIVES: Racial disparities in cancer care and outcomes remain a societal challenge. Medicaid expansion through the Affordable Care Act was intended to improve health care access and equity. This study aimed to assess whether state Medicaid expansions were associated with a reduction in racial disparities in timely treatment among patients diagnosed with advanced cancer. STUDY DESIGN: This difference-in-differences study analyzed deidentified electronic health record-derived data. Patients aged 18 to 64 years with advanced or metastatic cancers diagnosed between January 1, 2011, and January 31, 2019, and receiving systemic therapy were included. METHODS: The primary end point was receipt of timely treatment, defined as first-line systemic therapy starting within 30 days after diagnosis of advanced or metastatic disease. Racial disparity was defined as adjusted percentage-point (PP) difference for Black vs White patients, adjusted for age, sex, practice setting, cancer type, stage, insurance marketplace, and area unemployment rate, with time and state fixed effects. RESULTS: The study included 30,310 patients (12.3% Black race). Without Medicaid expansion, Black patients were less likely to receive timely treatment than White patients (43.7% vs 48.4%; adjusted difference, -4.8 PP; P < .001). With Medicaid expansion, this disparity was diminished and lost significance (49.7% vs 50.5%; adjusted difference, -0.8 PP; P = .605). The adjusted difference-in-differences estimate was a 3.9 PP reduction in racial disparity (95% CI, 0.1-7.7 PP; P = .045). CONCLUSIONS: Medicaid expansion was associated with reduced Black-White racial disparities in receipt of timely systemic treatment for patients with advanced or metastatic cancers.
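
The difference-in-differences quantity in this abstract is the change in the Black-White gap between the two policy conditions. The sketch below applies that arithmetic to the four timely-treatment percentages quoted in the abstract. It is a simplification: the published estimate comes from a model adjusted for covariates with time and state fixed effects, which this calculation does not reproduce, and the function names are assumptions introduced here.

```python
def disparity_pp(black_pct: float, white_pct: float) -> float:
    """Black-minus-White gap in percentage points (negative = Black patients lower)."""
    return black_pct - white_pct


def difference_in_differences(gap_without: float, gap_with: float) -> float:
    """Change in the gap under expansion; positive means the gap narrowed."""
    return gap_with - gap_without


# Percentages of patients receiving timely treatment, as quoted in the abstract.
gap_without_expansion = disparity_pp(black_pct=43.7, white_pct=48.4)
gap_with_expansion = disparity_pp(black_pct=49.7, white_pct=50.5)

did_estimate = difference_in_differences(gap_without_expansion, gap_with_expansion)

print(f"Gap without expansion: {gap_without_expansion:.1f} PP")
print(f"Gap with expansion:    {gap_with_expansion:.1f} PP")
print(f"Simple difference-in-differences: {did_estimate:.1f} PP")
```

This simple subtraction gives a narrowing of about 3.9 PP. It should be read as an illustration of the estimand rather than a replication of the adjusted analysis, which also yields the confidence interval and P value.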


Subject(s)
Medicaid, Neoplasms, Black or African American, Humans, Insurance Coverage, Neoplasms/therapy, Patient Protection and Affordable Care Act, Racial Groups, United States