Results 1 - 16 of 16
1.
Expert Opin Drug Saf ; 22(8): 659-668, 2023.
Article in English | MEDLINE | ID: mdl-37339273

ABSTRACT

INTRODUCTION: Pharmacovigilance (PV) involves monitoring and aggregating adverse event information from a variety of data sources, including health records, biomedical literature, spontaneous adverse event reports, product labels, and patient-generated content such as social media posts; the most pertinent details in these sources are typically available only in narrative free-text formats. Natural language processing (NLP) techniques can be used to extract clinically relevant information from PV texts to inform decision-making. AREAS COVERED: We conducted a non-systematic literature review by querying the PubMed database to examine the uses of NLP in drug safety and distilled the findings to present our expert opinion on the topic. EXPERT OPINION: New NLP techniques and approaches continue to be applied to drug safety use cases; however, systems that are fully deployed and in use in a clinical environment remain vanishingly rare. Moving high-performing NLP techniques into real-world settings will require long-term engagement with end users and other stakeholders, revised workflows, and fully formulated business plans for the targeted use cases. Additionally, we found little to no evidence of extracted information being placed into standardized data models, which would make implementations more portable and adaptable.
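The review above surveys NLP for pharmacovigilance in general terms rather than prescribing a pipeline. As a purely illustrative sketch of the kind of free-text extraction step it discusses, the following Python snippet matches a tiny, hand-built lexicon of adverse event terms; the surface forms and preferred-term mappings are invented for the example and are not drawn from any system in the article:

```python
# Minimal sketch of lexicon-based adverse event extraction from a free-text
# narrative. The lexicon and mappings are illustrative stand-ins only.
import re

# Hypothetical mapping of surface forms to MedDRA-style preferred terms.
AE_LEXICON = {
    "rash": "Rash",
    "hives": "Urticaria",
    "shortness of breath": "Dyspnoea",
    "nausea": "Nausea",
}

def extract_adverse_events(narrative: str) -> list[tuple[str, str]]:
    """Return (matched text, preferred term) pairs found in the narrative."""
    hits = []
    lowered = narrative.lower()
    for surface, preferred in AE_LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            hits.append((narrative[match.start():match.end()], preferred))
    return hits

report = "Patient developed hives and shortness of breath two days after the dose."
print(extract_adverse_events(report))
# [('hives', 'Urticaria'), ('shortness of breath', 'Dyspnoea')]
```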


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Social Media , Humans , Natural Language Processing , Drug-Related Side Effects and Adverse Reactions/prevention & control , Adverse Drug Reaction Reporting Systems , Pharmacovigilance
2.
JCO Clin Cancer Inform ; 7: e2200108, 2023 04.
Article in English | MEDLINE | ID: mdl-37040583

ABSTRACT

PURPOSE: Precision oncology mandates developing standardized common data models (CDMs) to facilitate analyses and enable clinical decision making. Expert-opinion-based precision oncology initiatives are epitomized in Molecular Tumor Boards (MTBs), which process large volumes of clinical-genomic data to match genotypes with molecularly guided therapies. METHODS: We used the Johns Hopkins University MTB as a use case and developed a precision oncology core data model (Precision-DM) to capture key clinical-genomic data elements. We leveraged existing CDMs, building upon the Minimal Common Oncology Data Elements model (mCODE). Our model was defined as a set of profiles with multiple data elements, focusing on next-generation sequencing and variant annotations. Most elements were mapped to terminologies or code sets and to the Fast Healthcare Interoperability Resources (FHIR) standard. We subsequently compared our Precision-DM with existing CDMs, including the National Cancer Institute's Genomic Data Commons (NCI GDC), mCODE, OSIRIS, the clinical Genome Data Model (cGDM), and the genomic CDM (gCDM). RESULTS: Precision-DM contained 16 profiles and 355 data elements; 39% of the elements derived values from selected terminologies or code sets, and 61% were mapped to FHIR. Although we used most elements contained in mCODE, we significantly expanded the profiles to include genomic annotations, resulting in a partial overlap of 50.7% between our core model and mCODE. Limited overlap was noted between Precision-DM and OSIRIS (33.2%), NCI GDC (21.4%), cGDM (9.3%), and gCDM (7.9%). Precision-DM covered most of the mCODE elements (87.7%), with less coverage for OSIRIS (35.8%), NCI GDC (11%), cGDM (26%), and gCDM (33.3%). CONCLUSION: Precision-DM standardizes clinical-genomic data for the MTB use case and may allow for harmonized data pulls across health care systems, academic institutions, and community medical centers.
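As a rough illustration of how a profile-and-element model of this kind can be represented in code, the sketch below defines a hypothetical genomic-variant profile whose elements carry optional terminology and FHIR mappings; the profile name, element names, and mappings are assumptions for the example and are not taken from Precision-DM:

```python
# Illustrative representation of a CDM "profile" made of coded data elements;
# names and mappings here are hypothetical, not drawn from Precision-DM.
from dataclasses import dataclass, field

@dataclass
class DataElement:
    name: str
    value: str
    code_system: str | None = None   # e.g. a terminology such as HGNC or LOINC
    fhir_path: str | None = None     # e.g. a FHIR resource path

@dataclass
class Profile:
    name: str
    elements: list[DataElement] = field(default_factory=list)

variant_profile = Profile(
    name="GenomicVariant",
    elements=[
        DataElement("gene", "BRAF", code_system="HGNC", fhir_path="Observation.component"),
        DataElement("hgvs", "p.V600E", code_system="HGVS", fhir_path="Observation.component"),
        DataElement("interpretation", "Pathogenic", code_system="LOINC"),
    ],
)

mapped = sum(e.fhir_path is not None for e in variant_profile.elements)
print(f"{mapped}/{len(variant_profile.elements)} elements mapped to FHIR")
```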


Subject(s)
Neoplasms , Humans , Neoplasms/therapy , Precision Medicine/methods , Genomics/methods , Clinical Decision-Making , Decision Making
3.
J Biomed Inform ; 140: 104335, 2023 04.
Article in English | MEDLINE | ID: mdl-36933631

ABSTRACT

Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of the studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77); however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant approach, often combined with natural language processing and other techniques, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving beyond ML-only strategies, and evaluating the proposed solutions in real-world settings are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
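For readers unfamiliar with the term, a computable phenotype is essentially an executable cohort definition. The sketch below shows a minimal rule-based example over per-patient ICD-10-CM code lists; the code sets, exclusion rule, and count threshold are illustrative assumptions, not taken from any study in the review:

```python
# Minimal sketch of a rule-based computable phenotype evaluated against
# per-patient ICD code lists; code sets and threshold are illustrative only.
T2DM_CODES = {"E11.9", "E11.65"}      # example ICD-10-CM codes for type 2 diabetes
EXCLUSION_CODES = {"E10.9"}           # example exclusion: type 1 diabetes

def meets_phenotype(patient_codes: list[str], min_hits: int = 2) -> bool:
    """True if the patient has >= min_hits qualifying codes and no exclusions."""
    if set(patient_codes) & EXCLUSION_CODES:
        return False
    return len([c for c in patient_codes if c in T2DM_CODES]) >= min_hits

cohort = {
    "patient_a": ["E11.9", "I10", "E11.9"],
    "patient_b": ["E10.9", "E11.9"],
    "patient_c": ["I10"],
}
selected = [pid for pid, codes in cohort.items() if meets_phenotype(codes)]
print(selected)  # ['patient_a']
```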


Subject(s)
Algorithms , Electronic Health Records , Machine Learning , Natural Language Processing , Phenotype
4.
Stud Health Technol Inform ; 295: 398-401, 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35773895

ABSTRACT

Many decision support methods and systems in pharmacovigilance are built without explicitly addressing specific challenges that jeopardize their eventual success. We describe two sets of challenges and appropriate strategies to address them. The first are data-related challenges, which include using extensive multi-source data of poor quality, incomplete information integration, and inefficient data visualization. The second are user-related challenges, which encompass users' overall expectations and their engagement in developing automated solutions. Pharmacovigilance decision support systems will need to rely on advanced methods, such as natural language processing and validated mathematical models, to resolve data-related issues and provide properly contextualized data. However, sophisticated approaches will not provide a complete solution if end-users do not actively participate in their development, which will ensure tools that efficiently complement existing processes without creating unnecessary resistance. Our group has already tackled these issues and applied the proposed strategies in multiple projects.


Subject(s)
Decision Support Systems, Clinical/standards , Decision Support Systems, Management/standards , Natural Language Processing , Pharmacovigilance , Data Accuracy , User-Computer Interface
5.
Stud Health Technol Inform ; 289: 18-21, 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35062081

ABSTRACT

Processing unstructured clinical texts is often necessary to support certain tasks in biomedicine, such as matching patients to clinical trials. Among other methods, domain-specific language models have been built to utilize free-text information. This study evaluated the performance of Bidirectional Encoder Representations from Transformers (BERT) models in assessing the similarity between clinical trial texts. We compared an unstructured aggregated summary of clinical trials reviewed at the Johns Hopkins Molecular Tumor Board with the ClinicalTrials.gov records, focusing on the titles and eligibility criteria. Seven pretrained BERT-Based models were used in our analysis. Of the six biomedical-domain-specific models, only SciBERT outperformed the original BERT model by accurately assigning higher similarity scores to matched than mismatched trials. This finding is promising and shows that BERT and, likely, other language models may support patient-trial matching.
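A minimal sketch of the similarity computation this study implies, assuming mean-pooled last-layer embeddings and cosine similarity (the abstract does not state how scores were computed), could look like the following; the model identifier refers to the publicly available SciBERT checkpoint on Hugging Face:

```python
# Sketch of scoring similarity between two trial texts with a BERT-family
# encoder. Mean pooling and cosine similarity are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "allenai/scibert_scivocab_uncased"  # SciBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, tokens, dim)
    return hidden.mean(dim=1).squeeze(0)                # mean-pooled text vector

def similarity(a: str, b: str) -> float:
    return torch.nn.functional.cosine_similarity(embed(a), embed(b), dim=0).item()

matched = similarity("Phase II trial of BRAF inhibitor in melanoma",
                     "A study of vemurafenib for BRAF V600E melanoma")
mismatched = similarity("Phase II trial of BRAF inhibitor in melanoma",
                        "Cognitive behavioral therapy for insomnia")
print(matched > mismatched)  # expected True for a well-behaved encoder
```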


Subject(s)
Natural Language Processing , Semantics , Clinical Trials as Topic , Humans , Language
6.
Comput Biol Med ; 135: 104517, 2021 08.
Article in English | MEDLINE | ID: mdl-34130003

ABSTRACT

BACKGROUND: Our objective was to support the automated classification of Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) reports for their usefulness in assessing the possibility of a causal relationship between a drug product and an adverse event. METHOD: We used a data set of 326 redacted FAERS reports that had previously been annotated by a group of safety evaluators (SEs) at the FDA using a modified version of the World Health Organization-Uppsala Monitoring Centre criteria for drug causality assessment and had supported a similar study on the classification of reports using supervised machine learning and text engineering methods. We explored many potential features for supervised learning, including the incorporation of natural language processing on report text and information from external data sources, and developed models for predicting the classification status of reports. We then evaluated the models on a larger data set of previously unseen reports. RESULTS: The best-performing models achieved recall and F1 scores on both data sets above 0.80 for the identification of assessable reports (i.e., those containing enough information to make an informed causality assessment) and above 0.75 for the identification of reports meeting at least a Possible causality threshold. CONCLUSIONS: Causal inference from FAERS reports depends on many components with complex logical relationships that are yet to be made fully computable. Efforts focused on readily addressable tasks, such as quickly eliminating unassessable reports, fit naturally into SEs' thought processes and provide real enhancements for FDA workflows.
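The abstract does not detail the final feature set or learner, so the following is only a generic sketch of the setup it describes: TF-IDF features over report text, a linear classifier, and evaluation with recall and F1. The toy texts and labels are placeholders, not FAERS data:

```python
# Sketch of a report-classification setup: TF-IDF features plus a linear
# classifier, scored with recall and F1 on a held-out set. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.pipeline import make_pipeline

train_texts = [
    "Patient on drug X developed hepatitis two weeks after starting therapy.",
    "Report states only that an unspecified reaction occurred; no dates given.",
    "Rash resolved after drug withdrawal and recurred on rechallenge.",
    "Consumer report with no medical details or timeline.",
]
train_labels = [1, 0, 1, 0]   # 1 = assessable report, 0 = not assessable

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

test_texts = ["Hepatic injury began ten days after drug X; improved on dechallenge.",
              "No narrative provided."]
test_labels = [1, 0]
pred = clf.predict(test_texts)
print("recall:", recall_score(test_labels, pred), "F1:", f1_score(test_labels, pred))
```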


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Pharmacovigilance , Adverse Drug Reaction Reporting Systems , Drug-Related Side Effects and Adverse Reactions/epidemiology , Humans , Machine Learning , United States , United States Food and Drug Administration
7.
Drug Saf ; 43(9): 905-915, 2020 09.
Article in English | MEDLINE | ID: mdl-32445187

ABSTRACT

INTRODUCTION: The US FDA receives more than 2 million postmarket reports each year. Safety Evaluators (SEs) review these reports, as well as external information, to identify potential safety signals. With the increasing number of reports and the growing volume of external information, more efficient solutions for data integration and decision making are needed. OBJECTIVES: The aim of this study was to develop an interactive decision support application for drug safety surveillance that integrates and visualizes information from postmarket reports, product labels, and biomedical literature. METHODS: We conducted multiple meetings with a group of seven SEs at the FDA to collect the requirements for the Information Visualization Platform (InfoViP). Using infographic design principles, we implemented the InfoViP prototype as a modern web application built on the integrated information collected from the FDA Adverse Event Reporting System, the DailyMed repository, and PubMed. The same group of SEs evaluated the InfoViP prototype functionalities using a simple evaluation form and provided input for potential enhancements. RESULTS: The SEs described their workflows and overall expectations around the automation of time-consuming tasks, including access to visualizations of external information. We developed a set of wireframes, shared them with the SEs, and finalized the InfoViP design. The InfoViP prototype architecture relied on JavaScript- and Python-based frameworks, as well as an existing tool for processing the free-text information in all sources. This natural language processing tool supported multiple functionalities, especially the construction of time plots for individual postmarket reports and groups of reports. Overall, we received positive comments from the SEs during the InfoViP prototype evaluation and addressed their suggestions in the final version. CONCLUSIONS: The InfoViP system uses context-driven interactive visualizations and informatics tools to assist FDA SEs in synthesizing data from multiple sources for their case series analyses.


Subject(s)
Decision Support Techniques , Geographic Information Systems , Image Processing, Computer-Assisted , Product Surveillance, Postmarketing , Humans , Natural Language Processing , United States , United States Food and Drug Administration
8.
Health Informatics J ; 25(4): 1232-1243, 2019 12.
Article in English | MEDLINE | ID: mdl-29359620

ABSTRACT

Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical product information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reactions section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records (ETHER) system and evaluated its ability to extract Adverse Event terms and encode them to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. On these 100 labels, ETHER achieved a precision of 81 percent and a recall of 92 percent. This study demonstrated ETHER's ability to extract and encode Adverse Event terms from Structured Product Labels, which may support multiple pharmacoepidemiological tasks.
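A small sketch of the evaluation arithmetic only, not of ETHER itself: extracted Preferred Terms are compared against a reference set, such as one derived from the Side Effect Resource, and precision and recall are computed over the two sets. The term sets below are made up for illustration:

```python
# Sketch of evaluating extracted adverse event terms against a reference set.
def precision_recall(extracted: set[str], reference: set[str]) -> tuple[float, float]:
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

extracted_pts = {"Nausea", "Rash", "Headache", "Fatigue"}
reference_pts = {"Nausea", "Rash", "Dizziness", "Headache", "Vomiting"}

p, r = precision_recall(extracted_pts, reference_pts)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```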


Subject(s)
Data Mining , Drug Labeling , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Natural Language Processing , United States , United States Food and Drug Administration
9.
Vaccine ; 36(29): 4325-4330, 2018 07 05.
Article in English | MEDLINE | ID: mdl-29880244

ABSTRACT

As part of a collaborative project between the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention for the development of a web-based natural language processing (NLP) workbench, we created a corpus of 1000 Vaccine Adverse Event Reporting System (VAERS) reports annotated for 36,726 clinical features, 13,365 temporal features, and 22,395 clinical-temporal links. This paper describes the final corpus, as well as the methodology used to create it, so that clinical NLP researchers outside the FDA can evaluate the utility of the corpus for their own work. The creation of this standard went through four phases: pre-training, pre-production, production clinical feature annotation, and production temporal annotation. The pre-production phase used a double-annotation-followed-by-adjudication strategy to refine and finalize the annotation model, while the production phases followed a single-annotation strategy to maximize the number of reports in the corpus. An analysis of 30 reports randomly selected as part of a quality control assessment yielded accuracies of 0.97, 0.96, and 0.83 for clinical features, temporal features, and clinical-temporal associations, respectively, and speaks to the quality of the corpus.


Subject(s)
Adverse Drug Reaction Reporting Systems/standards , Reference Standards , Vaccines/adverse effects , Centers for Disease Control and Prevention, U.S. , Humans , United States , United States Food and Drug Administration
10.
Article in English | MEDLINE | ID: mdl-28815108

ABSTRACT

Literature review is critical but time-consuming in the post-market surveillance of medical products. We focused on the safety signal of intussusception after the vaccination of infants with the Rotashield vaccine in 1999 and retrieved all PubMed abstracts for rotavirus vaccines published after January 1, 1998. We used the Event-based Text-mining of Health Electronic Records system, the MetaMap tool, and the National Center for Biomedical Ontology Annotator to process the abstracts and generate coded terms stamped with the date of publication. Data were analyzed in the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment to evaluate the intussusception-related findings before and after the release of the new rotavirus vaccines in 2006. The tight connection of intussusception with the historical signal in the first period and the absence of any safety concern for the new vaccines in the second period were verified. We demonstrated the feasibility of semi-automated solutions that may assist medical reviewers in monitoring the biomedical literature.
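The core of the period comparison can be illustrated in a few lines of Python: coded terms stamped with publication dates are counted in the windows before and after the 2006 release of the new vaccines. The records below are toy data, not the retrieved abstracts:

```python
# Sketch of the period comparison only: counting a coded term of interest
# ("Intussusception") in two publication windows. Toy records for illustration.
from datetime import date

coded_abstracts = [
    {"pub_date": date(1999, 10, 1), "terms": {"Intussusception", "Rotavirus vaccine"}},
    {"pub_date": date(2001, 3, 15), "terms": {"Intussusception"}},
    {"pub_date": date(2008, 6, 2),  "terms": {"Rotavirus vaccine", "Fever"}},
    {"pub_date": date(2010, 1, 20), "terms": {"Rotavirus vaccine"}},
]

cutoff = date(2006, 1, 1)
before = sum("Intussusception" in a["terms"] and a["pub_date"] < cutoff for a in coded_abstracts)
after = sum("Intussusception" in a["terms"] and a["pub_date"] >= cutoff for a in coded_abstracts)
print(f"mentions before 2006: {before}, from 2006 on: {after}")  # 2, 0
```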

11.
J Biomed Inform ; 73: 14-29, 2017 09.
Article in English | MEDLINE | ID: mdl-28729030

ABSTRACT

We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.


Subject(s)
Electronic Health Records , Natural Language Processing , Humans
12.
Appl Clin Inform ; 8(2): 396-411, 2017 04 26.
Article in English | MEDLINE | ID: mdl-28447098

ABSTRACT

OBJECTIVE: To evaluate the feasibility of automated dose and adverse event information retrieval in supporting the identification of safety patterns. METHODS: We extracted all rabbit Anti-Thymocyte Globulin (rATG) reports submitted to the United States Food and Drug Administration Adverse Event Reporting System (FAERS) from the product's initial licensure on April 16, 1984 through February 8, 2016. We processed the narratives using the Medication Extraction (MedEx) and the Event-based Text-mining of Health Electronic Records (ETHER) systems and retrieved the appropriate medication, clinical, and temporal information. Where necessary, the extracted information was manually curated. This process resulted in a high-quality dataset that was analyzed with the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment (PANACEA) to explore the association of rATG dosing with post-transplant lymphoproliferative disorder (PTLD). RESULTS: Although manual curation was necessary to improve the data quality, MedEx and ETHER supported the extraction of the appropriate information. We created a final dataset of 1,380 cases with complete information for rATG dosing and date of administration. Analysis in PANACEA found that PTLD was associated with cumulative doses of rATG >8 mg/kg, even in periods when most of the submissions to FAERS reported low doses of rATG. CONCLUSION: We demonstrated the feasibility of investigating a dose-related safety pattern for a particular product in FAERS using a set of automated tools.
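A compact sketch of the dose-pattern analysis described here, under the assumption that it can be reduced to comparing PTLD frequency above and below the 8 mg/kg cumulative-dose cut point (PANACEA's actual analytics are not reproduced), might look like the following with synthetic cases:

```python
# Sketch: compute cumulative dose per case, split at 8 mg/kg, and test the
# association with PTLD using Fisher's exact test. Synthetic cases only.
from scipy.stats import fisher_exact

cases = [  # (cumulative rATG dose in mg/kg, PTLD reported)
    (10.5, True), (9.0, True), (12.0, True), (8.5, False),
    (7.5, True), (6.0, False), (4.5, False), (3.0, False),
]

high = [ptld for dose, ptld in cases if dose > 8]
low = [ptld for dose, ptld in cases if dose <= 8]
table = [[sum(high), len(high) - sum(high)],    # PTLD yes/no in high-dose cases
         [sum(low), len(low) - sum(low)]]       # PTLD yes/no in low-dose cases
odds_ratio, p_value = fisher_exact(table)
print(table, f"OR={odds_ratio:.1f} p={p_value:.2f}")
```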


Subject(s)
Adverse Drug Reaction Reporting Systems , Antilymphocyte Serum/adverse effects , Data Mining/methods , Natural Language Processing , Safety , Dose-Response Relationship, Drug , Feasibility Studies , Humans , Time Factors
13.
Drug Saf ; 40(7): 571-582, 2017 07.
Article in English | MEDLINE | ID: mdl-28293864

ABSTRACT

INTRODUCTION: Duplicate case reports in spontaneous adverse event reporting systems pose a challenge for medical reviewers to efficiently perform individual and aggregate safety analyses. Duplicate cases can bias data mining by generating spurious signals of disproportional reporting of product-adverse event pairs. OBJECTIVE: We have developed a probabilistic record linkage algorithm for identifying duplicate cases in the US Vaccine Adverse Event Reporting System (VAERS) and the US Food and Drug Administration Adverse Event Reporting System (FAERS). METHODS: In addition to using structured field data, the algorithm incorporates the non-structured narrative text of adverse event reports by examining clinical and temporal information extracted by the Event-based Text-mining of Health Electronic Records system, a natural language processing tool. The final component of the algorithm is a novel duplicate confidence value that is calculated by a rule-based empirical approach that looks for similarities in a number of criteria between two case reports. RESULTS: For VAERS, the algorithm identified 77% of known duplicate pairs with a precision (or positive predictive value) of 95%. For FAERS, it identified 13% of known duplicate pairs with a precision of 100%. The textual information did not improve the algorithm's automated classification for VAERS or FAERS. The empirical duplicate confidence value increased performance on both VAERS and FAERS, mainly by reducing the occurrence of false-positives. CONCLUSIONS: The algorithm was shown to be effective at identifying pre-linked duplicate VAERS reports. The narrative text was not shown to be a key component in the automated detection evaluation; however, it is essential for supporting the semi-automated approach that is likely to be deployed at the Food and Drug Administration, where medical reviewers will perform some manual review of the most highly ranked reports identified by the algorithm.
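A toy sketch of rule-based pairwise scoring in the spirit of the duplicate confidence value: a few structured fields are compared and their similarities accumulate into a score that is thresholded. The field choices, weights, and threshold are assumptions for illustration and do not reproduce the published algorithm:

```python
# Sketch of rule-based pairwise scoring for duplicate detection over a few
# structured fields. Weights and threshold are illustrative assumptions.
def duplicate_confidence(report_a: dict, report_b: dict) -> float:
    score = 0.0
    if report_a["age"] == report_b["age"]:
        score += 0.25
    if report_a["sex"] == report_b["sex"]:
        score += 0.15
    if report_a["onset_date"] == report_b["onset_date"]:
        score += 0.30
    shared = set(report_a["events"]) & set(report_b["events"])
    if shared:
        score += 0.30 * len(shared) / max(len(report_a["events"]), len(report_b["events"]))
    return score

r1 = {"age": 67, "sex": "F", "onset_date": "2015-03-02", "events": ["Pyrexia", "Seizure"]}
r2 = {"age": 67, "sex": "F", "onset_date": "2015-03-02", "events": ["Seizure"]}

conf = duplicate_confidence(r1, r2)
print(f"confidence={conf:.2f}", "likely duplicate" if conf >= 0.7 else "distinct")
```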


Subject(s)
Adverse Drug Reaction Reporting Systems , Data Interpretation, Statistical , Data Mining , Databases, Factual , Humans , United States
14.
J Biomed Inform ; 64: 354-362, 2016 12.
Article in English | MEDLINE | ID: mdl-27477839

ABSTRACT

We have developed a Decision Support Environment (DSE) for medical experts at the US Food and Drug Administration (FDA). The DSE contains two integrated systems: The Event-based Text-mining of Health Electronic Records (ETHER) and the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment (PANACEA). These systems assist medical experts in reviewing reports submitted to the Vaccine Adverse Event Reporting System (VAERS) and the FDA Adverse Event Reporting System (FAERS). In this manuscript, we describe the DSE architecture and key functionalities, and examine its potential contributions to the signal management process by focusing on four use cases: the identification of missing cases from a case series, the identification of duplicate case reports, retrieving cases for a case series analysis, and community detection for signal identification and characterization.
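Of the four use cases, community detection is the easiest to illustrate in miniature: reports become graph nodes, shared adverse event terms become edges, and a standard modularity-based algorithm groups related cases. The sketch below uses networkx with made-up reports and a deliberately simplistic linking rule; it is not the PANACEA implementation:

```python
# Sketch of the community-detection use case on a toy report-similarity graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

reports = {
    "case_1": {"Myocarditis", "Chest pain"},
    "case_2": {"Myocarditis", "Dyspnoea"},
    "case_3": {"Chest pain", "Myocarditis"},
    "case_4": {"Rash", "Pruritus"},
    "case_5": {"Pruritus", "Urticaria"},
}

graph = nx.Graph()
graph.add_nodes_from(reports)
ids = list(reports)
for i, a in enumerate(ids):
    for b in ids[i + 1:]:
        if reports[a] & reports[b]:          # link reports that share an event term
            graph.add_edge(a, b)

for community in greedy_modularity_communities(graph):
    print(sorted(community))
# Expected grouping: cardiac cases together, skin cases together
```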


Subject(s)
Adverse Drug Reaction Reporting Systems , Data Mining , Decision Support Techniques , United States Food and Drug Administration , Environment , Humans , Research Report , United States
15.
J Biomed Inform ; 62: 78-89, 2016 08.
Article in English | MEDLINE | ID: mdl-27327528

ABSTRACT

The sheer volume of textual information that needs to be reviewed and analyzed in many clinical settings requires the automated retrieval of key clinical and temporal information. The existing natural language processing systems are often challenged by the low quality of clinical texts and do not demonstrate the required performance. In this study, we focus on medical product safety report narratives and investigate the association of the clinical events with appropriate time information. We developed a novel algorithm for tagging and extracting temporal information from the narratives, and associating it with related events. The proposed algorithm minimizes the performance dependency on text quality by relying only on shallow syntactic information and primitive properties of the extracted event and time entities. We demonstrated the effectiveness of the proposed algorithm by evaluating its tagging and time assignment capabilities on 140 randomly selected reports from the US Vaccine Adverse Event Reporting System (VAERS) and the FDA (Food and Drug Administration) Adverse Event Reporting System (FAERS). We compared the performance of our tagger with the SUTime and HeidelTime taggers, and our algorithm's event-time associations with the Temporal Awareness and Reasoning Systems for Question Interpretation (TARSQI). We further evaluated the ability of our algorithm to correctly identify the time information for the events in the 2012 Informatics for Integrating Biology and the Bedside (i2b2) Challenge corpus. For the time tagging task, our algorithm performed better than the SUTime and the HeidelTime taggers (F-measure in VAERS and FAERS: Our algorithm: 0.86 and 0.88, SUTime: 0.77 and 0.74, and HeidelTime 0.75 and 0.42, respectively). In the event-time association task, our algorithm assigned an inappropriate timestamp for 25% of the events, while the TARSQI toolkit demonstrated a considerably lower performance, assigning inappropriate timestamps in 61.5% of the same events. Our algorithm also supported the correct calculation of 69% of the event relations to the section time in the i2b2 testing set.
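As a much-simplified illustration of the two steps evaluated here, the sketch below tags explicit dates with a regular expression and associates each event mention with the nearest preceding time expression; this heuristic stands in for, and does not reproduce, the published algorithm:

```python
# Sketch: regex-based date tagging plus a nearest-preceding-time association
# heuristic. Event terms, date format, and the heuristic are assumptions.
import re

TIME_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
EVENT_TERMS = ["fever", "seizure", "vomiting"]

def tag_and_associate(narrative: str) -> list[tuple[str, str | None]]:
    times = [(m.start(), m.group()) for m in TIME_PATTERN.finditer(narrative)]
    pairs = []
    for term in EVENT_TERMS:
        for m in re.finditer(term, narrative, flags=re.IGNORECASE):
            preceding = [t for pos, t in times if pos < m.start()]
            pairs.append((m.group(), preceding[-1] if preceding else None))
    return pairs

text = "Vaccinated on 3/1/2015. Fever began the same day; on 3/3/2015 a seizure occurred."
print(tag_and_associate(text))
# [('Fever', '3/1/2015'), ('seizure', '3/3/2015')]
```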


Subject(s)
Algorithms , Electronic Health Records , Narration , Natural Language Processing , Humans , Research Report