Results 1 - 15 of 15
1.
JAMIA Open ; 2(3): 301-305, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31709388

ABSTRACT

OBJECTIVES: To investigate using patient posts in social media as a resource to profile off-label prescriptions of cancer drugs. METHODS: We analyzed patient posts from the Inspire health forums (www.inspire.com) and extracted mentions of cancer drugs from the 14 most active cancer-type specific support groups. To quantify drug-disease associations, we calculated information component scores from the frequency of posts in each cancer-specific group with mentions of a given drug. We evaluated the results against three sources: manual review, Wolters-Kluwer Medi-span, and Truven MarketScan insurance claims. RESULTS: We identified 279 frequently discussed and therefore highly associated drug-disease pairs from Inspire posts. Of these, 96 are FDA approved, 9 are known off-label uses, and 174 do not have records of known usage (potentially novel off-label uses). We achieved a mean average precision of 74.9% in identifying drug-disease pairs with a true indication association from patient posts and found consistent evidence in medical claims records. We achieved a recall of 69.2% in identifying known off-label drug uses (based on Wolters-Kluwer Medi-span) from patient posts.
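
The abstract does not give the exact formula for the information component (IC) score. As a minimal sketch, assuming the common shrinkage form IC = log2((observed + 0.5) / (expected + 0.5)), the score for one drug-disease pair could be computed from post counts as follows. The function name, parameter names, and the 0.5 shrinkage constant are illustrative assumptions, not taken from the paper.

```python
import math


def information_component(n_joint, n_drug, n_group, n_total, shrink=0.5):
    """Shrunk log2 ratio of observed to expected co-mention counts.

    n_joint: posts in the cancer group that mention the drug
    n_drug:  posts in all groups that mention the drug
    n_group: posts in the cancer group
    n_total: posts in all groups
    shrink:  assumed shrinkage constant (0.5 is a common choice)
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    # Expected joint count if drug mentions were independent of the group.
    expected = n_drug * n_group / n_total
    return math.log2((n_joint + shrink) / (expected + shrink))


# Hypothetical counts, for illustration only.
score = information_component(n_joint=40, n_drug=100, n_group=200, n_total=1000)
print(f"IC = {score:.3f}")
```

A positive score means the drug is mentioned in the group more often than independence would predict; a score near zero means no association.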

2.
JMIR Public Health Surveill ; 5(2): e11264, 2019 Jun 03.
Article in English | MEDLINE | ID: mdl-31162134

ABSTRACT

BACKGROUND: Adverse drug reactions (ADRs) occur in nearly all patients on chemotherapy, causing morbidity and therapy disruptions. Detection of such ADRs is limited in clinical trials, which are underpowered to detect rare events. Early recognition of ADRs in the postmarketing phase could substantially reduce morbidity and decrease societal costs. Internet community health forums provide a mechanism for individuals to discuss real-time health concerns and can enable computational detection of ADRs. OBJECTIVE: The goal of this study is to identify cutaneous ADR signals in social health networks and compare the frequency and timing of these ADRs to clinical reports in the literature. METHODS: We present a natural language processing-based, ADR signal-generation pipeline based on patient posts on Internet social health networks. We identified user posts from the Inspire health forums related to two chemotherapy classes: erlotinib, an epidermal growth factor receptor inhibitor, and nivolumab and pembrolizumab, immune checkpoint inhibitors. We extracted mentions of ADRs from unstructured content of patient posts. We then performed population-level association analyses and time-to-detection analyses. RESULTS: Our system detected cutaneous ADRs from patient reports with high precision (0.90) and at frequencies comparable to those documented in the literature but an average of 7 months ahead of their literature reporting. Known ADRs were associated with higher proportional reporting ratios compared to negative controls, demonstrating the robustness of our analyses. Our named entity recognition system achieved a 0.738 microaveraged F-measure in detecting ADR entities, not limited to cutaneous ADRs, in health forum posts. Additionally, we discovered the novel ADR of hypohidrosis reported by 23 patients in erlotinib-related posts; this ADR was absent from 15 years of literature on this medication and we recently reported the finding in a clinical oncology journal. 
CONCLUSIONS: Several hundred million patients report health concerns in social health networks, yet this information is markedly underutilized for pharmacosurveillance. We demonstrated the ability of a natural language processing-based signal-generation pipeline to accurately detect patient reports of ADRs months in advance of literature reporting and the robustness of statistical analyses to validate system detections. Our findings suggest the important contributions that social health network data can play in contributing to more comprehensive and timely pharmacovigilance.
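
The proportional reporting ratio (PRR) mentioned in the results compares how often an event is reported with the drug of interest against how often it is reported with all other drugs. A minimal sketch of the standard 2x2 definition follows; the function name and the example counts are hypothetical and are not drawn from the study.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 contingency table of reports.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each drug group needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when no comparator report has the event")
    rate_drug = a / (a + b)    # event rate among reports for the drug
    rate_other = c / (c + d)   # event rate among reports for other drugs
    return rate_drug / rate_other


# Hypothetical counts, for illustration only.
print(proportional_reporting_ratio(a=20, b=80, c=10, d=190))
```

A PRR well above 1 suggests the event is disproportionately reported with the drug, which is why known ADRs are expected to score higher than negative controls.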

5.
AMIA Annu Symp Proc ; 2017: 679-688, 2017.
Article in English | MEDLINE | ID: mdl-29854133

ABSTRACT

Social networks, such as Twitter, have become important sources for active monitoring of user-reported adverse drug reactions (ADRs). Automatic extraction of ADR information can be crucial for healthcare providers, drug manufacturers, and consumers. However, because of the non-standard nature of social media language, automatically extracted ADR mentions need to be mapped to standard forms before they can be used by operational pharmacovigilance systems. We propose a modular natural language processing pipeline for mapping (normalizing) colloquial mentions of ADRs to their corresponding standardized identifiers. We seek to accomplish this task and enable customization of the pipeline so that distinct unlabeled free text resources can be incorporated to use the system for other normalization tasks. Our approach, which we call Hybrid Semantic Analysis (HSA), sequentially employs rule-based and semantic matching algorithms for mapping user-generated mentions to concept IDs in the Unified Medical Language System vocabulary. The semantic matching component of HSA is adaptive in nature and uses a regression model to combine various measures of semantic relatedness and resources to optimize normalization performance on the selected data source. On a publicly available corpus, our normalization method achieves 0.502 recall and 0.823 precision (F-measure: 0.624). Our proposed method outperforms a baseline based on latent semantic analysis and another that uses MetaMap.
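
The F-measure quoted above is the harmonic mean of precision and recall. A short sketch of that calculation, using the precision and recall from the abstract as inputs (the function name and the `beta` parameter are illustrative):

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
print(round(f_measure(0.823, 0.502), 3))
```

With these inputs the result rounds to the reported 0.624, so the three figures are mutually consistent.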


Subject(s)
Drug-Related Side Effects and Adverse Reactions, Natural Language Processing, Pharmacovigilance, Social Media, Terminology as Topic, Algorithms, Crowdsourcing, Humans, Information Storage and Retrieval, Semantics, Software, Unified Medical Language System
6.
J Biomed Inform ; 62: 148-58, 2016 08.
Article in English | MEDLINE | ID: mdl-27363901

ABSTRACT

OBJECTIVE: The abundance of text available in social media and health related forums along with the rich expression of public opinion have recently attracted the interest of the public health community to use these sources for pharmacovigilance. Based on the intuition that patients post about Adverse Drug Reactions (ADRs) expressing negative sentiments, we investigate the effect of sentiment analysis features in locating ADR mentions. METHODS: We enrich the feature space of a state-of-the-art ADR identification method with sentiment analysis features. Using a corpus of posts from the DailyStrength forum and tweets annotated for ADR and indication mentions, we evaluate the extent to which sentiment analysis features help in locating ADR mentions and distinguishing them from indication mentions. RESULTS: Evaluation results show that sentiment analysis features marginally improve ADR identification in tweets and health related forum posts. Adding sentiment analysis features achieved a statistically significant F-measure increase from 72.14% to 73.22% in the Twitter part of an existing corpus using its original train/test split. Using stratified 10×10-fold cross-validation, statistically significant F-measure increases were shown in the DailyStrength part of the corpus, from 79.57% to 80.14%, and in the Twitter part of the corpus, from 66.91% to 69.16%. Moreover, sentiment analysis features are shown to reduce the number of ADRs being recognized as indications. CONCLUSION: This study shows that adding sentiment analysis features can marginally improve the performance of even a state-of-the-art ADR identification method. This improvement can be of use to pharmacovigilance practice, due to the rapidly increasing popularity of social media and health forums.


Subject(s)
Drug-Related Side Effects and Adverse Reactions, Pharmacovigilance, Social Media, Humans, Internet, Public Health
7.
Pac Symp Biocomput ; 21: 581-92, 2016.
Article in English | MEDLINE | ID: mdl-26776221

ABSTRACT

Social media has evolved into a crucial resource for obtaining large volumes of real-time information. The promise of social media has been realized by the public health domain, and recent research has addressed some important challenges in that domain by utilizing social media data. Tasks such as monitoring flu trends, viral disease outbreaks, medication abuse, and adverse drug reactions are some examples of studies where data from social media have been exploited. The focus of this workshop is to explore solutions to three important natural language processing challenges for domain-specific social media text: (i) text classification, (ii) information extraction, and (iii) concept normalization. To explore different approaches to solving these problems on social media data, we designed a shared task which was open to participants globally. We designed three tasks using our in-house annotated Twitter data on adverse drug reactions. Task 1 involved automatic classification of adverse drug reaction assertive user posts; Task 2 focused on extracting specific adverse drug reaction mentions from user posts; and Task 3, which was slightly ill-defined due to the complex nature of the problem, involved normalizing user mentions of adverse drug reactions to standardized concept IDs. A total of 11 teams participated, and a total of 24 (18 for Task 1, and 6 for Task 2) system runs were submitted. Following the evaluation of the systems, and an assessment of their innovation/novelty, we accepted 7 descriptive manuscripts for publication--5 for Task 1 and 2 for Task 2. We provide descriptions of the tasks, data, and participating systems in this paper.


Subject(s)
Data Mining/methods, Social Media/statistics & numerical data, Adverse Drug Reaction Reporting Systems/statistics & numerical data, Computational Biology/methods, Computational Biology/statistics & numerical data, Data Mining/statistics & numerical data, Drug-Related Side Effects and Adverse Reactions/classification, Humans, Natural Language Processing, Pharmacovigilance, Supervised Machine Learning, Support Vector Machine
8.
J Am Med Inform Assoc ; 22(3): 671-81, 2015 May.
Article in English | MEDLINE | ID: mdl-25755127

ABSTRACT

OBJECTIVE: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. METHODS: We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. RESULTS: ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. CONCLUSION: It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets.


Subject(s)
Artificial Intelligence, Data Mining/methods, Pharmacovigilance, Social Media, Humans, Natural Language Processing, Semantics
9.
J Biomed Inform ; 54: 202-12, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25720841

ABSTRACT

OBJECTIVE: Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media. METHODS: We identified studies describing approaches for ADR detection from social media from the Medline, Embase, Scopus and Web of Science databases, and the Google Scholar search engine. Studies that met our inclusion criteria were those that attempted to extract ADR information posted by users on any publicly available social media platform. We categorized the studies according to different characteristics such as primary ADR detection approach, size of corpus, data source(s), availability, and evaluation criteria. RESULTS: Twenty-two studies met our inclusion criteria, with fifteen (68%) published within the last two years. However, publicly available annotated data is still scarce, and we found only six studies that made the annotations used publicly available, making system performance comparisons difficult. In terms of algorithms, supervised classification techniques to detect posts containing ADR mentions, and lexicon-based approaches for extraction of ADR mentions from texts have been the most popular. CONCLUSION: Our review suggests that interest in the utilization of the vast amounts of available social media data for ADR monitoring is increasing. 
In terms of sources, both health-related and general social media data have been used for ADR detection: while health-related sources tend to contain higher proportions of relevant data, the volume of data from general social media websites is significantly higher. There is still a very limited amount of annotated data publicly available, and, as indicated by the promising results obtained by recent supervised learning approaches, there is a strong need to make such data available to the research community.


Subject(s)
Adverse Drug Reaction Reporting Systems, Pharmacovigilance, Social Media, Humans, Internet, MEDLINE, Public Health
10.
Article in English | MEDLINE | ID: mdl-25209025

ABSTRACT

Finding gene functions discussed in the literature is an important task of information extraction (IE) from biomedical documents. Automated computational methodologies can significantly reduce the need for manual curation and improve quality of other related IE systems. We propose an open-IE method for the BioCreative IV GO shared task (subtask b), focused on finding gene function terms [Gene Ontology (GO) terms] for different genes in an article. The proposed open-IE approach is based on distributional semantic similarity over the GO terms. The method does not require annotated data for training, which makes it highly generalizable. We achieve an F-measure of 0.26 on the test-set in the official submission for BioCreative-GO shared task, the third highest F-measure among the seven participants in the shared task. DATABASE URL: https://code.google.com/p/rainbow-nlp/


Subject(s)
Computational Biology/methods, Data Mining/methods, Genes, Semantics, Vocabulary, Controlled, Genes/genetics, Genes/physiology, Internet
11.
AMIA Annu Symp Proc ; 2014: 924-33, 2014.
Article in English | MEDLINE | ID: mdl-25954400

ABSTRACT

Recent research has shown that Twitter data analytics can have broad implications on public health research. However, its value for pharmacovigilance has been scantly studied - with health related forums and community support groups preferred for the task. We present a systematic study of tweets collected for 74 drugs to assess their value as sources of potential signals for adverse drug reactions (ADRs). We created an annotated corpus of 10,822 tweets. Each tweet was annotated for the presence or absence of ADR mentions, with the span and Unified Medical Language System (UMLS) concept ID noted for each ADR present. Using Cohen's kappa, we calculated the inter-annotator agreement (IAA) for the binary annotations to be 0.69. To demonstrate the utility of the corpus, we attempted a lexicon-based approach for concept extraction, with promising success (54.1% precision, 62.1% recall, and 57.8% F-measure). A subset of the corpus is freely available at: http://diego.asu.edu/downloads.
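
Cohen's kappa, used above for inter-annotator agreement, corrects the raw agreement rate for the agreement expected by chance. A minimal sketch for two annotators follows; the function name and the toy labels are invented for illustration and are not from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy binary annotations (1 = tweet mentions an ADR, 0 = it does not).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on three of four items, but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.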


Subject(s)
Data Mining/methods, Drug-Related Side Effects and Adverse Reactions, Internet, Pharmacovigilance, Humans, Prescription Drugs/adverse effects
12.
Article in English | MEDLINE | ID: mdl-25717407

ABSTRACT

Social media postings are rich in information that often remains hidden and inaccessible for automatic extraction due to inherent limitations of the site's APIs, which mostly limit access via specific keyword-based searches (and limit both the number of keywords and the number of postings that are returned). When mining social media for drug mentions, one of the first problems to solve is how to derive a list of variants of the drug name (common misspellings) that can capture a sufficient number of postings. We present here an approach that filters the potential variants based on the intuition that, faced with the task of writing an unfamiliar, complex word (the drug name), users will tend to revert to phonetic spelling, and we thus give preference to variants that reflect the phonemes of the correct spelling. The algorithm allowed us to capture 50.4-56.0% of the user comments using only about 18% of the variants.

13.
J Biomed Inform ; 46 Suppl: S40-S47, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24212118

ABSTRACT

Clinical records include both coded and free-text fields that interact to reflect complicated patient stories. The information often covers not only the present medical condition and events experienced by the patient, but also refers to relevant events in the past (such as signs, symptoms, tests or treatments). In order to automatically construct a timeline of these events, we first need to extract the temporal relations between pairs of events or time expressions presented in the clinical notes. We designed separate extraction components for different types of temporal relations, utilizing a novel hybrid system that combines machine learning with a graph-based inference mechanism to extract the temporal links. The temporal graph is a directed graph based on parse tree dependencies of the simplified sentences and frequent pattern clues. We generalized the sentences in order to discover patterns that, given the complexities of natural language, might not be directly discoverable in the original sentences. The proposed hybrid system performance reached an F-measure of 0.63, with precision at 0.76 and recall at 0.54 on the 2012 i2b2 Natural Language Processing corpus for the temporal relation (TLink) extraction task, achieving the highest precision and third highest f-measure among participating teams in the TLink track.


Subject(s)
Data Mining/methods, Electronic Health Records, Medical Informatics/methods, Natural Language Processing, Humans, Support Vector Machine, Time Factors
14.
Biomed Inform Insights ; 5(Suppl. 1): 165-74, 2012.
Article in English | MEDLINE | ID: mdl-22879773

ABSTRACT

The reasons that drive someone to commit suicide are complex and their study has attracted the attention of scientists in different domains. Analyzing this phenomenon could significantly improve preventive efforts. In this paper we present a method for sentiment analysis of suicide notes submitted to the i2b2/VA/Cincinnati Shared Task 2011. In this task the sentences of 900 suicide notes were labeled with the possible emotions that they reflect. In order to label the sentences with emotions, we propose a hybrid approach which utilizes both rule-based and machine learning techniques. To solve the multi-class problem, a rule-based engine and an SVM model are used for each category. A set of syntactic and semantic features is selected for each sentence to build the rules and train the classifier. The rules are generated manually based on a set of lexical and emotional clues. We propose a new approach to extract the sentence's clauses and constitutive grammatical elements and to use them in syntactic and semantic feature generation. The approach uses a novel method to measure the polarity of the sentence based on the extracted grammatical elements, reaching a precision of 41.79% with a recall of 55.03% for an f-measure of 47.50%. The overall mean f-measure of all submissions was 48.75% with a standard deviation of 7%.

15.
AMIA Annu Symp Proc ; 2011: 1019-26, 2011.
Article in English | MEDLINE | ID: mdl-22195162

ABSTRACT

Rapid growth of online health social networks has enabled patients to communicate more easily with each other. This exchange of opinions and experiences has provided a rich source of information about drugs and their effectiveness and, more importantly, their possible adverse reactions. We developed a system to automatically extract mentions of Adverse Drug Reactions (ADRs) from user reviews about drugs in social network websites by mining a set of language patterns. The system applied association rule mining on a set of annotated comments to extract the underlying patterns of colloquial expressions about adverse effects. The patterns were tested on a set of unseen comments to evaluate their performance. We reached a precision of 70.01%, a recall of 66.32%, and an F-measure of 67.96%.


Subject(s)
Data Mining/methods, Drug-Related Side Effects and Adverse Reactions, Natural Language Processing, Pattern Recognition, Automated/methods, Algorithms, Humans