Results 1 - 17 of 17
1.
Article in English | MEDLINE | ID: mdl-24303281

ABSTRACT

Clinical research studying critical illness phenotypes relies on the identification of clinical syndromes defined by consensus definitions. Historically, identifying phenotypes has required manual chart review, a time- and resource-intensive process. The overall research goal of the Critical Illness PHenotype ExtRaction (deCIPHER) project is to develop automated approaches based on natural language processing and machine learning that accurately identify phenotypes from the EMR. We chose pneumonia as our first critical illness phenotype and conducted preliminary experiments to explore the problem space. In this abstract, we outline the tools we built for processing clinical records, present our preliminary findings for pneumonia extraction, and describe future steps.

2.
J Biomed Inform ; 46(6): 998-1005, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24025513

ABSTRACT

OBJECTIVES: Patients increasingly visit online health communities to get help with managing their health. The large scale of these online communities makes it impossible for the moderators to engage in all conversations; yet, some conversations need their expertise. Our work explores low-cost text classification methods for the new task of determining whether a thread in an online health forum needs moderators' help. METHODS: We employed a binary classifier on WebMD's online diabetes community data. To train the classifier, we considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ² statistics and undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard. RESULTS: Using sentiment analysis features, feature selection methods, and balanced training data increased the AUC to 0.75 and the F1-score to 0.54, compared to the baseline of word unigrams with no feature selection on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons why moderators respond to patients' posts. DISCUSSION: We showed how feature selection methods and balanced training data can improve overall classification performance. We present implications of weighing precision versus recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members' needs. We also note challenges in producing a gold standard and discuss potential solutions for addressing them. CONCLUSION: Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators' expertise in these large-scale social media environments.


Subject(s)
Social Support; Text Messaging; Internet
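
As a rough illustration of the setup described in entry 2, the sketch below trains a word-unigram classifier with χ²-based feature selection on an undersampled (balanced) training set. The threads, labels, and choice of logistic regression are hypothetical stand-ins; the sentiment and thread-length features mentioned in the abstract are omitted for brevity.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def undersample(texts, labels, seed=0):
    """Randomly drop majority-class threads so both classes have equal size."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, keep_neg])
    return [texts[i] for i in idx], labels[idx]

# Hypothetical forum threads and gold labels (1 = needs a moderator's attention).
threads = ["my sugar is 400 and I feel dizzy, what should I do?",
           "sharing my favorite low-carb recipes with everyone",
           "is it safe to double my metformin dose on my own?",
           "hello everyone, new to this forum and happy to be here"]
needs_moderator = [1, 0, 1, 0]

X, y = undersample(threads, needs_moderator)
model = Pipeline([
    ("unigrams", CountVectorizer()),            # word unigram features
    ("chi2", SelectKBest(chi2, k=10)),          # keep the top features by chi-squared score
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.predict(["should I stop taking insulin before my surgery?"]))
```
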
3.
J Biomed Inform ; 46(2): 354-62, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23354284

ABSTRACT

Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. The absence of an automated system to identify and track radiology recommendations is an important barrier to ensuring timely follow-up of patients, especially those with non-acute incidental findings on imaging examinations. In this paper, we present a text processing pipeline to automatically identify clinically important recommendation sentences in radiology reports. Our extraction pipeline is based on natural language processing (NLP) and supervised text classification methods. To develop and test the pipeline, we created a corpus of 800 radiology reports double-annotated for recommendation sentences by a radiologist and an internist. We ran several experiments to measure the impact of different feature types and of the data imbalance between positive and negative recommendation sentences. Our fully statistical approach achieved its best F-score of 0.758 in identifying the critical recommendation sentences in radiology reports.


Subject(s)
Algorithms; Medical Informatics/methods; Natural Language Processing; Radiographic Image Interpretation, Computer-Assisted; Radiology Information Systems; Databases, Factual; Humans
4.
J Biomed Inform ; 46(1): 68-74, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23000479

ABSTRACT

This paper describes an approach to assertion classification and an empirical study of the impact this task has on phenotype identification, a real-world application in the clinical domain. The task of assertion classification is to assign to each medical concept mentioned in a clinical report (e.g., pneumonia, chest pain) a specific assertion category (e.g., present, absent, or possible). To improve the classification of medical assertions, we propose several new features that capture the semantic properties of special cue words highly indicative of a specific assertion category. The results obtained outperform the current state-of-the-art results for this task. Furthermore, we confirm the intuition that assertion classification contributes to significantly improving the results of phenotype identification from free-text clinical records.


Subject(s)
Models, Theoretical; Pneumonia/physiopathology; Humans; Phenotype
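
A minimal sketch of cue-word-driven assertion classification in the spirit of entry 4: each concept mention is represented by its context words plus indicator features for negation and uncertainty cues. The cue lists, window size, training sentences, and choice of classifier are hypothetical simplifications of the paper's richer feature set.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

NEGATION_CUES = {"no", "denies", "without", "negative"}
UNCERTAINTY_CUES = {"possible", "possibly", "suspected", "may", "cannot", "rule", "out"}

def assertion_features(sentence, concept, window=4):
    """Bag of context words around the concept plus cue-word indicator features."""
    tokens = sentence.lower().split()
    i = tokens.index(concept.lower()) if concept.lower() in tokens else 0
    context = tokens[max(0, i - window): i + window + 1]
    feats = {f"w={w}": 1 for w in context}
    feats["has_negation_cue"] = int(any(w in NEGATION_CUES for w in context))
    feats["has_uncertainty_cue"] = int(any(w in UNCERTAINTY_CUES for w in context))
    return feats

# Hypothetical training triples: (sentence, concept mention, assertion label).
train = [("patient denies chest pain", "pain", "absent"),
         ("findings consistent with possible pneumonia", "pneumonia", "possible"),
         ("chest x-ray shows pneumonia", "pneumonia", "present"),
         ("no evidence of pneumonia", "pneumonia", "absent")]

X = [assertion_features(s, c) for s, c, _ in train]
y = [label for _, _, label in train]
clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([assertion_features("possible right lower lobe pneumonia", "pneumonia")]))
```
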
5.
EGEMS (Wash DC) ; 1(1): 1025, 2013.
Article in English | MEDLINE | ID: mdl-25848565

ABSTRACT

BACKGROUND: The field of clinical research informatics includes the creation of clinical data repositories (CDRs) used to conduct quality improvement (QI) activities and comparative effectiveness research (CER). Ideally, CDR data are accurately and directly abstracted from disparate electronic health records (EHRs) across diverse health systems. OBJECTIVE: Investigators from Washington State's Surgical Care Outcomes and Assessment Program (SCOAP) Comparative Effectiveness Research Translation Network (CERTAIN) are creating such a CDR. This manuscript describes the automation and validation methods used to create this digital infrastructure. METHODS: SCOAP is a QI benchmarking initiative. Data are manually abstracted from EHRs and entered into a data management system. CERTAIN investigators are now deploying Caradigm's Amalga™ tool to facilitate automated abstraction of data from multiple, disparate EHRs. Concordance is calculated to compare automatically abstracted data with manually abstracted data. Performance measures are calculated between Amalga and each parent EHR. Validation takes place in repeated loops, with improvements made over time. When automated abstraction reaches the current benchmark for abstraction accuracy (95%), it will 'go live' at each site. PROGRESS TO DATE: A technical analysis was completed at 14 sites. Five sites are contributing; the remaining sites prioritized meeting Meaningful Use criteria. Participating sites are contributing 15-18 unique data feeds, totaling 13 surgical registry use cases. Common feeds are registration, laboratory, transcription/dictation, radiology, and medications. Approximately 50% of the 1,320 designated data elements are being automatically abstracted: 25% from structured data and 25% from text mining. CONCLUSION: By semi-automating data abstraction and conducting a rigorous validation, CERTAIN investigators will streamline data collection for QI and CER while advancing the Learning Healthcare System.
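
The concordance check described in entry 5 can be pictured as an element-by-element comparison of automatically and manually abstracted values against the 95% go-live benchmark. The sketch below is a minimal, hypothetical illustration; the records and field names are invented, and the real validation also computes per-feed performance measures.

```python
def concordance(automated: dict, manual: dict) -> float:
    """Fraction of shared data elements whose automated value matches the manual value."""
    shared = set(automated) & set(manual)
    if not shared:
        return 0.0
    matches = sum(automated[k] == manual[k] for k in shared)
    return matches / len(shared)

# Hypothetical abstracted records for one patient encounter.
automated_record = {"age": 57, "procedure": "colectomy", "hemoglobin": 11.2, "smoker": "no"}
manual_record = {"age": 57, "procedure": "colectomy", "hemoglobin": 11.9, "smoker": "no"}

score = concordance(automated_record, manual_record)
print(f"concordance = {score:.2%}, meets 95% go-live benchmark: {score >= 0.95}")
```
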

6.
AMIA Annu Symp Proc ; 2013: 103-10, 2013.
Article in English | MEDLINE | ID: mdl-24551325

ABSTRACT

In this paper, we describe a natural language processing system that can predict whether or not a patient exhibits a specific phenotype using information extracted from the narrative reports associated with that patient. Furthermore, the phenotypic annotations in our report dataset were performed at the report level, which allows us to predict the clinical phenotype at any point in time during the patient's hospitalization. Our experiments indicate that an important factor in achieving better results for this problem is determining how much information to extract from the patient's reports in the time interval between admission and the current prediction time.


Subject(s)
Electronic Health Records; Information Storage and Retrieval/methods; Natural Language Processing; Pneumonia/diagnosis; Algorithms; Cohort Studies; Community-Acquired Infections/diagnosis; Hospitalization; Humans; Intensive Care Units; Narration; Phenotype
7.
J Am Med Inform Assoc ; 19(5): 817-23, 2012.
Article in English | MEDLINE | ID: mdl-22539080

ABSTRACT

OBJECTIVE: This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia. DESIGN: A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach that uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification. RESULTS: Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively. CONCLUSION: Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves system performance.


Subject(s)
Data Mining/methods; Diagnosis, Computer-Assisted; Electronic Health Records; Natural Language Processing; Pneumonia/diagnosis; Artificial Intelligence; Humans; Intensive Care Units; Sensitivity and Specificity; Unified Medical Language System
8.
AMIA Annu Symp Proc ; 2012: 1119-28, 2012.
Article in English | MEDLINE | ID: mdl-23304388

ABSTRACT

In this paper, we present a natural language processing system that can be used in hospital surveillance applications to identify patients with pneumonia. For this purpose, we built a sequence of supervised classifiers, where the dataset corresponding to each classifier consists of a restricted set of time-ordered narrative reports. In this way, the pneumonia surveillance application can invoke the most suitable classifier for each patient based on the period of time that has elapsed since the patient was admitted to the hospital. Our system achieves significantly better results when compared with a baseline previously proposed for pneumonia identification.


Subject(s)
Electronic Health Records; Natural Language Processing; Pneumonia/diagnosis; Electronic Health Records/classification; Hospitals; Humans; Mathematical Computing; Medical Records Systems, Computerized; Narration; Retrospective Studies
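
A sketch of the time-bucketed design suggested by entry 8: one classifier is trained per window of time since admission, and the surveillance code invokes the classifier whose window covers the elapsed time. The bucket boundaries, toy reports, and use of TF-IDF with logistic regression are assumptions for illustration, not the paper's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_bucket_classifiers(buckets, labeled_reports):
    """Train one text classifier per time bucket from (hours, text, label) tuples."""
    classifiers = {}
    for max_hours in buckets:
        subset = [(t, y) for h, t, y in labeled_reports if h <= max_hours]
        texts, labels = zip(*subset)
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        classifiers[max_hours] = clf.fit(list(texts), list(labels))
    return classifiers

def pick_classifier(classifiers, hours_since_admission):
    """Return the classifier for the smallest bucket that covers the elapsed time."""
    for max_hours in sorted(classifiers):
        if hours_since_admission <= max_hours:
            return classifiers[max_hours]
    return classifiers[max(classifiers)]

# Hypothetical time-stamped notes with pneumonia labels (1 = positive).
labeled_reports = [
    (6, "admission note: fever and productive cough", 1),
    (10, "chest x-ray clear, no infiltrate", 0),
    (30, "new infiltrate on repeat imaging, started antibiotics", 1),
    (40, "afebrile, cough resolving", 0),
]
classifiers = train_bucket_classifiers(buckets=[12, 48], labeled_reports=labeled_reports)
clf = pick_classifier(classifiers, hours_since_admission=8)
print(clf.predict(["productive cough and fever on admission"]))
```
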
9.
Article in English | MEDLINE | ID: mdl-24499843

ABSTRACT

Researchers and practitioners show increasing interest in utilizing patient-generated information on the Web. Although the HCI and CSCW communities have provided many exciting opportunities for exploring new ideas and building a broad agenda in health, few venues offer a platform for interdisciplinary and collaborative brainstorming about design challenges and opportunities in this space. The goal of this workshop is to provide participants with opportunities to interact with stakeholders from diverse backgrounds and practices (researchers, practitioners, designers, programmers, and ethnographers) and together generate tangible design outcomes that utilize patient-generated information on the Web. Through small multidisciplinary group work, we will provide participants with new collaboration opportunities, an understanding of the state of the art, inspiration for future work, and, ideally, avenues for continuing to develop the research and design ideas generated at the workshop.

10.
AMIA Annu Symp Proc ; 2011: 1593-602, 2011.
Article in English | MEDLINE | ID: mdl-22195225

ABSTRACT

Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve this communication and, in turn, patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase classification performance, we enhanced the simple unigram token representation with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested on a gold standard corpus annotated by a domain expert. We applied 5-fold cross-validation, and our best-performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports.


Subject(s)
Algorithms; Electronic Health Records; Information Storage and Retrieval/methods; Natural Language Processing; Radiology Information Systems; Electronic Health Records/classification; Humans; Knowledge Bases; Radiology/methods; Radiology Information Systems/classification; Semantics; Unified Medical Language System
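
The evaluation setup in entry 10 can be approximated as follows, with logistic regression standing in for the MaxEnt classifier and only unigram features in place of the paper's lexical, semantic, knowledge-base, and structural features. The sentences and labels are hypothetical stand-ins for the expert-annotated corpus; 5-fold cross-validation and the precision/recall/F-score/accuracy metrics follow the abstract.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

recommendation = [  # hypothetical positive sentences (contain a follow-up recommendation)
    "Follow-up CT in 3 months is recommended to evaluate the nodule.",
    "Recommend ultrasound correlation for the adnexal cyst.",
    "Suggest dedicated MRI for further characterization.",
    "Repeat chest radiograph in 6 weeks is advised.",
    "Clinical correlation and endoscopy are recommended.",
]
other = [  # hypothetical negative sentences (ordinary findings)
    "The lungs are clear bilaterally.",
    "No acute intracranial abnormality.",
    "The heart size is normal.",
    "Degenerative changes of the lumbar spine.",
    "No evidence of pulmonary embolism.",
]
X = recommendation + other
y = [1] * len(recommendation) + [0] * len(other)

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_validate(model, X, y, cv=5,
                        scoring=["precision", "recall", "f1", "accuracy"])
for metric in ("test_precision", "test_recall", "test_f1", "test_accuracy"):
    print(metric, round(scores[metric].mean(), 3))
```
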
11.
J Biomed Inform ; 44(6): 927-35, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21689783

ABSTRACT

OBJECTIVE: To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity. DESIGN: The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers. MEASUREMENTS: We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1578 clinical trials and compared the classification performance (i.e., precision, recall, and F1-score) between the UMLS-based feature representation and the "bag of words" feature representation among five common classifiers in Weka, including J48, Bayesian Network, Naïve Bayesian, Nearest Neighbor, and instance-based learning classifier. RESULTS: The UMLS semantic feature representation outperforms the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency. CONCLUSION: The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.


Subject(s)
Biomedical Research; Cluster Analysis; Algorithms; Artificial Intelligence; Semantics; Unified Medical Language System
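
A toy rendering of the category-induction step in entry 11: each eligibility criterion is mapped to a vector of UMLS-style semantic-type features, and the criteria are grouped by hierarchical clustering. The term-to-semantic-type lookup here is a tiny hypothetical stand-in for a real UMLS mapping, and the criteria and cluster count are invented.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction import DictVectorizer

SEMANTIC_TYPES = {  # hypothetical term -> semantic type lookup
    "diabetes": "Disease or Syndrome",
    "metformin": "Pharmacologic Substance",
    "hemoglobin a1c": "Laboratory Procedure",
    "age": "Organism Attribute",
    "pregnant": "Organism Function",
}

def semantic_features(criterion):
    """Indicator features for the semantic types of lexicon terms found in the criterion."""
    text = criterion.lower()
    return {sem_type: 1 for term, sem_type in SEMANTIC_TYPES.items() if term in text}

criteria = [
    "Diagnosis of type 2 diabetes",
    "Currently taking metformin",
    "Hemoglobin A1c greater than 7%",
    "Age 18 to 65 years",
    "Pregnant or breastfeeding women are excluded",
    "History of diabetes for at least one year",
]

X = DictVectorizer(sparse=False).fit_transform([semantic_features(c) for c in criteria])
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
for label, criterion in sorted(zip(labels, criteria)):
    print(label, criterion)
```
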
12.
AMIA Annu Symp Proc ; 2010: 1316, 2010.
Article in English | MEDLINE | ID: mdl-21785667

ABSTRACT

Amazon's Mechanical Turk (AMT) service is becoming increasingly popular in Natural Language Processing (NLP) research. In this poster, we report our findings from using AMT to annotate biomedical text extracted from clinical trial descriptions with three entity types: medical condition, medication, and laboratory test. We also describe our observations on AMT workers' annotations.

13.
J Biomed Inform ; 42(4): 633-43, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19124086

ABSTRACT

While medical researchers formulate new hypotheses to test, they need to identify connections between their work and other parts of the medical literature. However, the current volume of information has become a great barrier to this task. Recently, many literature-based discovery (LBD) systems have been developed to help researchers identify new knowledge that bridges gaps across distinct sections of the medical literature. Each LBD system uses different methods for mining connections from text and ranking the identified connections, but none of the currently available LBD evaluation approaches can be used to compare the effectiveness of these methods. In this paper, we present an evaluation methodology for LBD systems that allows comparisons across different systems. We demonstrate our evaluation methodology by using it to compare the performance of the different correlation-mining and ranking approaches used by existing LBD systems. This evaluation methodology should help other researchers compare approaches, make informed algorithm choices, and ultimately improve the performance of LBD systems overall.


Subject(s)
Abstracting and Indexing/methods; Algorithms; Databases, Bibliographic; Information Storage and Retrieval/methods; Biomedical Research/methods; Database Management Systems; Medical Subject Headings
14.
AMIA Annu Symp Proc ; : 830-4, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999152

ABSTRACT

Correlation identification methods based on concept co-occurrences have been commonly used on medical free text. However, concepts co-occur for different reasons, and generalizable approaches to determine the meaning of those co-occurrences are needed. In this paper, we propose a new extraction approach that incorporates the UMLS and text classification methods to identify the semantics of the relationships between co-occurring concepts in MEDLINE abstracts. The major difficulty of our approach is the lack of annotated sentences for training and testing purposes. We describe how we semi-automatically annotate the sentences with a combination of heuristics and a partially supervised classification method. In our evaluations, we focus on extracting the meaning of only the correlations between drugs or chemicals and disorders, and we limit the meaning to 'treats' and 'causes'. Based on the good performance results, we believe that our approach shows great promise for tackling the difficult relationship-identification problem in medical free text.


Subject(s)
Abstracting and Indexing/methods; MEDLINE; Natural Language Processing; Pattern Recognition, Automated/methods; Semantics; Terminology as Topic; Unified Medical Language System; Algorithms; Artificial Intelligence; Information Storage and Retrieval/methods; United States
15.
J Biomed Inform ; 39(6): 600-11, 2006 Dec.
Article in English | MEDLINE | ID: mdl-16442852

ABSTRACT

The explosive growth in biomedical literature has made it difficult for researchers to keep up with advancements, even within their own narrow specializations. As researchers formulate new hypotheses to test, it is very important for them to identify connections between their work and other parts of the literature. However, the current volume of information has become a great barrier to this task, and new automated tools are needed to help researchers identify new knowledge that bridges gaps across distinct sections of the literature. In this paper, we present a literature-based discovery system called LitLinker that combines knowledge-based methodologies with a statistical method to mine the biomedical literature for new, potentially causal connections between biomedical terms. We demonstrate LitLinker's ability to capture novel and interesting connections between diseases and chemicals, drugs, genes, or molecular sequences from the published biomedical literature. We also evaluate LitLinker's performance using the information retrieval metrics of precision and recall.


Subject(s)
Databases, Bibliographic; Abstracting and Indexing; Algorithms; Alzheimer Disease/therapy; Biomedical Research; Database Management Systems; Humans; Medical Subject Headings; Migraine Disorders/therapy; Models, Statistical; Natural Language Processing; Pattern Recognition, Automated; PubMed; Schizophrenia/therapy; Software
16.
AMIA Annu Symp Proc ; : 849-53, 2005.
Article in English | MEDLINE | ID: mdl-16779160

ABSTRACT

This work explores the effect of text representation techniques on the overall performance of medical text classification. To accomplish this goal, we developed a text classification system that supports both the basic word representation (bag-of-words) and a more complex medical phrase representation (bag-of-phrases). We also combined the word and phrase representations (hybrid) for further analysis. Our system extracts medical phrases from text by incorporating a medical knowledge base and natural language processing techniques. We conducted experiments to evaluate the effects of the different representations by measuring the change in classification performance on MEDLINE documents from the OHSUMED dataset. We measured classification performance with information retrieval metrics: precision (p), recall (r), and F1-score (F1). In our experiments, we achieved better classification performance with the hybrid approach (p=0.87, r=0.46, F1=0.60) than with the bag-of-words approach (p=0.85, r=0.44, F1=0.58) or the bag-of-phrases approach (p=0.87, r=0.42, F1=0.57).


Subject(s)
Abstracting and Indexing/methods; MEDLINE/classification; Natural Language Processing; Information Storage and Retrieval; Knowledge Bases
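
The hybrid representation compared in entry 16 can be sketched as a union of a bag-of-words vectorizer and a bag-of-phrases vectorizer. The phrase lexicon below is a hypothetical stand-in for phrases derived from a medical knowledge base, and the documents, labels, and classifier are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, make_pipeline

MEDICAL_PHRASES = [  # hypothetical knowledge-base phrase lexicon
    "myocardial infarction",
    "congestive heart failure",
    "chronic obstructive pulmonary disease",
]

def phrase_tokenizer(text):
    """Return only the lexicon phrases present in the text (bag-of-phrases)."""
    lowered = text.lower()
    return [p for p in MEDICAL_PHRASES if p in lowered]

hybrid = FeatureUnion([
    ("words", CountVectorizer()),  # bag-of-words branch
    ("phrases", CountVectorizer(tokenizer=phrase_tokenizer,
                                token_pattern=None, lowercase=False)),  # bag-of-phrases branch
])

docs = [
    "Patient admitted with acute myocardial infarction.",
    "Stable chronic obstructive pulmonary disease, no exacerbation.",
    "Routine follow-up visit, no acute complaints.",
    "Congestive heart failure managed with diuretics.",
]
labels = [1, 1, 0, 1]  # hypothetical topic labels

model = make_pipeline(hybrid, LogisticRegression(max_iter=1000))
model.fit(docs, labels)
print(model.predict(["Worsening congestive heart failure symptoms."]))
```
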
17.
AMIA Annu Symp Proc ; : 529-33, 2003.
Article in English | MEDLINE | ID: mdl-14728229

ABSTRACT

Although huge amounts of unstructured text are available as a rich source of biomedical knowledge, processing this unstructured knowledge requires tools that identify concepts in free-form text. MetaMap is one tool that system developers in biomedicine have commonly used for this task, but few have studied how well it accomplishes the task in general. In this paper, we report on a study that compares MetaMap's performance against that of six people. Such studies are challenging because the task is inherently subjective and establishing consensus is difficult. Nonetheless, for those concepts that subjects generally agreed on, MetaMap was able to identify most concepts, provided they were represented in the UMLS. However, MetaMap also identified many other concepts that people did not. We also report on our analysis of the types of failures that MetaMap exhibited, as well as trends in the way people chose to identify concepts.


Subject(s)
Abstracting and Indexing/methods; Natural Language Processing; Unified Medical Language System; Humans; MEDLINE
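
One way to picture the comparison in entry 17 is to score a tool's extracted concept set against the concepts a majority of annotators agreed on. The concept sets below are hypothetical; the study itself compared MetaMap output against six human subjects.

```python
from collections import Counter

# Hypothetical concept sets chosen by four annotators for one document.
annotator_sets = [
    {"pneumonia", "chest pain", "antibiotics"},
    {"pneumonia", "chest pain"},
    {"pneumonia", "antibiotics", "fever"},
    {"pneumonia", "chest pain", "antibiotics"},
]
tool_concepts = {"pneumonia", "antibiotics", "fever", "patient"}

# Concepts identified by a strict majority of annotators form the consensus set.
counts = Counter(c for s in annotator_sets for c in s)
consensus = {c for c, n in counts.items() if n > len(annotator_sets) / 2}

true_positives = tool_concepts & consensus
precision = len(true_positives) / len(tool_concepts)
recall = len(true_positives) / len(consensus)
print(f"consensus={sorted(consensus)} precision={precision:.2f} recall={recall:.2f}")
```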