1.
J Biomed Inform ; 54: 329-36, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25523466

ABSTRACT

INTRODUCTION: This article explores how measures of semantic similarity and relatedness are impacted by the semantic groups to which the concepts they are measuring belong. Our goal is to determine if there are distinctions between homogeneous comparisons (where both concepts belong to the same group) and heterogeneous ones (where the concepts are in different groups). Our hypothesis is that the similarity measures will be significantly affected since they rely on hierarchical is-a relations, whereas relatedness measures should be less impacted since they utilize a wider range of relations. In addition, we also evaluate the effect of combining different measures of similarity and relatedness. Our hypothesis is that these combined measures will more closely correlate with human judgment, since they better reflect the rich variety of information humans use when assessing similarity and relatedness. METHOD: We evaluate our method on four reference standards. Three of the reference standards were annotated by human judges for relatedness and one was annotated for similarity. RESULTS: We found significant differences in the correlation of semantic similarity and relatedness measures with human judgment, depending on which semantic groups were involved. We also found that combining a definition-based relatedness measure with an information content similarity measure resulted in significant improvements in correlation over individual measures. AVAILABILITY: The semantic similarity and relatedness package is an open source program available from http://umls-similarity.sourceforge.net/. The reference standards are available at http://www.people.vcu.edu/~btmcinnes/downloads.html.


Subject(s)
Natural Language Processing , Semantics , Unified Medical Language System/classification , Humans , Systematized Nomenclature of Medicine
2.
AMIA Annu Symp Proc ; 2014: 882-91, 2014.
Article in English | MEDLINE | ID: mdl-25954395

ABSTRACT

In this paper, we present the results of a method using undirected paths to determine the degree of semantic similarity between two concepts in a dense taxonomy with multiple inheritance. The overall objective of this work was to explore methods that take advantage of dense multi-hierarchical taxonomies that are more graph-like than tree-like by incorporating the proximity of concepts with respect to each other within the entire is-a hierarchy. Our hypothesis is that the proximity of the concepts, regardless of how they are connected, is an indicator of the degree of their similarity. We evaluate our method using the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and four reference standards that have been manually tagged by human annotators. The overall results of our experiments show that, in SNOMED CT, the location of the concepts with respect to each other does indicate the degree to which they are similar.


Subject(s)
Semantics , Systematized Nomenclature of Medicine , Vocabulary, Controlled , Humans , Reference Standards , Statistics, Nonparametric , Unified Medical Language System
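The idea in the abstract above, scoring two concepts by their proximity in the is-a hierarchy regardless of edge direction, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy taxonomy, the function name, and the scoring formula (reciprocal of the number of nodes on the shortest undirected path) are not taken from the paper, whose exact measure is not given in the abstract.

```python
from collections import deque

# Hypothetical toy taxonomy as (child, parent) is-a pairs.
# "viral pneumonia" has two parents, i.e. multiple inheritance.
TOY_EDGES = [
    ("viral pneumonia", "pneumonia"),
    ("viral pneumonia", "viral infection"),
    ("pneumonia", "lung disease"),
    ("viral infection", "infection"),
    ("influenza", "viral infection"),
    ("lung disease", "disease"),
    ("infection", "disease"),
]


def undirected_path_similarity(is_a_edges, a, b):
    """Return 1 / (nodes on the shortest undirected path), or 0.0 if unconnected.

    Illustrative scoring only; the edge direction of is-a links is ignored.
    """
    # Build an undirected adjacency map from the directed is-a pairs.
    neighbors = {}
    for child, parent in is_a_edges:
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)
    if a not in neighbors or b not in neighbors:
        return 0.0

    # Breadth-first search gives the shortest path length in edges.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1.0 / (dist[node] + 1)
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0.0


if __name__ == "__main__":
    print(undirected_path_similarity(TOY_EDGES, "pneumonia", "influenza"))
```

In this toy graph, "pneumonia" and "influenza" are three edges apart through their shared descendant "viral pneumonia", whereas a path restricted to going up to a common ancestor ("disease") would need five edges. That difference is the kind of proximity information an undirected search can pick up in a dense, graph-like hierarchy.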
3.
J Biomed Inform ; 46(6): 1116-24, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24012881

ABSTRACT

INTRODUCTION: In this article, we evaluate a knowledge-based word sense disambiguation (WSD) method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to develop a method that can disambiguate terms in biomedical text by exploiting similarity and relatedness information extracted from biomedical resources and to evaluate the efficacy of these measures on WSD. METHOD: We evaluate our method on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms. RESULTS: We show that information content-based measures derived from either a corpus or taxonomy obtain a higher disambiguation accuracy than path-based measures or relatedness measures on the MSH-WSD dataset. AVAILABILITY: The WSD system is open source and freely available from http://search.cpan.org/dist/UMLS-SenseRelate/. The MSH-WSD dataset is available from the National Library of Medicine http://wsd.nlm.nih.gov.


Subject(s)
Semantics , Evaluation Studies as Topic , Unified Medical Language System
4.
Biomed Inform Insights ; 5(Suppl. 1): 185-93, 2012.
Article in English | MEDLINE | ID: mdl-22879775

ABSTRACT

This paper describes the Duluth systems that participated in the Sentiment Analysis track of the i2b2/VA/Cincinnati Children's 2011 Challenge. The top Duluth system was a rule-based approach derived through manual corpus analysis and the use of measures of association to identify significant ngrams. This performed in the median range of systems, attaining an F-measure of 0.45. The second system was automatically derived from the most frequent bigrams unique to one or two emotions. It achieved an F-measure of 0.36. The third system was the union of the first two, and reached an F-measure of 0.44.

5.
AMIA Annu Symp Proc ; 2012: 43-50, 2012.
Article in English | MEDLINE | ID: mdl-23304271

ABSTRACT

A potential use of automated concept similarity and relatedness measures is to improve automatic detection of clinical text that relates to a condition indicative of an adverse drug reaction. This is also one of the purposes of the Medical Dictionary for Regulatory Activities (MedDRA) Standardized Queries (SMQ). An expert panel evaluates SMQs for their ability to detect a condition of interest and thus qualifies them as a reference standard for evaluating automated approaches. We compare similarity and relatedness measurement methods on rates of correctly identifying intra-category and inter-category concept pairs from SMQ data to create ROC curves of each method's sensitivity and specificity. Results indicate an information content measure, specifically the Resnik method, achieved the highest results as measured by area under the curve, but using two different measures as predictors, Resnik and Lin, obtained the highest score. Overall, using SMQ data resulted in a productive method of evaluating automated semantic relatedness and similarity scores.


Subject(s)
Dictionaries, Medical as Topic , Semantics , Unified Medical Language System , Drug-Related Side Effects and Adverse Reactions , Linear Models , ROC Curve
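The Resnik and Lin measures named in the abstract above are information content (IC) measures. As a minimal sketch using their standard textbook definitions: IC(c) = -log p(c), Resnik similarity is the IC of the most informative common subsumer, and Lin similarity normalizes that value by the sum of the two concepts' ICs. The toy taxonomy and the concept probabilities below are hypothetical and are not drawn from MedDRA, the UMLS, or the paper's data.

```python
import math

# Hypothetical toy taxonomy: concept -> list of is-a parents.
PARENTS = {
    "disease": [],
    "infection": ["disease"],
    "lung disease": ["disease"],
    "viral infection": ["infection"],
    "influenza": ["viral infection"],
    "pneumonia": ["lung disease"],
}

# Hypothetical concept probabilities (a concept's probability includes
# its descendants, so the root has probability 1.0).
PROB = {
    "disease": 1.0,
    "infection": 0.5,
    "lung disease": 0.4,
    "viral infection": 0.2,
    "influenza": 0.05,
    "pneumonia": 0.1,
}


def subsumers(concept):
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(concept):
    """IC(c) = -log p(c); rarer concepts carry more information."""
    return -math.log(PROB[concept])


def resnik(a, b):
    """IC of the most informative concept that subsumes both a and b."""
    common = subsumers(a) & subsumers(b)
    return max(information_content(c) for c in common)


def lin(a, b):
    """Resnik score scaled by the ICs of the two concepts (range 0 to 1)."""
    denom = information_content(a) + information_content(b)
    return 2.0 * resnik(a, b) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    print(resnik("influenza", "viral infection"), lin("influenza", "pneumonia"))
```

Because Resnik depends only on the shared subsumer while Lin also accounts for how specific each concept is, the two scores can rank the same pairs differently, which is one plausible reason for using both as predictors.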
6.
AMIA Annu Symp Proc ; 2012: 587-95, 2012.
Article in English | MEDLINE | ID: mdl-23304331

ABSTRACT

In this paper we examined the relationship between semantic relatedness among medical concepts found in clinical reports and biomedical literature. Our objective is to determine whether relations between medical concepts identified from Medline abstracts may be used to inform us as to the nature of the association between medical concepts that appear to be closely related based on their distribution in clinical reports. We used a corpus of 800k inpatient clinical notes as a source of data for determining the strength of association between medical concepts and SemRep database as a source of labeled relations extracted from Medline abstracts. The same pair of medical concepts may be found with more than one predicate type in the SemRep database but often with different frequencies. Our analysis shows that predicate type frequency information obtained from the SemRep database appears to be helpful for labeling semantic relations obtained with measures of semantic relatedness and similarity.


Subject(s)
MEDLINE , Medical Records , Natural Language Processing , Terminology as Topic , Abstracting and Indexing , Databases as Topic , Humans , Information Storage and Retrieval , Semantics , Software , Unified Medical Language System
7.
AMIA Annu Symp Proc ; 2011: 895-904, 2011.
Article in English | MEDLINE | ID: mdl-22195148

ABSTRACT

In this paper, we introduce a novel knowledge-based word sense disambiguation method that determines the sense of an ambiguous word in biomedical text using semantic similarity or relatedness measures. These measures quantify the degree of similarity between concepts in the Unified Medical Language System (UMLS). The objective of this work was to develop a method that can disambiguate terms in biomedical text by exploiting similarity information extracted from the UMLS and to evaluate the efficacy of information content-based semantic similarity measures, which augment path-based information with probabilities derived from biomedical corpora. We show that information content-based measures obtain a higher disambiguation accuracy than path-based measures because they weight the path based on where it exists in the taxonomy coupled with the probability of the concepts occurring in a corpus of text.


Subject(s)
Knowledge Bases , Natural Language Processing , Terminology as Topic , Unified Medical Language System
8.
J Biomed Inform ; 44(2): 251-65, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21044697

ABSTRACT

Our objective is to develop a framework for creating reference standards for functional testing of computerized measures of semantic relatedness. Currently, research on computerized approaches to semantic relatedness between biomedical concepts relies on reference standards created for specific purposes using a variety of methods for their analysis. In most cases, these reference standards are not publicly available and the published information provided in manuscripts that evaluate computerized semantic relatedness measurement approaches is not sufficient to reproduce the results. Our proposed framework is based on the experiences of medical informatics and computational linguistics communities and addresses practical and theoretical issues with creating reference standards for semantic relatedness. We demonstrate the use of the framework on a pilot set of 101 medical term pairs rated for semantic relatedness by 13 medical coding experts. While the reliability of this particular reference standard is in the "moderate" range, we show that using clustering and factor analyses offers a data-driven approach to finding systematic differences among raters and identifying groups of potential outliers. We test two ontology-based measures of relatedness and provide both the reference standard containing individual ratings and the R program used to analyze the ratings as open-source. Currently, these resources are intended to be used to reproduce and compare results of studies involving computerized measures of semantic relatedness. Our framework may be extended to the development of reference standards in other research areas in medical informatics including automatic classification, information retrieval from medical records and vocabulary/ontology development.


Subject(s)
Medical Informatics/methods , Medical Records Systems, Computerized/standards , Semantics , Clinical Coding , Databases, Factual , Reference Standards , Software
9.
AMIA Annu Symp Proc ; 2010: 572-6, 2010 Nov 13.
Article in English | MEDLINE | ID: mdl-21347043

ABSTRACT

Automated approaches to measuring semantic similarity and relatedness can provide necessary semantic context information for information retrieval applications and a number of fundamental natural language processing tasks including word sense disambiguation. Challenges for the development of these approaches include the limited availability of validated reference standards and the need for better understanding of the notions of semantic relatedness and similarity in medical vocabulary. We present results of a study in which eight medical residents were asked to judge 724 pairs of medical terms for semantic similarity and relatedness. The results of the study confirm the existence of a measurable mental representation of semantic relatedness between medical terms that is distinct from similarity and independent of the context in which the terms occur. This study produced a validated publicly available dataset for developing automated approaches to measuring semantic relatedness and similarity.


Subject(s)
Natural Language Processing , Semantics , Humans , Information Storage and Retrieval , Vocabulary
10.
AMIA Annu Symp Proc ; 2009: 431-5, 2009 Nov 14.
Article in English | MEDLINE | ID: mdl-20351894

ABSTRACT

A number of computational measures for determining semantic similarity between pairs of biomedical concepts have been developed using various standards and programming platforms. In this paper, we introduce two new open-source frameworks based on the Unified Medical Language System (UMLS). These frameworks consist of the UMLS-Similarity and UMLS-Interface packages. UMLS-Interface provides path information about UMLS concepts. UMLS-Similarity calculates the semantic similarity between UMLS concepts using several previously developed measures and can be extended to include new measures. We validate the functionality of these frameworks by reproducing the results from previous work. Our frameworks constitute a significant contribution to the field of biomedical Natural Language Processing by providing a common development and testing platform for semantic similarity measures based on the UMLS.


Subject(s)
Natural Language Processing , Semantics , Software , Unified Medical Language System , Information Storage and Retrieval , Ownership
11.
AMIA Annu Symp Proc ; : 533-7, 2007 Oct 11.
Article in English | MEDLINE | ID: mdl-18693893

ABSTRACT

This paper explores the use of Concept Unique Identifiers (CUIs) as assigned by MetaMap as features for a supervised learning approach to word sense disambiguation of biomedical text. We compare the use of CUIs that occur in abstracts containing an instance of the target word with using the CUIs that occur in sentences containing an instance of the target word. We also experiment with frequency cutoffs for determining which CUIs should be included as features. We find that a Naive Bayesian classifier where the features represent CUIs that occur two or more times in abstracts containing the target word attains accuracy 9% greater than Leroy and Rindflesch's approach, which includes features based on semantic types assigned by MetaMap. Our results are comparable to those of Joshi et al. and Liu et al., who use feature sets that do not contain biomedical information.


Subject(s)
Artificial Intelligence , Subject Headings , Unified Medical Language System , Abstracting and Indexing , Algorithms , Bayes Theorem , Semantics
12.
J Biomed Inform ; 40(3): 288-99, 2007 Jun.
Article in English | MEDLINE | ID: mdl-16875881

ABSTRACT

Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT ontology of medical concepts. The measures include two path-based measures, and three measures that augment path-based measures with information content statistics from corpora. We also derive a context vector measure based on medical corpora that can be used as a measure of semantic relatedness. These six measures are evaluated against a newly created test bed of 30 medical concept pairs scored by three physicians and nine medical coders. We find that the medical coders and physicians differ in their ratings, and that the context vector measure correlates most closely with the physicians, while the path-based measures and one of the information content measures correlate most closely with the medical coders. We conclude that there is a role both for more flexible measures of relatedness based on information derived from corpora and for measures that rely on existing ontological structures.


Subject(s)
Medical Informatics/methods , Natural Language Processing , Terminology as Topic , Database Management Systems , Databases, Factual , Forms and Records Control , Humans , Information Storage and Retrieval , Language , Medical Records Systems, Computerized , Semantics , Software , Systematized Nomenclature of Medicine , Vocabulary, Controlled
13.
AMIA Annu Symp Proc ; : 399-403, 2006.
Article in English | MEDLINE | ID: mdl-17238371

ABSTRACT

Electronic medical records (EMR) constitute a valuable resource of patient specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. The accuracy of all three methods when using all features consistently approaches or exceeds 90%, even when the baseline majority classifier is below 50%.


Subject(s)
Abbreviations as Topic , Algorithms , Artificial Intelligence , Medical Records Systems, Computerized , Bayes Theorem , Decision Trees , Information Storage and Retrieval
14.
AMIA Annu Symp Proc ; : 589-93, 2005.
Article in English | MEDLINE | ID: mdl-16779108

ABSTRACT

Use of abbreviations and acronyms is pervasive in clinical reports despite many efforts to limit the use of ambiguous and unsanctioned abbreviations and acronyms. Because many abbreviations and acronyms are ambiguous with respect to their sense, complete and accurate text analysis is impossible without identification of the sense that was intended for a given abbreviation or acronym. We present the results of an experiment where we used the contexts harvested from the Internet through Google API to collect contextual data for a set of 8 acronyms found in clinical notes at the Mayo Clinic. We then used the contexts to disambiguate the sense of abbreviations in a manually annotated corpus.


Subject(s)
Abbreviations as Topic , Artificial Intelligence , Medical Records , Algorithms , Internet , Natural Language Processing