Search | VHL Regional Portal

An efficient prototype method to identify and correct misspellings in clinical text.

Workman, T Elizabeth; Shao, Yijun; Divita, Guy; Zeng-Treitler, Qing.

BMC Res Notes ; 12(1): 42, 2019 Jan 18.

Article in English | MEDLINE | ID: mdl-30658682

ABSTRACT

OBJECTIVE: Misspellings in clinical free text present challenges to natural language processing. To identify misspellings and their corrections, we developed a prototype spelling analysis method that implements Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two different corpora (surgical pathology reports, and emergency department progress and visit notes) extracted from Veterans Health Administration resources. We evaluated performance by measuring positive predictive value and performing an error analysis of false positive output, using four classifications. We also performed an analysis of spelling errors in each corpus, using common error classifications. RESULTS: In this small-scale study utilizing a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 and 0.8979, respectively, for the surgical pathology reports and the emergency department progress and visit notes, in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar between the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently in identifying misspellings in other clinical document types.
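The abstract names the components of the method (Word2Vec neighbors, an edit distance constraint, a lexical resource, corpus term frequencies) but not how they are combined. The following is a minimal Python sketch of one plausible combination, not the authors' implementation. The function names, the `max_dist=2` threshold, the rule that a correction must be more frequent than the misspelling, and the precomputed `neighbors` list (standing in for a Word2Vec similarity query) are all assumptions made for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest_correction(word, neighbors, lexicon, freq, max_dist=2):
    """Return a likely correction for `word`, or None.

    neighbors: candidate terms, assumed to come from an embedding
               similarity query (hypothetical input, supplied by the caller).
    lexicon:   set of known-good terms (the lexical resource).
    freq:      corpus term frequencies.
    max_dist:  assumed edit distance ceiling, not taken from the paper.
    """
    if word in lexicon:
        return None  # already a known term, so not treated as a misspelling
    candidates = [
        n for n in neighbors
        if n in lexicon
        and 0 < levenshtein(word, n) <= max_dist
        and freq[n] > freq[word]
    ]
    if not candidates:
        return None
    # Prefer the most frequent candidate; break ties by smaller edit distance.
    return max(candidates, key=lambda n: (freq[n], -levenshtein(word, n)))


if __name__ == "__main__":
    lexicon = {"biopsy", "specimen"}
    freq = Counter({"biopsy": 500, "specimen": 800, "biospy": 3})
    print(suggest_correction("biospy", ["biopsy", "specimen"], lexicon, freq))
```

In this sketch the embedding neighbors narrow the search to terms used in similar contexts, the lexicon and edit distance filters remove unrelated or unknown terms, and corpus frequency picks among what remains.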

Subject(s)

Dictionaries as Topic , Medical Informatics/methods , Natural Language Processing , Vocabulary, Controlled , Algorithms , Humans , Language , Medical Informatics/standards , Medical Informatics/statistics & numerical data , Medical Records Systems, Computerized/standards , Medical Records Systems, Computerized/statistics & numerical data , Pathology, Surgical/methods , Reproducibility of Results , Research Report/standards , Unified Medical Language System/standards , Unified Medical Language System/statistics & numerical data

CodeMapper: semiautomatic coding of case definitions. A contribution from the ADVANCE project.

Becker, Benedikt F H; Avillach, Paul; Romio, Silvana; van Mulligen, Erik M; Weibel, Daniel; Sturkenboom, Miriam C J M; Kors, Jan A.

Pharmacoepidemiol Drug Saf ; 26(8): 998-1005, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28657162

ABSTRACT

BACKGROUND: Assessment of drug and vaccine effects by combining information from different healthcare databases in the European Union requires extensive efforts in the harmonization of codes as different vocabularies are being used across countries. In this paper, we present a web application called CodeMapper, which assists in the mapping of case definitions to codes from different vocabularies, while keeping a transparent record of the complete mapping process. METHODS: CodeMapper builds upon coding vocabularies contained in the Metathesaurus of the Unified Medical Language System. The mapping approach consists of three phases. First, medical concepts are automatically identified in a free-text case definition. Second, the user revises the set of medical concepts by adding or removing concepts, or expanding them to related concepts that are more general or more specific. Finally, the selected concepts are projected to codes from the targeted coding vocabularies. We evaluated the application by comparing codes that were automatically generated from case definitions by applying CodeMapper's concept identification and successive concept expansion, with reference codes that were manually created in a previous epidemiological study. RESULTS: Automated concept identification alone had a sensitivity of 0.246 and positive predictive value (PPV) of 0.420 for reproducing the reference codes. Three successive steps of concept expansion increased sensitivity to 0.953 and PPV to 0.616. CONCLUSIONS: Automatic concept identification in the case definition alone was insufficient to reproduce the reference codes, but CodeMapper's operations for concept expansion provide an effective, efficient, and transparent way for reproducing the reference codes.
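The evaluation in this abstract compares automatically generated codes against manually created reference codes using sensitivity and positive predictive value. As a minimal sketch of how those two measures follow from set overlap, assuming both code lists can be treated as plain sets (the function name and the example codes below are hypothetical and are not taken from the study):

```python
def sensitivity_and_ppv(generated: set, reference: set) -> tuple:
    """Compare generated codes with reference codes.

    sensitivity = true positives / number of reference codes
    PPV         = true positives / number of generated codes
    """
    true_positives = len(generated & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(generated) if generated else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # Made-up codes, for illustration only.
    generated = {"A01", "B02", "C03", "D04"}
    reference = {"B02", "C03", "E05"}
    sens, ppv = sensitivity_and_ppv(generated, reference)
    print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

Expanding concepts to related ones adds codes to the generated set, which can raise sensitivity by recovering missed reference codes while PPV depends on how many of the added codes are actually in the reference.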

Subject(s)

Databases, Factual/statistics & numerical data , International Classification of Diseases/statistics & numerical data , Medical Records Systems, Computerized/statistics & numerical data , Unified Medical Language System/statistics & numerical data , Europe/epidemiology , Humans

Analysis of a study of the users, uses, and future agenda of the UMLS.

Chen, Yan; Perl, Yehoshua; Geller, James; Cimino, James J.

J Am Med Inform Assoc ; 14(2): 221-31, 2007.

Article in English | MEDLINE | ID: mdl-17213497

ABSTRACT

OBJECTIVES: The UMLS constitutes the largest existing collection of medical terms. However, little has been published about the users and uses of the UMLS. This study sheds light on these issues. DESIGN: We designed a questionnaire consisting of 26 questions and distributed it to the UMLS user mailing list. Participants were assured complete confidentiality of their replies. To further encourage list members to respond, we promised to provide them with early results prior to publication. Sector analysis of the responses, according to employment organizations, is used to obtain insights into some responses. RESULTS: We received 70 responses. The study confirms two intended uses of the UMLS: access to source terminologies (75%), and mapping among them (44%). However, most access is just to a few sources, led by SNOMED, MeSH, and ICD. Out of 119 reported purposes of use, terminology research (37), information retrieval (19), and terminology translation (14) lead. Four important observations are that the UMLS is widely used as a terminology (77%), even though it was not designed as one; many users (73%) want the NLM to mark concepts with multiple parents in an indented hierarchy and to derive a terminology from the UMLS (73%). Finally, auditing the UMLS is a top budget priority (35%) for users. CONCLUSIONS: The study reports many uses of the UMLS in a variety of subjects from terminology research to decision support and phenotyping. The study confirms that the UMLS is used to access its source terminologies and to map among them. Two primary concerns of the existing user base are auditing the UMLS and the design of a UMLS-based derived terminology.

Subject(s)

Unified Medical Language System/statistics & numerical data , Vocabulary, Controlled , Consumer Behavior , Data Collection , Decision Support Techniques , Humans , Information Storage and Retrieval , Research , Surveys and Questionnaires , Unified Medical Language System/economics , Unified Medical Language System/trends

Who is using the UMLS and how - insights from the UMLS user annual reports.

Fung, Kin Wah; Hole, William T; Srinivasan, Suresh.

AMIA Annu Symp Proc ; : 274-8, 2006.

Article in English | MEDLINE | ID: mdl-17238346

ABSTRACT

The NLM's UMLS resources are available to users free of charge under a license that requires submission of an annual report on their usage. A new web-based template was used to collect users' annual reports for the calendar year 2004. Out of 2,677 licensees, 1,427 (53%) submitted their annual reports through the web template. This represented a five-fold increase in the reports submitted compared to previous years. The information collected via the web template was more structured, more complete and easier to analyze. The main results from the 2004 annual reports are summarized and discussed. They are being used to guide UMLS developments.

Subject(s)

Unified Medical Language System/statistics & numerical data , Data Collection/methods , Internet , Surveys and Questionnaires , Unified Medical Language System/trends , Vocabulary, Controlled

A tool for sharing annotated research data: the "Category 0" UMLS (Unified Medical Language System) vocabularies.

Berman, Jules J.

BMC Med Inform Decis Mak ; 3: 6, 2003 Jun 16.

Article in English | MEDLINE | ID: mdl-12809560

ABSTRACT

BACKGROUND: Large biomedical data sets have become increasingly important resources for medical researchers. Modern biomedical data sets are annotated with standard terms to describe the data and to support data linking between databases. The largest curated listing of biomedical terms is the National Library of Medicine's Unified Medical Language System (UMLS). The UMLS contains more than 2 million biomedical terms collected from nearly 100 medical vocabularies. Many of the vocabularies contained in the UMLS carry restrictions on their use, making it impossible to share or distribute UMLS-annotated research data. However, a subset of the UMLS vocabularies, designated Category 0 by UMLS, can be used to annotate and share data sets without violating the UMLS License Agreement. METHODS: The UMLS Category 0 vocabularies can be extracted from the parent UMLS metathesaurus using a Perl script supplied with this article. There are 43 Category 0 vocabularies that can be used freely for research purposes without violating the UMLS License Agreement. Among the Category 0 vocabularies are: MESH (Medical Subject Headings), NCBI (National Center for Biotechnology Information) Taxonomy and ICD-9-CM (International Classification of Diseases-9-Clinical Modification). RESULTS: The extraction file containing all Category 0 terms and concepts is 72,581,138 bytes in length and contains 1,029,161 terms. The UMLS Metathesaurus MRCON file (January, 2003) is 151,048,493 bytes in length and contains 2,146,899 terms. Therefore the Category 0 vocabularies, in aggregate, are about half the size of the UMLS metathesaurus. A large publicly available listing of 567,921 different medical phrases was automatically coded using the full UMLS metathesaurus and the Category 0 vocabularies. There were 545,321 phrases with one or more matches against UMLS terms while 468,785 phrases had one or more matches against the Category 0 terms. This indicates that when the two vocabularies are evaluated by their fitness to find at least one term for a medical phrase, the Category 0 vocabularies performed 86% as well as the complete UMLS metathesaurus. CONCLUSION: The Category 0 vocabularies of UMLS constitute a large nomenclature that can be used by biomedical researchers to annotate biomedical data. These annotated data sets can be distributed for research purposes without violating the UMLS License Agreement. These vocabularies may be of particular importance for sharing heterogeneous data from diverse biomedical data sets. The software tools to extract the Category 0 vocabularies are freely available Perl scripts entered into the public domain and distributed with this article.
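The "about half the size" and "86% as well" statements in this abstract follow directly from the counts it reports. A short Python check of that arithmetic, using only the figures quoted in the abstract (the variable names are chosen here for illustration):

```python
# Counts quoted in the abstract.
matched_full_umls = 545_321   # phrases with at least one UMLS match
matched_category0 = 468_785   # phrases with at least one Category 0 match
category0_terms = 1_029_161
mrcon_terms = 2_146_899
category0_bytes = 72_581_138
mrcon_bytes = 151_048_493

# Category 0 matching performance relative to the full metathesaurus.
relative_fitness = matched_category0 / matched_full_umls

# Category 0 size relative to the MRCON file, by term count and by bytes.
term_ratio = category0_terms / mrcon_terms
byte_ratio = category0_bytes / mrcon_bytes

print(f"relative fitness: {relative_fitness:.1%}")
print(f"term ratio:       {term_ratio:.1%}")
print(f"byte ratio:       {byte_ratio:.1%}")
```

The relative fitness rounds to 86%, and both size ratios come out a little under one half, consistent with the abstract's wording.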

Subject(s)

Decision Support Techniques , Medical Informatics/methods , Software/trends , Unified Medical Language System/trends , Algorithms , Databases, Bibliographic/trends , Databases, Factual/trends , Humans , Medical Informatics/statistics & numerical data , Medical Informatics/trends , Research Design/statistics & numerical data , Research Design/trends , Software/statistics & numerical data , Unified Medical Language System/statistics & numerical data

Characteristics of consumer terminology for health information retrieval.

Zeng, Q; Kogan, S; Ash, N; Greenes, R A; Boxwala, A A.

Methods Inf Med ; 41(4): 289-98, 2002.

Article in English | MEDLINE | ID: mdl-12425240

ABSTRACT

OBJECTIVES: As millions of consumers perform health information retrieval online, the mismatch between their terminology and the terminologies of the information sources could become a major barrier to successful retrievals. To address this problem, we studied the characteristics of consumer terminology for health information retrieval. METHODS: Our study focused on consumer queries that were used on a consumer health service Web site and a consumer health information Web site. We analyzed data from the site-usage logs and conducted interviews with patients. RESULTS: Our findings show that consumers' information retrieval performance is very poor. There are significant mismatches at all levels (lexical, semantic and mental models) between the consumer terminology and both the information source terminology and standard medical vocabularies. CONCLUSIONS: Comprehensive terminology support on all levels is needed for consumer health information retrieval.

Subject(s)

Consumer Behavior , Information Storage and Retrieval/standards , Terminology as Topic , Unified Medical Language System/statistics & numerical data , Adult , Consumer Behavior/statistics & numerical data , Female , Humans , Internet , Male , Middle Aged

Assessing and enhancing the value of the UMLS Knowledge Sources.

Humphreys, B L; Lindberg, D A; Hole, W T.

Proc Annu Symp Comput Appl Med Care ; : 78-82, 1991.

Article in English | MEDLINE | ID: mdl-1807711

ABSTRACT

The goal of the UMLS Project is to give practitioners and researchers easy access to machine-readable information from diverse sources. Assessment of the first experimental versions of the UMLS Knowledge Sources is essential to measuring progress toward that goal and to identifying needed enhancements. As of July 30, 1991, copies of the first edition of the UMLS Knowledge Sources had been distributed to 143 individuals and institutions; 66 had provided initial feedback information. The information received indicates that the UMLS Knowledge Sources will undergo broad testing in the patient care, medical education, library service, and product development environments. Preliminary data support the hypothesis that expanded coverage of routine clinical concepts is needed. Key enhancements planned for 1992 and beyond include expanded coverage of ICD-9-CM and CPT.

Subject(s)

Unified Medical Language System/organization & administration , National Library of Medicine (U.S.) , Program Evaluation , Surveys and Questionnaires , Unified Medical Language System/statistics & numerical data , United States , User-Computer Interface
